Health Database = Matrix

📊 Health Records Database

PatientID  Age  BMI   BloodPressure  Cholesterol  Diabetic  HeartDisease
P001       45   28.5  140            220          0         1
P002       32   24.1  120            180          1         0
P003       67   31.2  160            240          1         1
P004       29   22.8  110            160          0         0
P005       54   29.7  150            210          0         1
➡️

🔢 Matrix Representation

[
45 28.5 140 220 0 1
32 24.1 120 180 1 0
67 31.2 160 240 1 1
29 22.8 110 160 0 0
54 29.7 150 210 0 1
]

Note: PatientID removed (not a feature)

Matrix X: 5 patients × 6 features (n × p matrix)
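
The step from table to matrix can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original slide; the variable names `records` and `patient_ids` are assumptions chosen for readability.

```python
import numpy as np

# Rows copied from the Health Records Database table above.
records = [
    ("P001", 45, 28.5, 140, 220, 0, 1),
    ("P002", 32, 24.1, 120, 180, 1, 0),
    ("P003", 67, 31.2, 160, 240, 1, 1),
    ("P004", 29, 22.8, 110, 160, 0, 0),
    ("P005", 54, 29.7, 150, 210, 0, 1),
]

# Keep the IDs as row labels; they carry no measurement.
patient_ids = [row[0] for row in records]

# Everything after the ID becomes one numeric row of X.
X = np.array([row[1:] for row in records], dtype=float)

n, p = X.shape
print(n, p)  # 5 6
```

Keeping `patient_ids` in a separate list preserves the link between each matrix row and its patient without putting a non-numeric column into X.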

🏥 Each Row = Patient

  • One observation
  • Complete health profile
  • All measurements for one person
  • Sample size n = 5 patients

📏 Each Column = Feature

  • One variable/measurement
  • Same attribute across all patients
  • Predictor variables (HeartDisease becomes the response in regression)
  • Feature space p = 6 variables (5 predictors + 1 outcome)
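
The row and column views described above correspond directly to array indexing. The following is a small sketch under the assumption that the matrix is stored as a NumPy array; `feature_names` is a helper list introduced here for illustration only.

```python
import numpy as np

# Hypothetical helper: column labels in the same order as the matrix columns.
feature_names = ["Age", "BMI", "BloodPressure", "Cholesterol",
                 "Diabetic", "HeartDisease"]

X = np.array([
    [45, 28.5, 140, 220, 0, 1],
    [32, 24.1, 120, 180, 1, 0],
    [67, 31.2, 160, 240, 1, 1],
    [29, 22.8, 110, 160, 0, 0],
    [54, 29.7, 150, 210, 0, 1],
], dtype=float)

# A row is one patient: every measurement recorded for P001.
p001 = X[0]

# A column is one feature: Age for all five patients.
age = X[:, feature_names.index("Age")]

print(dict(zip(feature_names, p001.tolist())))
print(age.tolist())
```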

🔗 Linear Regression Connection

To predict HeartDisease from other health metrics:

Design Matrix X (columns: Age, BMI, BP, Chol, Diabetic):

[
45 28.5 140 220 0
32 24.1 120 180 1
67 31.2 160 240 1
29 22.8 110 160 0
54 29.7 150 210 0
]

Response Vector y (HeartDisease):

[
1
0
1
0
1
]

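
Splitting the full matrix into a design matrix and a response vector might look like the sketch below. Selecting columns by name is an illustrative choice; the names `columns`, `data`, and `target` are assumptions, not something the slide defines.

```python
import numpy as np

columns = ["Age", "BMI", "BloodPressure", "Cholesterol",
           "Diabetic", "HeartDisease"]

data = np.array([
    [45, 28.5, 140, 220, 0, 1],
    [32, 24.1, 120, 180, 1, 0],
    [67, 31.2, 160, 240, 1, 1],
    [29, 22.8, 110, 160, 0, 0],
    [54, 29.7, 150, 210, 0, 1],
], dtype=float)

# The outcome we want to predict; every other column is a predictor.
target = "HeartDisease"
target_idx = columns.index(target)
predictor_idx = [i for i, name in enumerate(columns) if name != target]

X = data[:, predictor_idx]  # design matrix, shape (5, 5)
y = data[:, target_idx]     # response vector, shape (5,)

print(X.shape, y.shape)  # (5, 5) (5,)
```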
Classical Solution: β̂ = (XᵀX)⁻¹Xᵀy
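
As a rough illustration, the closed-form solution can be computed with NumPy. This sketch uses X exactly as shown on the slide (no intercept column) and solves the linear system instead of forming the inverse explicitly, which is a common numerical choice rather than something the slide prescribes.

```python
import numpy as np

X = np.array([
    [45, 28.5, 140, 220, 0],
    [32, 24.1, 120, 180, 1],
    [67, 31.2, 160, 240, 1],
    [29, 22.8, 110, 160, 0],
    [54, 29.7, 150, 210, 0],
], dtype=float)
y = np.array([1, 0, 1, 0, 1], dtype=float)

# Build the two pieces of the normal equations.
XtX = X.T @ X   # 5 x 5
Xty = X.T @ y   # length 5

# Solve (X^T X) beta = X^T y; equivalent to applying the inverse, but more stable.
beta_hat = np.linalg.solve(XtX, Xty)

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(X @ beta_hat)  # fitted values
```

Because this X is square (5 × 5), a full-rank X lets the fitted values reproduce y exactly. That reflects the tiny sample, not a good model.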

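Problem: If XᵀX is not invertible (e.g., more features than patients, p > n, or linearly dependent columns), β̂ is not unique → Need ML methods such as regularization!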
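
One way to see the problem on this very dataset is to add an intercept column: X then has 6 columns but only 5 rows, so its rank is at most 5 and the 6 × 6 matrix XᵀX must be singular. The sketch below checks this and then applies ridge regression as one possible remedy. Ridge is an illustrative choice here, since the slide does not say which method it has in mind, and the penalty `lam = 1.0` is an arbitrary assumed value.

```python
import numpy as np

X = np.array([
    [45, 28.5, 140, 220, 0],
    [32, 24.1, 120, 180, 1],
    [67, 31.2, 160, 240, 1],
    [29, 22.8, 110, 160, 0],
    [54, 29.7, 150, 210, 0],
], dtype=float)
y = np.array([1, 0, 1, 0, 1], dtype=float)

# Add an intercept column of ones: still 5 rows, but now 6 columns (p > n).
X1 = np.column_stack([np.ones(len(X)), X])

# rank(X1) cannot exceed the number of rows, so X1^T X1 (6 x 6) is singular.
rank = np.linalg.matrix_rank(X1)
print(rank, X1.shape[1])

# Ridge regression adds lam * I to X1^T X1, which makes the system solvable.
# Simplifications for brevity: the intercept is penalized too, and the
# features are not standardized.
lam = 1.0  # hypothetical penalty strength; normally tuned, e.g. by cross-validation
A = X1.T @ X1 + lam * np.eye(X1.shape[1])
beta_ridge = np.linalg.solve(A, X1.T @ y)

print(beta_ridge)
```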

© 2025 Machine Learning for Health Research Course | Prof. Gennady Roshchupkin
