Understanding Bias

In Machine Learning & Epidemiology

🎯 What is Bias?

A systematic error that leads to incorrect conclusions or unfair outcomes. Unlike random error, it does not shrink as more data are collected.

🤖 Machine Learning

In ML, bias refers to systematic errors in algorithms that lead to unfair or inaccurate predictions, often reflecting historical inequalities in training data.

Focus: Algorithmic fairness, prediction accuracy, data representation

🏥 Epidemiology

In epidemiology, bias refers to systematic errors in study design or analysis that lead to incorrect estimates of disease associations or treatment effects.

Focus: Study validity, causal inference, population health

🔗 Common Ground

  • Both fields deal with drawing conclusions from data
  • Both are concerned with systematic errors that affect validity
  • Both require careful attention to data collection and analysis
  • In both, biased results can have serious real-world consequences for patients

🔍 Major Types of Bias

🤖 Machine Learning Biases

  • Selection Bias
    Training data from only certain hospitals or patient populations
  • Confirmation Bias
    Choosing clinical features that confirm existing medical beliefs
  • Algorithmic Bias
    AI models systematically underperform for certain demographic groups
  • Sampling Bias
    Non-representative patient samples lead to biased diagnostic models
  • Historical Bias
    Past healthcare disparities encoded in medical AI training data

🏥 Epidemiological Biases

  • Selection Bias
    Study participants from only urban hospitals, missing rural populations
  • Information Bias
    Systematic errors in medical record collection or diagnostic measurements
  • Confounding Bias
    Socioeconomic factors affect both disease exposure and health outcomes
  • Recall Bias
    Patients with disease remember past exposures differently than healthy controls
  • Survival Bias
    Only analyzing patients who survived to hospital admission

🔄 How Selection Bias Works

Target Population: everyone we want to study
→ Sample Selection: who we actually include
→ Bias Introduced: systematic differences from the target population
→ Invalid Results: conclusions can't be generalized
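A small simulation makes the flow above concrete. The Python sketch below is a hypothetical example (the urban/rural split and the prevalence figures are invented for illustration, not taken from any study): it builds a synthetic target population, then keeps only urban patients, mimicking a dataset drawn solely from urban hospitals.

```python
import random


def simulate_selection_bias(n=100_000, seed=0):
    """Compare disease prevalence in a synthetic population with a biased sample.

    Hypothetical assumptions: 30% of people are rural, rural prevalence is 20%,
    urban prevalence is 10%, and the sample contains urban patients only.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        rural = rng.random() < 0.30
        has_disease = rng.random() < (0.20 if rural else 0.10)
        population.append((rural, has_disease))

    # Target population: everyone we want to study
    true_prevalence = sum(d for _, d in population) / n

    # Selection mechanism: only urban patients are included
    sample = [d for rural, d in population if not rural]
    sample_prevalence = sum(sample) / len(sample)
    return true_prevalence, sample_prevalence


true_prev, sample_prev = simulate_selection_bias()
print(f"population prevalence: {true_prev:.3f}")
print(f"sample prevalence:     {sample_prev:.3f}")
```

Under these assumptions the population prevalence is about 0.3 × 0.20 + 0.7 × 0.10 = 0.13, while the urban-only sample gives about 0.10. Collecting more urban data would not close the gap, because the error comes from who is selected, not from sample size.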

⚖️ Similarities & Key Differences

  • Primary Goal
    ML: Accurate medical predictions and equitable care
    Epidemiology: Valid causal inference about health
    Overlap: Reliable conclusions from healthcare data
  • Data Source
    ML: Medical images, EHR, wearable devices
    Epidemiology: Clinical trials, health surveys, registries
    Overlap: Both rely on patient-level healthcare data, much of it observational
  • Bias Impact
    ML: Unfair AI, missed diagnoses, health disparities
    Epidemiology: Wrong conclusions about treatments and disease
    Overlap: Systematic errors that affect patient care
  • Detection Methods
    ML: Comparing performance across patient demographics
    Epidemiology: Study design review, statistical tests
    Overlap: Healthcare data analysis and validation
  • Prevention
    ML: Diverse patient data, fairness constraints
    Epidemiology: Randomization, stratified sampling
    Overlap: Careful healthcare methodology

🎯 Where Definitions Overlap

Selection Bias in Healthcare

Both fields worry about non-representative patient samples leading to invalid medical conclusions

Confounding in Health Data

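What ML practitioners often call a "spurious correlation" is frequently what epidemiologists call "confounding": a third factor (such as age or socioeconomic status) influences both the exposure or input feature and the health outcome, creating an association that is not causal.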
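The Python sketch below simulates that mechanism. All numbers are hypothetical: age group (older vs. younger) raises both the chance of exposure and the risk of the outcome, while the exposure itself has no effect. The crude risk ratio then suggests an association that disappears once exposed and unexposed people are compared within each age group.

```python
import random


def simulate_confounding(n=200_000, seed=1):
    """Crude vs. age-stratified risk ratios when age confounds exposure and outcome."""
    rng = random.Random(seed)
    # counts[(older, exposed)] = [cases, total]
    counts = {(o, e): [0, 0] for o in (False, True) for e in (False, True)}
    for _ in range(n):
        older = rng.random() < 0.5
        # Hypothetical: older people are more often exposed ...
        exposed = rng.random() < (0.7 if older else 0.2)
        # ... and have a higher outcome risk; exposure itself has NO effect
        case = rng.random() < (0.30 if older else 0.05)
        counts[(older, exposed)][0] += case
        counts[(older, exposed)][1] += 1

    def risk(cells):
        return sum(c[0] for c in cells) / sum(c[1] for c in cells)

    # Crude: pool both age groups, ignoring the confounder
    crude_rr = (risk([counts[(o, True)] for o in (False, True)])
                / risk([counts[(o, False)] for o in (False, True)]))
    # Stratified: compare exposed vs. unexposed within each age group
    stratum_rr = {
        ("older" if o else "younger"): risk([counts[(o, True)]]) / risk([counts[(o, False)]])
        for o in (False, True)
    }
    return crude_rr, stratum_rr


crude, strata = simulate_confounding()
print(f"crude RR: {crude:.2f}")
print({k: round(v, 2) for k, v in strata.items()})
```

Under these assumptions the crude risk ratio comes out at roughly 2, purely because exposed people tend to be older, while both stratum-specific ratios sit close to 1. A model trained on such data could learn the exposure as a "predictive" feature for the same reason.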

🚨 Key Differences in Healthcare Context

  • ML bias in healthcare often focuses on fairness across patient groups (algorithmic discrimination in medical AI)
  • Epidemiological bias focuses on validity of causal relationships in health studies
  • ML can sometimes reduce bias with more diverse patient data; epidemiology relies mainly on study design, with statistical adjustment as a partial remedy
  • ML bias detection is often post hoc (after a model is trained); in clinical research, epidemiological bias prevention is usually planned before the study begins

🛡️ Detection & Prevention Strategies

🔍 ML Bias Detection

  • 📊 Healthcare Fairness Metrics: Check whether error rates (e.g., sensitivity, false-positive rate) are similar across patient demographics
  • 🔄 Cross-site validation: Test on patient populations and hospital settings not used for training
  • 📈 Performance Analysis: Compare diagnostic accuracy across age, gender, race, socioeconomic status
  • 🎯 Bias Testing: Test with rare diseases and edge cases in medical data
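The performance analysis above can be sketched as a per-group breakdown of accuracy and sensitivity. This minimal Python sketch uses only the standard library; the labels, predictions, and group names are made-up toy values. Libraries such as Fairlearn (listed under Further Reading) provide similar grouped metrics, for example through its `MetricFrame` class.

```python
from collections import defaultdict


def subgroup_metrics(y_true, y_pred, groups):
    """Accuracy and sensitivity (true positive rate) for each patient group."""
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "positives": 0, "tp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["correct"] += int(t == p)
        if t == 1:  # patient actually has the condition
            s["positives"] += 1
            s["tp"] += int(p == 1)
    return {
        g: {
            "accuracy": s["correct"] / s["n"],
            # Sensitivity is undefined when a group has no positive cases
            "sensitivity": s["tp"] / s["positives"] if s["positives"] else float("nan"),
        }
        for g, s in stats.items()
    }


# Toy data: 1 = disease present; group names are hypothetical hospital types
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["urban"] * 4 + ["rural"] * 4
report = subgroup_metrics(y_true, y_pred, groups)
for group, metrics in report.items():
    print(group, metrics)
```

In this toy example the overall accuracy is 75%, which hides the fact that the model misses half of the rural patients who have the condition.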

Prevention Strategies:

  • ✅ Diverse, representative patient training data
  • ✅ Bias-aware training methods (e.g., reweighting under-represented groups, fairness constraints)
  • ✅ Regular healthcare AI audits
  • ✅ Diverse clinical development teams
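One common bias-aware technique is reweighting, so that an under-represented group is not drowned out during training. The Python sketch below computes inverse-frequency sample weights; it is one simple scheme among several, and the group labels are hypothetical. Many scikit-learn estimators accept such weights through the `sample_weight` argument of `fit`.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Weight each sample by n / (k * count[group]).

    With k groups and n samples, every group's weights then sum to n / k,
    so each group contributes equally to a weighted training loss.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


# Hypothetical training set: 8 urban patients, 2 rural patients
groups = ["urban"] * 8 + ["rural"] * 2
weights = inverse_frequency_weights(groups)
print(weights[0], weights[-1])  # urban weight, rural weight
```

Reweighting changes how much each group influences the fitted model, but it cannot create information about patients who are absent from the data, so it complements rather than replaces more representative data collection.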

🔬 Epidemiological Bias Control

  • 🎲 Randomization: Random assignment in clinical trials balances known and unknown confounders between groups, on average
  • 🎭 Blinding: Hide treatment assignment from patients/doctors in clinical trials
  • 📝 Standardized Protocols: Consistent medical data collection methods
  • 📊 Statistical Adjustment: Control for known health confounders (age, comorbidities)
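Statistical adjustment can be illustrated with the Mantel-Haenszel risk ratio, a standard way to pool an exposure-outcome association across strata of a confounder such as age. The Python sketch below assumes cohort-style data (risk ratios rather than odds ratios), and the 2×2 counts are invented so that the exposure has no effect within either age stratum.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio.

    Each stratum is (a, n1, c, n0): exposed cases, exposed total,
    unexposed cases, unexposed total.
    RR_MH = sum(a * n0 / N) / sum(c * n1 / N), with N = n1 + n0.
    """
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


def crude_rr(strata):
    """Risk ratio after collapsing all strata (ignores the confounder)."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return (a / n1) / (c / n0)


# Hypothetical counts (a, n1, c, n0) per age stratum
age_strata = [
    (20, 400, 80, 1600),   # younger: risk 5% whether exposed or not
    (210, 700, 90, 300),   # older: risk 30% whether exposed or not
]
print(f"crude RR: {crude_rr(age_strata):.2f}")
print(f"Mantel-Haenszel RR: {mantel_haenszel_rr(age_strata):.2f}")
```

The crude ratio (about 2.3) reflects the fact that most exposed people are older; adjusting for age brings the pooled estimate back to 1. Adjustment only handles confounders that were measured, which is one reason randomization is preferred when feasible.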

Prevention Strategies:

  • ✅ Careful clinical study design (RCTs when possible)
  • ✅ Representative patient sampling strategies
  • ✅ Validated medical measurement instruments
  • ✅ Prospective healthcare data collection
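Representative sampling is often implemented as proportional stratified sampling: split the sampling frame by a variable such as care setting, then draw from each stratum in proportion to its size. The Python sketch below is a minimal illustration; the record fields and the 70/30 urban/rural split are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, stratum_of, n, seed=0):
    """Proportional-allocation stratified random sample of roughly n records.

    Each stratum contributes round(n * stratum_size / total) records, so the
    sample size can differ from n by a record or two due to rounding.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    total = len(records)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical sampling frame: 700 urban and 300 rural patients
frame = [{"id": i, "setting": "urban" if i < 700 else "rural"} for i in range(1000)]
sample = stratified_sample(frame, lambda r: r["setting"], n=100)
print(Counter(r["setting"] for r in sample))
```

A simple random sample matches these proportions only on average; stratification guarantees them for the stratifying variable. Studies sometimes deliberately oversample small strata instead and reweight at analysis time.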

🎓 Student Action Items

When Working with Healthcare ML:

  • Always examine your patient training data demographics
  • Test model performance across different patient groups
  • Question whether your data represents all patient populations
  • Consider ethical implications for patient care and health equity

When Reading Healthcare Epidemiology:

  • Look for randomized controlled clinical trials
  • Check if the study population matches your patient population
  • Identify potential health confounding variables
  • Consider alternative explanations for health findings

🎯 Universal Principles

Whether in Healthcare ML or Epidemiology: Always question your patient data, consider which populations are missing, and remember that bias often reflects healthcare disparities that need addressing, not just technical fixes.

📚 Further Reading & Resources

🤖 Healthcare ML Bias

  • "Fairness and Machine Learning" by Barocas et al.
  • "AI in Healthcare" by Agrawal et al.
  • AI Fairness 360 Toolkit (IBM)
  • Fairlearn (Microsoft)

🏥 Healthcare Epidemiological Bias

  • "Modern Epidemiology" by Rothman & Greenland
  • "Epidemiology: An Introduction" by Rothman
  • STROBE Guidelines for Observational Studies
  • CONSORT Guidelines for RCTs

🔗 Key Takeaway

Both fields share the fundamental challenge of drawing valid conclusions from imperfect healthcare data. The terminology may differ, but the principles of careful methodology, representative patient sampling, and critical thinking apply in both fields and help ensure equitable healthcare outcomes.

🧠 Test Your Understanding

Question 1: Selection Bias in Healthcare AI

A machine learning model for diagnosing diabetic retinopathy is trained only on images from patients at urban teaching hospitals. What type of bias is this?

A. Confirmation Bias
B. Selection Bias
C. Algorithmic Bias
D. Historical Bias

Answer: B. Selection Bias. The training data don't represent the target population (patients at rural and community hospitals are missing).

📊 Bias Detection Flow

1. Data Collection: Check representativeness
2. Analysis: Test across groups
3. Validation: Cross-check results
4. Action: Address findings
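Step 1 of this flow, checking representativeness, can start with a simple comparison of group shares in the dataset against a reference population (for example, census or registry figures). The Python sketch below assumes such reference shares are available; the numbers are hypothetical.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares):
    """Sample share minus reference share per group (negative = under-represented)."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: counts[g] / n - share for g, share in reference_shares.items()}


# Hypothetical dataset and reference population shares
sample_groups = ["urban"] * 90 + ["rural"] * 10
reference = {"urban": 0.7, "rural": 0.3}
for group, gap in representation_gaps(sample_groups, reference).items():
    print(f"{group}: {gap:+.2f}")
```

Only groups listed in the reference are reported, so the reference table should cover every group whose representation matters for the intended patient population.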

© 2025 Machine Learning for Health Research Course | Prof. Gennady Roshchupkin
