Nested Cross-Validation (Outer → Inner)

The outer CV loop gives an unbiased estimate of performance on truly unseen data; the inner CV loop tunes hyperparameters using only the outer-train split.

🎯 Why Nested Cross-Validation?

Problem with Regular CV:
  • Hyperparameter tuning uses the same data for selection and evaluation
  • Leads to optimistic bias: performance is overestimated
  • Model selection and performance estimation are not independent
Solution with Nested CV:
  • Outer loop: Provides unbiased performance estimation
  • Inner loop: Tunes hyperparameters on training data only
  • Each outer test fold is truly unseen during model selection
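
As a concrete illustration, here is a minimal sketch of nested CV in scikit-learn (the dataset, SVC classifier, and C grid are placeholders, not part of the original slides):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset

    # Inner loop: GridSearchCV tunes C via 5-fold CV on each outer-train split.
    inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv)

    # Outer loop: each fold evaluates a model whose hyperparameters were
    # tuned without ever seeing that fold.
    outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
    scores = cross_val_score(search, X, y, cv=outer_cv)
    print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")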
Step 1: Choose the outer test fold

We start by dividing the entire dataset into K outer folds. One fold becomes the test set (yellow), while the remaining K-1 folds become the training set (green). This outer test fold is held out completely and never used for model selection or hyperparameter tuning.
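
In code, the outer split might look like this (a sketch; the toy arrays stand in for a real dataset, and the index sets play the roles of the colored folds):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(20).reshape(10, 2)  # toy feature matrix (10 samples)
    y = np.arange(10)                 # toy targets

    outer_cv = KFold(n_splits=5)
    for outer_train_idx, outer_test_idx in outer_cv.split(X):
        # outer_test_idx ("yellow") is held out completely;
        # outer_train_idx ("green") is all the inner loop ever sees
        X_train, X_test = X[outer_train_idx], X[outer_test_idx]
        y_train, y_test = y[outer_train_idx], y[outer_test_idx]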

Step 2: Run inner CV on the outer-train portion

On the outer training data (green), we perform inner cross-validation to tune hyperparameters. Each inner fold uses a portion of the outer training data for validation (blue) while training on the rest (green). This ensures hyperparameter selection happens only on training data and never touches the outer test fold.
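
A sketch of one such inner search, run on a single outer-train split (the logistic regression model and C grid are illustrative assumptions):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, KFold

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(80, 4))     # stand-in for one outer-train split
    y_train = rng.integers(0, 2, size=80)  # stand-in labels

    inner_cv = KFold(n_splits=5)
    inner_search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1, 10]},  # illustrative grid
        cv=inner_cv,  # inner folds: train on "green", validate on "blue"
    )
    inner_search.fit(X_train, y_train)  # the outer test fold is never touched
    print(inner_search.best_params_)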

Step 3: The full split for this iteration (whole dataset)

This step shows how the current split covers the entire dataset: the outer test fold (yellow) is completely held out, the inner validation fold (blue) is used only for hyperparameter tuning, and the remaining training data (green) is used for model fitting.
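
One can verify this partition with index arithmetic (a toy sketch with 10 samples, looking at the first outer and inner splits only):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.zeros((10, 1))  # 10 toy samples
    outer_train, outer_test = next(KFold(5).split(X))           # yellow = outer_test
    inner_train, inner_val = next(KFold(5).split(outer_train))  # indices into outer_train

    green = outer_train[inner_train]  # training samples (green)
    blue = outer_train[inner_val]     # inner validation samples (blue)
    yellow = outer_test               # outer test samples (yellow)

    # The three groups cover every sample exactly once:
    assert np.array_equal(np.sort(np.concatenate([green, blue, yellow])),
                          np.arange(10))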

Legend: Green = training (model fitting), Blue = inner validation (hyperparameter tuning), Yellow = outer test (performance evaluation). Green + Blue + Yellow = 100% of the data for this outer/inner step.

💡 Key Benefits:

🎯 Unbiased Estimation: Outer test folds are never used for model selection
🔒 No Data Leakage: Hyperparameter tuning is completely independent of final evaluation
📊 Reliable Metrics: Performance estimates reflect real-world generalization
⚖️ Fair Comparison: Different models can be compared fairly
In the demo configuration, both loops use 5 folds (outer iteration 1 of 5, inner fold 1 of 5 shown). Each outer iteration records the best hyperparameter and the resulting outer-test score:

Outer iter | Best hyperparam | Outer test score

with the mean outer-test score reported across all outer folds. Inner CV chooses the best hyperparameter using only the outer-train data. We then retrain on the full outer-train set with that hyperparameter and evaluate once on the outer-test fold.
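
That retrain-and-evaluate-once step might look like the following sketch (the toy data, model, and grid are placeholders):

    import numpy as np
    from sklearn.base import clone
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    rng = np.random.default_rng(1)
    X_train, y_train = rng.normal(size=(80, 4)), rng.integers(0, 2, size=80)  # outer-train
    X_test, y_test = rng.normal(size=(20, 4)), rng.integers(0, 2, size=20)    # outer test

    search = GridSearchCV(LogisticRegression(max_iter=1000),
                          param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)
    search.fit(X_train, y_train)  # inner CV sees only the outer-train data

    best = clone(search.estimator).set_params(**search.best_params_)
    best.fit(X_train, y_train)         # retrain on the full outer-train set
    print(best.score(X_test, y_test))  # evaluate exactly once on the outer test

In practice, GridSearchCV with refit=True (the default) performs this retraining automatically and exposes the result as best_estimator_, so search.score(X_test, y_test) is equivalent.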

🔄 Complete Process Flow:

For each outer fold (K times):
  1. Hold out one fold as outer test set
  2. Use remaining folds for outer training
  3. Perform inner CV on outer training data
  4. Select best hyperparameters
  5. Train final model on full outer training data
  6. Evaluate on outer test set
Final Results:
  • K unbiased performance estimates
  • Mean and standard deviation of performance
  • Confidence interval for model performance
  • No data leakage between selection and evaluation
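
The whole flow, written out as an explicit double loop (a sketch assuming scikit-learn; the dataset, SVC classifier, and grid are placeholders, and the numbered comments match the steps above; it is equivalent to the compact cross_val_score sketch earlier but shows each step explicitly):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, KFold
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset
    outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    outer_scores = []

    for train_idx, test_idx in outer_cv.split(X):  # 1. hold out one outer test fold
        X_tr, X_te = X[train_idx], X[test_idx]     # 2. remaining folds: outer training
        y_tr, y_te = y[train_idx], y[test_idx]
        search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]},
                              cv=KFold(n_splits=5))    # 3.-4. inner CV picks params
        search.fit(X_tr, y_tr)                         # 5. refit on full outer-train
        outer_scores.append(search.score(X_te, y_te))  # 6. score on the outer test

    # Final results: K unbiased estimates, summarized by mean and std
    print(f"outer-test accuracy: {np.mean(outer_scores):.3f} "
          f"+/- {np.std(outer_scores):.3f}")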

© 2025 Machine Learning for Health Research Course | Prof. Gennady Roshchupkin
