Nested Cross-Validation (Outer → Inner)

The outer CV loop gives an unbiased estimate of performance on truly unseen data; the inner CV loop tunes hyperparameters using only the outer-train split.

🎯 Why Nested Cross-Validation?

Problem with Regular CV:
  • Hyperparameter tuning uses the same data for selection and evaluation
  • Leads to optimistic bias: performance is overestimated
  • Model selection and performance estimation are not independent
Solution with Nested CV:
  • Outer loop: Provides unbiased performance estimation
  • Inner loop: Tunes hyperparameters on training data only
  • Each outer test fold is truly unseen during model selection
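
As a concrete illustration, here is a minimal sketch of nested CV in scikit-learn (the dataset, SVC classifier, and C grid are placeholders, not part of the original slides):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset

    # Inner loop: GridSearchCV tunes C via 5-fold CV on each outer-train split.
    inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv)

    # Outer loop: each fold evaluates a model whose hyperparameters were
    # tuned without ever seeing that fold.
    outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
    scores = cross_val_score(search, X, y, cv=outer_cv)
    print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")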
Step 1: Choose the outer test fold

We start by dividing the entire dataset into K outer folds. One fold becomes the test set (yellow), while the remaining K-1 folds become the training set (green). This outer test fold is held out completely and never used for model selection or hyperparameter tuning.
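
In code, the outer split might look like this (a sketch; the toy arrays stand in for a real dataset, and the index sets play the roles of the colored folds):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(20).reshape(10, 2)  # toy feature matrix (10 samples)
    y = np.arange(10)                 # toy targets

    outer_cv = KFold(n_splits=5)
    for outer_train_idx, outer_test_idx in outer_cv.split(X):
        # outer_test_idx ("yellow") is held out completely;
        # outer_train_idx ("green") is all the inner loop ever sees
        X_train, X_test = X[outer_train_idx], X[outer_test_idx]
        y_train, y_test = y[outer_train_idx], y[outer_test_idx]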

Step 2: Run inner CV on the outer-train portion

On the outer training data (green), we perform inner cross-validation to tune hyperparameters. Each inner fold uses a portion of the outer training data for validation (blue) while training on the rest (green). This ensures hyperparameter selection happens only on training data and never touches the outer test fold.
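
A sketch of one such inner search, run on a single outer-train split (the logistic regression model and C grid are illustrative assumptions):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, KFold

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(80, 4))     # stand-in for one outer-train split
    y_train = rng.integers(0, 2, size=80)  # stand-in labels

    inner_cv = KFold(n_splits=5)
    inner_search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1, 10]},  # illustrative grid
        cv=inner_cv,  # inner folds: train on "green", validate on "blue"
    )
    inner_search.fit(X_train, y_train)  # the outer test fold is never touched
    print(inner_search.best_params_)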

Step 3: The full split for this iteration (whole dataset)

This step shows how the current split covers the entire dataset: the outer test fold (yellow) is completely held out, the inner validation fold (blue) is used only for hyperparameter tuning, and the remaining training data (green) is used for model fitting.
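
One can verify this partition with index arithmetic (a toy sketch with 10 samples, looking at the first outer and inner splits only):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.zeros((10, 1))  # 10 toy samples
    outer_train, outer_test = next(KFold(5).split(X))           # yellow = outer_test
    inner_train, inner_val = next(KFold(5).split(outer_train))  # indices into outer_train

    green = outer_train[inner_train]  # training samples (green)
    blue = outer_train[inner_val]     # inner validation samples (blue)
    yellow = outer_test               # outer test samples (yellow)

    # The three groups cover every sample exactly once:
    assert np.array_equal(np.sort(np.concatenate([green, blue, yellow])),
                          np.arange(10))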

Legend: Green = training (model fitting), Blue = inner validation (hyperparameter tuning), Yellow = outer test (performance evaluation). Green + Blue + Yellow = 100% of the data for this outer/inner step.

💡 Key Benefits:

🎯 Unbiased Estimation: Outer test folds are never used for model selection
🔒 No Data Leakage: Hyperparameter tuning is completely independent of final evaluation
📊 Reliable Metrics: Performance estimates reflect real-world generalization
⚖️ Fair Comparison: Different models can be compared fairly
In the demo configuration, both loops use 5 folds (outer iteration 1 of 5, inner fold 1 of 5 shown). Each outer iteration records the best hyperparameter and the resulting outer-test score:

Outer iter | Best hyperparam | Outer test score

with the mean outer-test score reported across all outer folds. Inner CV chooses the best hyperparameter using only the outer-train data. We then retrain on the full outer-train set with that hyperparameter and evaluate once on the outer-test fold.
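
That retrain-and-evaluate-once step might look like the following sketch (the toy data, model, and grid are placeholders):

    import numpy as np
    from sklearn.base import clone
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    rng = np.random.default_rng(1)
    X_train, y_train = rng.normal(size=(80, 4)), rng.integers(0, 2, size=80)  # outer-train
    X_test, y_test = rng.normal(size=(20, 4)), rng.integers(0, 2, size=20)    # outer test

    search = GridSearchCV(LogisticRegression(max_iter=1000),
                          param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)
    search.fit(X_train, y_train)  # inner CV sees only the outer-train data

    best = clone(search.estimator).set_params(**search.best_params_)
    best.fit(X_train, y_train)         # retrain on the full outer-train set
    print(best.score(X_test, y_test))  # evaluate exactly once on the outer test

In practice, GridSearchCV with refit=True (the default) performs this retraining automatically and exposes the result as best_estimator_, so search.score(X_test, y_test) is equivalent.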

🔄 Complete Process Flow:

For each outer fold (K times):
  1. Hold out one fold as outer test set
  2. Use remaining folds for outer training
  3. Perform inner CV on outer training data
  4. Select best hyperparameters
  5. Train final model on full outer training data
  6. Evaluate on outer test set
Final Results:
  • K unbiased performance estimates
  • Mean and standard deviation of performance
  • Confidence interval for model performance
  • No data leakage between selection and evaluation
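
The whole flow, written out as an explicit double loop (a sketch assuming scikit-learn; the dataset, SVC classifier, and grid are placeholders, and the numbered comments match the steps above; it is equivalent to the compact cross_val_score sketch earlier but shows each step explicitly):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, KFold
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset
    outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    outer_scores = []

    for train_idx, test_idx in outer_cv.split(X):  # 1. hold out one outer test fold
        X_tr, X_te = X[train_idx], X[test_idx]     # 2. remaining folds: outer training
        y_tr, y_te = y[train_idx], y[test_idx]
        search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]},
                              cv=KFold(n_splits=5))    # 3.-4. inner CV picks params
        search.fit(X_tr, y_tr)                         # 5. refit on full outer-train
        outer_scores.append(search.score(X_te, y_te))  # 6. score on the outer test

    # Final results: K unbiased estimates, summarized by mean and std
    print(f"outer-test accuracy: {np.mean(outer_scores):.3f} "
          f"+/- {np.std(outer_scores):.3f}")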

© 2025 Machine Learning for Health Research Course | Prof. Gennady Roshchupkin
