Understand how K-fold cross-validation works by visualizing the data splits and training process
K-fold cross-validation is a technique to assess how well a machine learning model will generalize to new data. It divides the training data into K equal parts (folds), then trains and validates the model K times, each time using a different fold as validation data.
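The fold-splitting idea can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation: `kfold_indices` is a hypothetical helper that shuffles the sample indices and partitions them into K groups.

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    return np.array_split(indices, k)

# 10 samples, 5 folds -> each fold holds 2 validation indices
folds = kfold_indices(10, 5)
for i, val_idx in enumerate(folds):
    # all folds except fold i form the training indices for this iteration
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print(f"fold {i}: validate on {sorted(val_idx)}, train on {len(train_idx)} samples")
```

In practice you would use a tested implementation such as `sklearn.model_selection.KFold`, which follows the same pattern.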
Training on the complete training set happens only once! After K-fold CV helps you select the best model configuration, you train the final model on the complete training dataset and get your unbiased performance estimate from the held-out test set.
Step 1: The data is split into training and test sets. The test set is completely held out and never used during cross-validation.
Step 2: The training data is divided into K equal folds. In each iteration, one fold serves as validation while the remaining K-1 folds are used for training.
Step 3: This process repeats K times, with each fold taking a turn as the validation set. The final performance is the average of all K validation scores.
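The three steps above can be combined into one loop. The sketch below uses a deliberately trivial model (predict the training-set mean, scored by mean squared error) so the CV mechanics stay visible; any fit/predict pair could be substituted. The function name and toy data are illustrative assumptions, not part of the course material.

```python
import numpy as np

def cross_validate(y, k=5, seed=0):
    """K-fold CV for a toy 'predict the training mean' model, scored by MSE."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)          # Step 2: divide into K folds
    scores = []
    for i in range(k):                      # Step 3: repeat K times
        val = folds[i]                      # one fold is validation...
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        prediction = y[train].mean()        # ...the other K-1 train the model
        mse = np.mean((y[val] - prediction) ** 2)
        scores.append(mse)
    return np.mean(scores), scores          # final score = average over folds

y = np.arange(20, dtype=float) * 0.5        # toy target values
avg, per_fold = cross_validate(y, k=5)
```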
Only once, at the very end! After completing all K cross-validation iterations and selecting the best model configuration, you train the final model on the entire training set and then evaluate it on the held-out test set. This gives you an unbiased estimate of how well your model will perform on completely unseen data.
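This end-of-pipeline step can be sketched as follows. The example assumes a polynomial degree `best_degree` has already been chosen via K-fold CV on the training split; the data and the choice of `np.polyfit` as the "final model" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = 2.0 * x + rng.normal(0, 0.05, size=x.size)   # noisy linear data

# Step 1: the test split is held out and never touched during CV
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

best_degree = 1                                   # assumed: selected via K-fold CV
coeffs = np.polyfit(x_train, y_train, best_degree)  # ONE final fit on ALL training data
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)  # unbiased estimate
```

Note that the test set enters only in the last line, and only once.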
| Iteration | Validation Score |
|---|---|
© 2025 Machine Learning for Health Research Course | Prof. Gennady Roshchupkin