Ensemble Methods — Interactive Visualization

Click Next to walk through each method and see how different ensemble techniques handle training, validation, and test data splits.

🎯 What are Ensemble Methods?

Think of ensemble methods as asking multiple experts for their opinion instead of just one. Rather than relying on a single machine learning model, we combine predictions from several models to get better, more reliable results. It's like having a team of doctors diagnose a patient: each might notice something different, but together they are more accurate.

🤔 Why use ensembles?
  • More accurate predictions
  • More stable results
  • Better handling of uncertainty
  • Reduces overfitting risk
🎯 Key concepts:
  • Base models: Individual learning algorithms
  • Combination: How we merge their predictions (see the voting sketch after this list)
  • Diversity: Models should be different from each other
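
To make "Combination" concrete, here is a minimal sketch of hard majority voting, one of the simplest ways to merge classifier predictions. The models and their predictions are made up for illustration:

```python
import numpy as np

# Hypothetical 0/1 class predictions from three base models on five examples
preds = np.array([
    [1, 0, 1, 1, 0],  # model A
    [1, 1, 1, 0, 0],  # model B
    [0, 0, 1, 1, 0],  # model C
])

# Hard voting: each example gets the class predicted by the majority of models
majority_vote = (preds.mean(axis=0) > 0.5).astype(int)
print(majority_vote)  # -> [1 0 1 1 0]
```

Soft voting works the same way, except you average predicted probabilities instead of counting votes.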

🎓 How to Use This Visualization

1. Choose a Method: Click on different tabs to explore each ensemble technique
2. Step Through: Use "Next" button to see each step in detail
3. Adjust Parameters: Change dataset size and other settings to see how numbers update
4. Read Explanations: Each step has beginner-friendly explanations below
[Diagram: Dataset (size N) → bootstrap Samples 1–4 → Trees 1–4 → Average / Vote → Random Forest]
Bagging — Step 1/4
Start with the dataset. Bagging draws multiple bootstrap samples (with replacement) from the same dataset.

📚 Beginner's Guide to This Step

What's happening: We're starting with our original dataset. Think of it like having a big collection of examples to learn from.

💡 Key Concept - Bootstrap Sampling: This is like randomly picking examples from our dataset, but we can pick the same example multiple times (with replacement). It's like drawing cards from a deck and putting them back before the next draw.

Why this matters: By creating different random samples, each model will see slightly different data, making them more diverse and robust.
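
A minimal sketch of bootstrap sampling with NumPy (the integer "dataset" below is a stand-in for real training examples, not course data):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N = 1000
dataset = np.arange(N)  # stand-in for N training examples

# One bootstrap sample: draw N examples WITH replacement
sample = rng.choice(dataset, size=N, replace=True)
unique_fraction = len(np.unique(sample)) / N
print(f"Unique examples in this bootstrap sample: {unique_fraction:.1%}")
```

For large N the expected unique fraction is 1 - (1 - 1/N)^N ≈ 1 - 1/e ≈ 63.2%, which is where the "~63.2% unique" figure in the method details below comes from.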

Method details

  • What it’s good at: Reducing variance; robust to noise.
  • Common algorithms: Random Forest, Bagged Trees.
  • Notes: Each bootstrap sample has size N; on average only ~63.2% of the original examples appear in it (the rest are duplicates), and the left-out rows give a free out-of-bag (OOB) error estimate (see the sketch below).
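
As an illustration of the OOB estimate, here is a sketch using scikit-learn's RandomForestClassifier with oob_score=True; the dataset and hyperparameters are assumptions for demonstration, not course data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic toy data standing in for a real training set
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each tree never sees ~36.8% of the rows; those rows score it "out of bag"
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)
print(f"OOB accuracy estimate: {forest.oob_score_:.3f}")
```

Because every tree leaves out roughly a third of the rows, those rows act as a built-in validation set, which is the "built-in validation" point in the comparison below.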

🔍 Method Comparison for Beginners

🌳 Bagging (Random Forest)
  • Parallel training
  • Reduces variance
  • Good for noisy data
  • Built-in validation (OOB)
🚀 Boosting (XGBoost)
  • Sequential training
  • Reduces bias
  • Often very accurate
  • Can overfit
🏗️ Stacking
  • Meta-learner
  • Most flexible
  • Complex setup
  • Often the best performance when base models are diverse (see the sketch after this comparison)
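
To see the three families side by side in code, here is a rough sketch comparing scikit-learn's BaggingClassifier, GradientBoostingClassifier, and StackingClassifier on a synthetic dataset; the model choices and hyperparameters are illustrative, not prescriptive:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic toy data, not course data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "Bagging": BaggingClassifier(n_estimators=50, random_state=0),            # parallel trees
    "Boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),  # sequential
    "Stacking": StackingClassifier(                                            # meta-learner
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("gb", GradientBoostingClassifier(random_state=0))],
        final_estimator=LogisticRegression(),
    ),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:9s} 5-fold CV accuracy: {scores.mean():.3f}")
```

On real data the ranking depends on noise, sample size, and base-model diversity, so no single family wins everywhere.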

© 2025 Machine Learning for Health Research Course | Prof. Gennady Roshchupkin

Interactive slides designed for enhanced learning experience