
Neural Networks vs Classical ML

Training Paradigms, Epochs, and Batch Processing

🤖 Understanding Different Learning Approaches

🎯 Learning Objectives:

  • Understand training differences between classical ML and neural networks
  • Learn about epochs, batch processing, and iterative learning
  • Compare resource requirements and use cases
  • Master practical implementation strategies

Training Paradigms Overview

🔵 Classical Machine Learning

  • One-shot learning: Train once on entire dataset
  • Batch processing: All data processed simultaneously
  • Direct optimization: Closed-form or iterative solutions
  • Feature engineering: Manual feature extraction
  • Deterministic: Same result every time

🟢 Neural Networks

  • Iterative learning: Multiple passes (epochs) through data
  • Mini-batch processing: Small chunks of data
  • Gradient descent: Gradual parameter updates
  • Automatic features: Learned representations
  • Stochastic: Results may vary between runs
Key Insight: Classical ML typically trains once and is done, while neural networks require multiple iterations through the data to gradually learn complex patterns.

Training Flow Comparison

Classical ML: Data → Feature Engineering → Algorithm → Model ✅
Neural Networks: Data → Batch 1 → Update → Batch 2 → Update → ... → Epoch 1 → Epoch 2 → ... → Model ✅

Classical ML Training Process

🎯 One-Shot Learning

Classical algorithms typically process the entire dataset at once to find the optimal solution.

# Classical ML Training Examples

# 1. Linear Regression - Closed Form Solution
from sklearn.linear_model import LinearRegression
import numpy as np

X_train = np.random.randn(1000, 5)            # All training data
y_train = np.random.randn(1000)               # Continuous target for regression
y_class = np.random.randint(0, 2, size=1000)  # Discrete labels for the classifiers below

model = LinearRegression()
model.fit(X_train, y_train)   # Single training step - DONE!

# 2. SVM - Iterative solver, but still fits on the full batch
from sklearn.svm import SVC
svm = SVC()
svm.fit(X_train, y_class)     # Processes entire dataset

# 3. Random Forest - Ensemble of decision trees
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, y_class)      # Builds all trees using full dataset

✅ Advantages

  • Simple training process
  • Fast training (usually)
  • Deterministic results
  • No hyperparameter tuning for epochs
  • Memory efficient for small datasets

❌ Limitations

  • Limited to engineered features
  • Struggle with very large datasets
  • Less flexible for complex patterns
  • Requires all data in memory
  • Limited scalability

Neural Network Training Process

🔄 Iterative Learning with Epochs

Neural networks learn through multiple complete passes (epochs) through the training data, updating weights incrementally.

# Neural Network Training with Epochs
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Create dataset and dataloader
X_tensor = torch.randn(1000, 5)   # Same shape as the classical example above
y_tensor = torch.randn(1000, 1)
dataset = TensorDataset(X_tensor, y_tensor)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

model = nn.Sequential(
    nn.Linear(5, 64),
    nn.ReLU(),
    nn.Linear(64, 1)
)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

# Training loop with epochs
num_epochs = 100
for epoch in range(num_epochs):          # Multiple passes through data
    epoch_loss = 0
    for batch_X, batch_y in dataloader:  # Process in small batches
        # Forward pass
        predictions = model(batch_X)
        loss = criterion(predictions, batch_y)

        # Backward pass and update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()

    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {epoch_loss/len(dataloader):.4f}')

What are Epochs?

📚 Epoch Definition

An epoch is one complete pass through the entire training dataset. Neural networks typically require many epochs to converge.

  • Epochs: 1
  • Samples: 1,000
  • Batch size: 32
  • Batches per epoch: 32 (31 full batches of 32 samples, plus one final batch of 8)

Epoch Progression:

# Epoch 1: Model sees all 1000 samples
#   Batch 1:  samples 0-31    → update weights
#   Batch 2:  samples 32-63   → update weights
#   ...
#   Batch 32: samples 992-999 → update weights
# END OF EPOCH 1

# Epoch 2: Model sees all 1000 samples AGAIN (shuffled order)
#   Batch 1: samples 234-265 → update weights
#   Batch 2: samples 67-98   → update weights
#   ...
# END OF EPOCH 2

# Continue for 50-100+ epochs until convergence
Why Multiple Epochs? Neural networks learn gradually. Each epoch allows the model to refine its understanding of the data patterns. Early epochs learn basic patterns, later epochs fine-tune complex relationships.
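To see this gradual refinement in practice, here is a minimal sketch that logs the mean training loss after every epoch; it assumes the model, criterion, optimizer, and dataloader defined in the training example above.

# Minimal sketch: per-epoch loss tracking (assumes the model, criterion,
# optimizer, and dataloader from the training example above)
import torch

for epoch in range(100):
    model.train()
    train_loss = 0.0
    for batch_X, batch_y in dataloader:      # one full pass = one epoch
        optimizer.zero_grad()
        loss = criterion(model(batch_X), batch_y)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    # Typical pattern: large drops in early epochs (coarse patterns),
    # small refinements in later epochs (fine-tuning)
    print(f'Epoch {epoch:3d} | mean train loss: {train_loss / len(dataloader):.4f}')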

Batch Size in Neural Networks

📦 Batch Size Definition

The batch size is the number of samples processed together before each weight update. It is a critical hyperparameter that affects training dynamics.

Batch Size | Description | Memory Usage | Training Speed | Gradient Quality | Generalization
Small (1-32) | Stochastic / mini-batch | Low | Fast per update | Noisy gradients | Better
Medium (32-512) | Mini-batch (common) | Moderate | Balanced | Stable gradients | Good
Large (512+) | Large batch | High | Slow per update | Smooth gradients | Often worse
Full dataset | Batch gradient descent | Very high | Very slow | Exact gradients | Often poor
# Batch Size Examples
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(1000, 5)
y = torch.randn(1000, 1)
dataset = TensorDataset(X, y)

# Small batch - more updates per epoch, noisier gradients
small_loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Medium batch - balanced approach (most common)
medium_loader = DataLoader(dataset, batch_size=64, shuffle=True)

# Large batch - fewer updates, smoother gradients
large_loader = DataLoader(dataset, batch_size=256, shuffle=True)

# Impact on training (1000 samples):
# Small batch:  1000 / 16  = 62.5 → 63 updates per epoch
# Medium batch: 1000 / 64  = 15.6 → 16 updates per epoch
# Large batch:  1000 / 256 = 3.9  → 4 updates per epoch

CNN Training Specifics

🖼️ Convolutional Neural Networks

CNNs have additional considerations due to spatial data and memory requirements for feature maps.

🔵 Classical Computer Vision

  • Hand-crafted features (HOG, SIFT; a small sketch follows the CNN list below)
  • Fixed feature extraction
  • Train classifier on features
  • Process images one at a time

🟢 CNN Training

  • Learned hierarchical features
  • Spatial weight sharing
  • Backpropagation through conv layers
  • Batch processing with 4D tensors
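For contrast with the CNN example below, here is a hedged sketch of the classical pipeline listed above: fixed HOG features (extracted here with scikit-image, which is an assumed tool, not one specified in these slides) followed by a standard classifier trained on those features. The arrays are random placeholders, not a course dataset.

# Classical computer vision sketch: hand-crafted HOG features + SVM
# (illustrative only; gray_images and labels are placeholder arrays)
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

gray_images = np.random.rand(200, 64, 64)      # 200 grayscale 64x64 images
labels = np.random.randint(0, 2, size=200)     # placeholder binary labels

# 1. Fixed feature extraction - no learning happens in this step
features = np.array([
    hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for img in gray_images
])

# 2. Train a classical classifier on the extracted features (one fit call)
clf = SVC()
clf.fit(features, labels)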
# CNN Training Example
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_layers = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),   # Learn 32 filters
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 32x32 → 16x16
            nn.Conv2d(32, 64, 3, padding=1),  # Learn 64 filters
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 16x16 → 8x8
        )
        self.classifier = nn.Sequential(
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        # x shape: (batch_size, 3, 32, 32) - CIFAR-10 images
        x = self.conv_layers(x)      # Learn spatial features
        x = x.view(x.size(0), -1)    # Flatten for classifier
        return self.classifier(x)

# Training with image batches (dataset: an image dataset such as CIFAR-10)
cnn_model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn_model.parameters(), lr=0.001)
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)
# Each batch: torch.Size([64, 3, 32, 32]) - 64 RGB images

for epoch in range(50):  # CNNs often need more epochs
    for batch_images, batch_labels in dataloader:
        # Process 64 images simultaneously
        outputs = cnn_model(batch_images)
        loss = criterion(outputs, batch_labels)
        # ... backprop and update (zero_grad, backward, step)
CNN Memory Consideration: Batch size limited by GPU memory. Feature maps consume significant memory, especially in early layers. Common to use smaller batches (16-64) for high-resolution images.
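A rough back-of-the-envelope sketch shows why batch size is memory-bound for CNNs. It uses the first conv layer of the SimpleCNN above and assumes float32 activations; real usage is higher because backpropagation keeps activations for every layer, plus weights, gradients, and optimizer state.

# Rough memory estimate for the first conv layer's feature maps (float32 = 4 bytes)
batch_size = 64
values = batch_size * 32 * 32 * 32            # (batch, channels, height, width) after conv1
mb = values * 4 / 1024**2
print(f"conv1 activations: ~{mb:.1f} MB per batch")  # ~8 MB at 32x32 inputs
# At 320x320 inputs the same layer needs ~100x more memory (~800 MB per batch)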

Training Time & Resource Comparison

🔵 Classical ML

  • Training time: minutes
  • Hardware: CPU
  • Memory: GB-scale

🟢 Standard NN

  • Training time: hours
  • Hardware: GPU
  • Memory: 10+ GB

🔮 Deep CNN

  • Training time: days
  • Hardware: Multi-GPU
  • Memory: 100+ GB
# Training Time Examples (rough estimates)
import time
from sklearn.ensemble import RandomForestClassifier

# Classical ML - Scikit-learn Random Forest
# Dataset: 100K samples, 20 features
start_time = time.time()
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, y_train)            # ~30 seconds to 2 minutes
print(f"Training time: {time.time() - start_time:.2f} seconds")

# Neural Network - PyTorch MLP
# Dataset: 100K samples, 20 features → 128 → 64 → 10 classes
for epoch in range(100):            # ~10-30 minutes total
    for batch in dataloader:
        ...                         # training code

# CNN - ResNet50 on ImageNet
# Dataset: 1.2M images, 1000 classes
for epoch in range(90):             # ~1-2 weeks on single GPU
    for batch in dataloader:
        ...                         # training code
Scaling Reality: Classical ML scales linearly with data size. Neural networks scale with data size × epochs × model complexity. This is why neural networks require specialized hardware and parallel processing.
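A quick sketch of the arithmetic behind that scaling (illustrative, ImageNet-scale numbers only): a classical model calls fit once, while the neural network performs one weight update per batch, every epoch.

# Number of weight updates grows with dataset size AND number of epochs
n_samples, batch_size, epochs = 1_200_000, 256, 90
updates_per_epoch = -(-n_samples // batch_size)   # ceiling division
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)           # 4688 updates/epoch, ~422k updates total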

When to Use Each Approach

Scenario | Classical ML | Neural Networks | CNNs
Small Dataset | ✅ Preferred | ⚠️ Risk of overfitting | ❌ Likely to overfit
Tabular Data | ✅ Excellent choice | ⚠️ Can work well | ❌ Not suitable
Image Classification | ⚠️ Limited accuracy | ✅ Good for simple tasks | ✅ State-of-the-art
Large Dataset | ⚠️ May be slow | ✅ Excellent | ✅ Excellent
Limited Computing Resources | ✅ Very efficient | ⚠️ Moderate resources | ❌ High resource needs
Need Interpretability | ✅ Highly interpretable | ⚠️ Limited interpretability | ❌ Black box
Quick Prototyping | ✅ Very fast | ⚠️ Moderate time | ❌ Time consuming
Rule of Thumb:
  • Classical ML: Start here for structured/tabular data, small datasets, or when you need interpretability
  • Neural Networks: Use for complex patterns, large datasets, or when classical ML plateaus
  • CNNs: Essential for computer vision tasks, especially with large image datasets

Key Takeaways

🎯 Training Paradigms

Classical ML

  • One-shot learning
  • Full batch processing
  • Deterministic training
  • Feature engineering required

Neural Networks

  • Iterative learning with epochs
  • Mini-batch processing
  • Gradient-based optimization
  • Automatic feature learning

⚡ Performance & Resources

Training Time

  • Classical ML: Minutes to hours
  • Neural Networks: Hours to days
  • CNNs: Days to weeks

Resource Requirements

  • Classical ML: CPU, low memory
  • Neural Networks: GPU recommended
  • CNNs: High-end GPU required

🎓 Learning Path Recommendation

  1. Start with Classical ML: Understand basic concepts, feature engineering, and model evaluation
  2. Progress to Neural Networks: Learn about epochs, batch processing, and gradient descent
  3. Specialize in CNNs: Master computer vision and spatial data processing
  4. Advanced Topics: Explore transformers, reinforcement learning, and generative models

Quick Knowledge Check

🤔 Test Your Understanding

Answer these questions to reinforce your learning:

Question 1: Training Process

Which approach processes the entire dataset at once?

  • A) Neural Networks
  • B) Classical ML
  • C) Both approaches
  • D) Neither approach
Answer: B) Classical ML
Classical ML typically processes all data in one batch, while neural networks use mini-batches over multiple epochs.

Question 2: Epochs

What is an epoch in neural network training?

  • A) One weight update
  • B) One complete pass through the dataset
  • C) One batch processed
  • D) One layer of the network
Answer: B) One complete pass through the dataset
An epoch means the model has seen every training sample exactly once.

Practical Implementation Tips

🔵 Classical ML Best Practices

Data Preparation

  • Handle missing values appropriately
  • Scale/normalize numerical features
  • Encode categorical variables
  • Remove outliers if necessary

Model Selection

  • Start with simple models (linear/logistic regression)
  • Use cross-validation for evaluation (see the pipeline sketch after this list)
  • Try ensemble methods for better performance
  • Consider interpretability requirements
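A minimal sketch combining several of the practices above: scaling done inside a pipeline (so it is fit only on training folds), a simple model first, and cross-validated evaluation. The synthetic data is a placeholder.

# Classical ML sketch: preprocessing + simple model + cross-validation
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.random.randn(500, 10)                 # placeholder tabular features
y = np.random.randint(0, 2, size=500)        # placeholder binary labels

pipe = Pipeline([
    ("scale", StandardScaler()),             # scaling lives inside the pipeline → no leakage
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=5)   # 5-fold cross-validation
print(f"CV accuracy: {scores.mean():.3f} ± {scores.std():.3f}")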

🟢 Neural Network Best Practices

Architecture Design

  • Start with simple architectures
  • Use appropriate activation functions
  • Include dropout for regularization
  • Monitor training/validation curves

Training Strategy

  • Choose appropriate learning rate
  • Use learning rate scheduling
  • Implement early stopping
  • Save best model checkpoints (these strategies are combined in the sketch below)
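A hedged sketch tying these practices together: dropout, a learning-rate scheduler, early stopping, and checkpointing. It assumes train_loader and val_loader already exist and uses placeholder layer sizes.

# Neural network best-practices sketch (assumes train_loader / val_loader exist)
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Dropout(0.3),                       # dropout for regularization
    nn.Linear(64, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)  # LR scheduling

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    model.train()
    for X, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    scheduler.step()                       # decay the learning rate on a schedule

    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(X), y).item() for X, y in val_loader) / len(val_loader)

    if val_loss < best_val:                # save the best checkpoint so far
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")
    else:                                  # early stopping when validation stalls
        bad_epochs += 1
        if bad_epochs >= patience:
            break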

🎯 Key Success Factors

  • Data Quality: Garbage in, garbage out - clean your data thoroughly
  • Feature Engineering: Critical for classical ML, less important for deep learning
  • Hyperparameter Tuning: Use grid search, random search, or Bayesian optimization (see the sketch after this list)
  • Evaluation: Use appropriate metrics and validation strategies
  • Monitoring: Track training progress and watch for overfitting
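For the hyperparameter tuning point above, a small grid-search sketch with placeholder data and an illustrative parameter grid:

# Hyperparameter tuning sketch with grid search (placeholder data and grid)
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X = np.random.randn(300, 8)
y = np.random.randint(0, 2, size=300)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)  # cross-validated search
search.fit(X, y)
print(search.best_params_, search.best_score_)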

© 2025 Machine Learning for Health Research Course | Prof. Gennady Roshchupkin
