
🧠 Neural Network Training

Understanding Epochs, Batches & More

🎯 Master the fundamental concepts of training neural networks and CNNs

💡 Learning Objective: By the end of this presentation, you'll understand how neural networks learn through epochs, batches, and iterations!

📊 What is Training Data?

Training data is the dataset used to teach your neural network.

๐Ÿฑ
๐Ÿถ
๐Ÿฑ
๐Ÿถ
๐Ÿฑ
๐Ÿถ
๐Ÿฑ
๐Ÿถ

Example: 1000 images of cats and dogs

📦 What is a Batch?

A batch is a subset of training data processed together.

Batch 1 (size=4): [1, 2, 3, 4]
Batch 2 (size=4): [5, 6, 7, 8]

Total Data = 1000 images
Batch Size = 32
Number of Batches = 1000 ÷ 32 = 31.25 → 31 full batches plus 1 partial batch of 8 images
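
A minimal sketch of that batching arithmetic in plain Python, with a toy list standing in for the 1000 images:

import math

dataset = list(range(1000))   # stand-in for 1000 images
batch_size = 32

# Slice the dataset into consecutive chunks of batch_size
batches = [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]

print(len(batches))                          # 32 -> 31 full batches + 1 partial batch of 8
print(math.ceil(len(dataset) / batch_size))  # same count, computed directly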

🔄 What is an Iteration?

One iteration = Processing one batch through the network

Batch [1, 2, 3, 4] → Neural Network → 📈 Update Weights

1 Iteration = Forward Pass + Backward Pass + Weight Update
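
The slide doesn't name a framework, but the three steps map directly onto a deep-learning API. Here is a sketch of one iteration in PyTorch-style code (model, loss_fn, and optimizer are assumed to be defined elsewhere):

# One iteration: forward pass + backward pass + weight update
def train_one_iteration(model, loss_fn, optimizer, batch_x, batch_y):
    optimizer.zero_grad()                 # clear gradients left over from the last batch
    predictions = model(batch_x)          # forward pass
    loss = loss_fn(predictions, batch_y)  # how wrong were we on this batch?
    loss.backward()                       # backward pass: compute gradients
    optimizer.step()                      # weight update: nudge the weights downhill
    return loss.item()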

๐ŸŒ What is an Epoch?

One epoch = The network has seen ALL training data once

Epoch 1 Complete!
Batch 1 (images 1-32) → Batch 2 (images 33-64) → ... → Batch 32 (images 993-1000)
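
One way to make "seen ALL training data once" concrete: a plain-Python check (toy data) that a single pass over the batches touches every sample exactly once:

dataset = list(range(1, 1001))   # stand-ins for images 1..1000
batch_size = 32
batches = [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]

seen = [sample for batch in batches for sample in batch]
assert sorted(seen) == dataset                 # every sample appears exactly once
print(f"{len(batches)} iterations = 1 epoch")  # 32 iterations = 1 epoch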

โš–๏ธ Batch Size Trade-offs

Aspect           | Small Batches (8-16) | Medium Batches (32-64) | Large Batches (128+)
Memory Usage     | ✅ Low               | ⚡ Moderate            | ❌ High
Training Speed   | ❌ Slower            | ⚡ Balanced            | ✅ Faster
Gradient Quality | ❌ Noisy             | ⚡ Good                | ✅ Smooth
Generalization   | ✅ Better            | ⚡ Good                | ❌ May overfit

💡 Sweet Spot: Batch sizes of 32-64 often work best for most problems!

๐Ÿ–ผ๏ธ CNN Training Specifics

Conv Layer → Pool Layer → Conv Layer → Dense Layer

CNN Batch Shape: [batch_size, height, width, channels]
Example: [32, 224, 224, 3] = 32 color images of 224×224 pixels
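
A quick NumPy check of that shape and its memory footprint (assuming float32 pixels, a common choice but an assumption here):

import numpy as np

# One batch: 32 color images of 224x224 pixels, 3 channels (RGB)
batch = np.zeros((32, 224, 224, 3), dtype=np.float32)

print(batch.shape)               # (32, 224, 224, 3)
print(batch.nbytes / 1024 ** 2)  # ~18.4 MB for a single batch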

🔀 Do We Create New Batches Each Epoch?

Great Question! It Depends...

✅ WITH Shuffling (Recommended)

Epoch 1: [A,B,C,D] [E,F,G,H] [I,J,K,L]
Epoch 2: [C,A,I,F] [B,L,E,D] [G,K,H,J]
Epoch 3: [J,B,A,K] [I,C,F,L] [H,D,E,G]

→ Better learning; the network can't exploit a fixed data order

โŒ WITHOUT Shuffling

Epoch 1: [A,B,C,D] [E,F,G,H] [I,J,K,L]
Epoch 2: [A,B,C,D] [E,F,G,H] [I,J,K,L]
Epoch 3: [A,B,C,D] [E,F,G,H] [I,J,K,L]

→ May memorize the data order, leading to poorer learning

💡 Best Practice: Always shuffle your training data between epochs (unless working with sequential data like time series or text)!
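
A minimal sketch of the shuffled case using the letter-labelled samples above (plain Python; random.shuffle reorders the list in place, so each epoch gets fresh batches):

import random

samples = list("ABCDEFGHIJKL")
batch_size = 4

for epoch in range(3):
    random.shuffle(samples)   # reorder before batching -> new batches each epoch
    batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    print(f"Epoch {epoch + 1}: {batches}")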

🎯 Complete Training Process

Step 1: Load your dataset (e.g., 10,000 images)
Step 2: Set batch size (e.g., 32)
Step 3: Shuffle data and create batches (313 batches)
Step 4: Train through all batches (1 epoch complete)
Step 5: Shuffle again and repeat for 50 epochs
Result: 313 iterations × 50 epochs = 15,650 updates!
import random

for epoch in range(50):
    random.shuffle(training_data)      # 🔀 Key step: reshuffle before each epoch
    batches = [training_data[i:i + 32]
               for i in range(0, len(training_data), 32)]
    for batch in batches:
        train_on_batch(batch)          # forward pass + backward pass + weight update
🎉 Congratulations! You now understand the complete training process, including the crucial role of data shuffling between epochs!

© 2025 Machine Learning for Health Research Course | Prof. Gennady Roshchupkin

Interactive slides designed for enhanced learning experience