🚀 Optimization Algorithms

📉 Gradient Descent

Iterative method to minimize error by adjusting model parameters. Moves in the direction that reduces the error the most (downhill).

Steady descent path - consistent but can be slow

✓ Simple and reliable
△ Can be slow on flat surfaces
⚠ May get stuck in local minima

⚡ Momentum

Adds inertia to Gradient Descent for faster convergence. Helps avoid getting stuck in small dips by carrying momentum from previous steps.

Accelerated path - builds speed and can roll through shallow dips

✓ Faster convergence
✓ Escapes local minima
⚠ Can overshoot optimum

📊 RMSprop

Adjusts the learning rate based on the steepness of the error surface. Takes smaller steps on steep slopes and larger steps on shallow slopes.

Adaptive path - adjusts step size based on terrain

✓ Adaptive learning rate
✓ Handles different scales well
△ Learning rate can decay too fast

🎯 Adam

Combines Momentum and RMSprop for efficient and stable learning. Adjusts step sizes and remembers past movements for smarter updates.

Optimal path - combines speed and adaptivity

✓ Best of both worlds
✓ Robust and efficient
✓ Most popular choice
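
In practice these optimizers are rarely hand-coded; deep-learning frameworks provide them directly. A minimal sketch, assuming PyTorch and a stand-in parameter list (a real training script would pass model.parameters()):

```python
import torch

# Stand-in parameter for illustration; real code would use model.parameters()
params = [torch.nn.Parameter(torch.randn(10))]

# Plain gradient descent (SGD without momentum)
opt_gd = torch.optim.SGD(params, lr=0.01)

# Gradient descent with momentum
opt_momentum = torch.optim.SGD(params, lr=0.01, momentum=0.9)

# RMSprop: per-parameter adaptive step sizes
opt_rmsprop = torch.optim.RMSprop(params, lr=0.001, alpha=0.9)

# Adam: momentum plus adaptive step sizes (a common default choice)
opt_adam = torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999))
```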

📈 Performance Comparison

How each algorithm performs across different criteria

[Interactive star-rating chart: Gradient Descent, Momentum, RMSprop, and Adam are each rated on Speed and Stability; the ratings did not survive this text export.]

🔢 Mathematical Formulations

Gradient Descent

θ = θ - α∇J(θ)

Simple parameter update with learning rate α
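
A minimal NumPy sketch of this update rule on an illustrative toy objective J(θ) = ‖θ‖² with gradient 2θ (the objective and all values below are assumptions for demonstration only):

```python
import numpy as np

def gradient_descent_step(theta, grad, alpha=0.1):
    """One update: theta <- theta - alpha * grad(theta)."""
    return theta - alpha * grad(theta)

def grad(theta):
    """Gradient of the toy objective J(theta) = ||theta||^2 (illustrative)."""
    return 2 * theta

theta = np.array([3.0, -2.0])
for _ in range(100):
    theta = gradient_descent_step(theta, grad, alpha=0.1)
print(theta)  # approaches [0, 0], the minimum of the toy objective
```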

Momentum

v = βv + α∇J(θ)
θ = θ - v

Adds velocity term with momentum β
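
A NumPy sketch of the velocity update above on the same illustrative toy objective (β = 0.9 and α = 0.05 are assumed values, not prescribed by the slide):

```python
import numpy as np

def momentum_step(theta, v, grad, alpha=0.05, beta=0.9):
    """v <- beta*v + alpha*grad(theta); theta <- theta - v."""
    v = beta * v + alpha * grad(theta)
    return theta - v, v

def grad(theta):
    """Gradient of the toy objective J(theta) = ||theta||^2 (illustrative)."""
    return 2 * theta

theta, v = np.array([3.0, -2.0]), np.zeros(2)
for _ in range(300):
    theta, v = momentum_step(theta, v, grad)
print(theta)  # oscillates toward [0, 0], carried by the accumulated velocity
```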

RMSprop

E[g²] = βE[g²] + (1-β)g²
θ = θ - α·g/√(E[g²] + ε)

Adapts learning rate based on gradient magnitude
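
A NumPy sketch of the running average E[g²] and the scaled step above (β = 0.9, α = 0.01, and the toy gradient are assumed for illustration):

```python
import numpy as np

def rmsprop_step(theta, eg2, grad, alpha=0.01, beta=0.9, eps=1e-8):
    """E[g2] <- beta*E[g2] + (1-beta)*g^2; theta <- theta - alpha*g/sqrt(E[g2] + eps)."""
    g = grad(theta)
    eg2 = beta * eg2 + (1 - beta) * g**2              # running average of squared gradients
    return theta - alpha * g / np.sqrt(eg2 + eps), eg2

def grad(theta):
    """Gradient of the toy objective J(theta) = ||theta||^2 (illustrative)."""
    return 2 * theta

theta, eg2 = np.array([3.0, -2.0]), np.zeros(2)
for _ in range(500):
    theta, eg2 = rmsprop_step(theta, eg2, grad)
print(theta)  # hovers near [0, 0]; each coordinate moves by roughly alpha per step
```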

Adam

m = β₁m + (1-β₁)g
v = β₂v + (1-β₂)g²
θ = θ - α·m̂/(√v̂ + ε)

Combines momentum and adaptive learning rates; m̂ and v̂ are bias-corrected estimates of m and v (divided by 1-β₁ᵗ and 1-β₂ᵗ)
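
A NumPy sketch combining both moment estimates with the bias correction, again on the illustrative toy objective (β₁ = 0.9 and β₂ = 0.999 follow common convention; α = 0.01 is assumed):

```python
import numpy as np

def adam_step(theta, m, v, g, t, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected moments m_hat, v_hat."""
    m = beta1 * m + (1 - beta1) * g           # first moment (momentum term)
    v = beta2 * v + (1 - beta2) * g**2        # second moment (RMSprop term)
    m_hat = m / (1 - beta1**t)                # bias correction for the zero initialization
    v_hat = v / (1 - beta2**t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = np.array([3.0, -2.0]), np.zeros(2), np.zeros(2)
for t in range(1, 1001):                      # t starts at 1 for the bias correction
    g = 2 * theta                             # gradient of toy J(theta) = ||theta||^2
    theta, m, v = adam_step(theta, m, v, g, t)
print(theta)  # ends close to [0, 0] (within roughly the step size alpha)
```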

© 2025 Machine Learning for Health Research Course | Prof. Gennady Roshchupkin
