Cost & Loss Functions in Machine Learning

Understanding the mathematical foundation of model optimization

📊 What are Cost and Loss Functions?

Cost and loss functions are mathematical measures that quantify how far off our model's predictions are from the actual values. They guide the learning process by providing a single number that represents the model's performance.

Key Concepts:

  • Loss Function: Measures error for a single training example
  • Cost Function: Average loss across all training examples
  • Objective Function: General term for the function being optimized (usually minimized)
  • Optimization Goal: Find parameters that minimize the cost function
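The loss/cost distinction above can be sketched in a few lines of NumPy. This is a minimal illustration (the function names are ours, not a standard API): the loss scores one example, and the cost averages the loss over the whole training set.

```python
import numpy as np

def squared_loss(y_i, y_hat_i):
    """Loss function: error for a single training example."""
    return (y_i - y_hat_i) ** 2

def cost(y, y_hat):
    """Cost function: average loss across all training examples."""
    return np.mean([squared_loss(a, b) for a, b in zip(y, y_hat)])
```

Minimizing `cost` with respect to the model parameters that produce `y_hat` is the optimization goal.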

📈 Common Loss Functions

Mean Squared Error (MSE)

MSE = (1/n) × Σ(yᵢ - ŷᵢ)²

Use Case: Regression problems

Characteristics: Penalizes large errors heavily, differentiable everywhere

Pros: Smooth gradient, commonly used

Cons: Sensitive to outliers
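A direct NumPy translation of the MSE formula above (a sketch, not a library API):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum of squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)
```

Because each residual is squared, a single outlier with residual 10 contributes 100 to the sum, which is why MSE is sensitive to outliers.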

Mean Absolute Error (MAE)

MAE = (1/n) × Σ|yᵢ - ŷᵢ|

Use Case: Regression problems

Characteristics: Linear penalty for errors

Pros: Robust to outliers

Cons: Not differentiable at zero
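The MAE formula above, sketched the same way:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: (1/n) * sum of absolute residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))
```

The same outlier with residual 10 contributes only 10 here, which is the sense in which MAE is robust to outliers.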

Cross-Entropy Loss

CE = -Σ yᵢ × log(ŷᵢ)

Use Case: Classification problems

Characteristics: Measures probability distribution difference

Pros: Good gradient properties, probabilistic interpretation

Cons: Numerically unstable when predicted probabilities approach 0 or 1 (log(0) diverges)
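A sketch of the cross-entropy formula for a one-hot target, including the clipping trick commonly used to avoid the instability at extreme probabilities (the `eps` value here is an illustrative choice):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """CE = -sum(y_i * log(y_hat_i)) for a one-hot target y_true.

    Predicted probabilities are clipped away from 0 and 1 so that
    log() never receives exactly zero.
    """
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return -np.sum(np.asarray(y_true, dtype=float) * np.log(y_pred))
```

For a one-hot target, only the log-probability of the true class contributes, so the loss is small when the model assigns that class high probability.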

Hinge Loss

Hinge = max(0, 1 - yᵢ × ŷᵢ)

Use Case: Support Vector Machines

Characteristics: Zero loss for confidently correct predictions, linear penalty for margin violations; assumes labels yᵢ ∈ {−1, +1}

Pros: Sparse solutions, margin-based

Cons: Not differentiable at margin boundary

⚖️ Loss Function Comparison

| Loss Function       | Problem Type         | Sensitivity to Outliers | Differentiability  | Computational Cost |
|---------------------|----------------------|-------------------------|--------------------|--------------------|
| Mean Squared Error  | Regression           | High                    | Smooth everywhere  | Low                |
| Mean Absolute Error | Regression           | Low                     | Not at zero        | Low                |
| Cross-Entropy       | Classification       | Medium                  | Smooth             | Medium             |
| Hinge Loss          | Classification (SVM) | Medium                  | Not at margin      | Low                |
| Huber Loss          | Regression           | Medium                  | Smooth             | Medium             |
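Huber loss appears in the table but is not defined above. As a sketch: it is quadratic for residuals smaller than a threshold δ and linear beyond it, blending MSE's smooth gradient with MAE's robustness (δ = 1.0 here is an illustrative default):

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: 0.5*r^2 for |r| <= delta, delta*(|r| - 0.5*delta) otherwise."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    quadratic = 0.5 * r ** 2
    linear = delta * (np.abs(r) - 0.5 * delta)
    return np.mean(np.where(np.abs(r) <= delta, quadratic, linear))
```

The two branches meet with matching value and slope at |r| = δ, so the loss is smooth everywhere, as the table states.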

🔍 Key Takeaways

  • Choose wisely: The choice of loss function significantly impacts model behavior
  • Consider your data: Outliers, noise, and problem type should guide your choice
  • Optimization matters: Loss functions must be optimizable (preferably differentiable)
  • Trade-offs exist: No single loss function is perfect for all scenarios
  • Custom losses: Sometimes domain-specific loss functions work better
  • Regularization: Adding regularization terms helps prevent overfitting
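The regularization takeaway can be sketched concretely: an L2 (ridge) penalty on the weights is simply added to the data-fit cost, so the optimizer trades off fit against weight magnitude (the function name and λ value here are illustrative):

```python
import numpy as np

def ridge_cost(y, y_hat, weights, lam=0.1):
    """MSE cost plus an L2 penalty lam * sum(w^2) to discourage large weights."""
    data_fit = np.mean((np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)) ** 2)
    penalty = lam * np.sum(np.asarray(weights, dtype=float) ** 2)
    return data_fit + penalty
```

Larger λ shrinks the weights more aggressively, which reduces overfitting at the cost of some bias.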

© 2025 Machine Learning for Health Research Course | Prof. Gennady Roshchupkin
