Training / Generalization

Overfitting

Beginner [2/5]
Memorization · High variance · Over-training

Definition

Overfitting occurs when a model learns the training data too well, including its noise and peculiarities, failing to generalize to new data. The model essentially memorizes training examples rather than learning underlying patterns.

Overfitting is one of the fundamental challenges in machine learning—balancing between learning enough and learning too much.
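
A minimal sketch of that memorization effect, assuming only NumPy (the data, noise level, and polynomial degrees below are invented for illustration): a very flexible model can drive training error to nearly zero yet do worse on held-out points than a simpler one.

import numpy as np

rng = np.random.default_rng(0)

# Ten noisy training samples of a simple underlying pattern (a sine curve)
x_train = np.linspace(0, 3, 10)
y_train = np.sin(x_train) + rng.normal(scale=0.2, size=x_train.shape)

# Held-out points from the same underlying pattern, without noise
x_val = np.linspace(0.1, 2.9, 50)
y_val = np.sin(x_val)

for degree in (2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, validation MSE {val_mse:.3f}")

# Typical outcome: the degree-9 polynomial threads the noise almost exactly
# (train MSE near 0) but does worse on the held-out points than the degree-2 fit.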

Key Concepts

  • Train vs validation gap: Low training loss but high validation loss
  • Model complexity: More parameters → higher overfitting risk
  • Data size: Less data → higher overfitting risk
  • Regularization: Techniques to prevent overfitting

Examples

Visualization
Recognizing Overfitting
OVERFITTING VISUALIZATION

[Chart: training loss vs. validation loss over epochs. Training loss keeps falling, while validation loss plateaus and then rises. The point where the validation curve turns upward is where overfitting starts.]

WHAT'S HAPPENING:
  • Model keeps improving on training data
  • But performance on new data gets WORSE
  • Model is memorizing, not learning

[Chart: three fits to the same data points: UNDERFITTING (too simple), GOOD FIT (just right), OVERFITTING (too complex).]
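
A sketch of how to draw this kind of curve for your own runs, assuming matplotlib is available; the per-epoch loss values below are made-up numbers that follow the pattern described, not real results:

import matplotlib.pyplot as plt

# Made-up per-epoch losses; replace with the values recorded from your own run
train_losses = [2.3, 1.6, 1.1, 0.8, 0.6, 0.45, 0.35, 0.28, 0.22, 0.18]
val_losses   = [2.4, 1.8, 1.4, 1.2, 1.1, 1.05, 1.08, 1.15, 1.25, 1.40]

epochs = range(1, len(train_losses) + 1)
plt.plot(epochs, train_losses, label="Training loss")
plt.plot(epochs, val_losses, label="Validation loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()
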
Solutions
Preventing Overfitting
METHODS TO PREVENT OVERFITTING:

1. MORE DATA
   - Simplest and most effective
   - Data augmentation if real data is limited

2. REGULARIZATION
   - L1/L2: penalize large weights
   - loss = original_loss + λ × ||weights||²

3. DROPOUT
   - Randomly zero neurons during training
   - Forces redundant representations
   - model.add(Dropout(0.5))

4. EARLY STOPPING
   - Stop when validation loss stops improving
   - Save the best checkpoint

5. REDUCE MODEL SIZE
   - Fewer layers/parameters
   - Smaller embedding dimensions

6. WEIGHT DECAY (AdamW)
   - optimizer = AdamW(params, weight_decay=0.01)

7. DATA AUGMENTATION
   - For text: back-translation, paraphrasing
   - For images: rotation, flipping, cropping

# Early stopping example (train, validate, save_checkpoint and model are assumed helpers)
best_val_loss = float('inf')
patience = 3             # stop after this many epochs without improvement
no_improve_count = 0

for epoch in range(100):
    train_loss = train()
    val_loss = validate()

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        save_checkpoint(model)   # keep the best model seen so far
        no_improve_count = 0
    else:
        no_improve_count += 1
        if no_improve_count >= patience:
            print("Early stopping!")
            break
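
As a sketch of how methods 2, 3, and 6 might look in PyTorch (assuming torch is installed; the toy model, batch, and λ value are invented, and in practice you would use either the explicit L2 penalty or the optimizer's weight_decay, not both):

import torch
import torch.nn as nn

# Toy model with dropout between layers (method 3)
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(0.5),        # randomly zeroes half the activations while training
    nn.Linear(64, 1),
)

# Weight decay built into the optimizer (method 6)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

criterion = nn.MSELoss()
x = torch.randn(32, 20)     # invented batch of 32 examples
y = torch.randn(32, 1)

# Explicit L2 penalty added to the loss (method 2)
lam = 1e-4                  # regularization strength λ
original_loss = criterion(model(x), y)
l2_penalty = sum((p ** 2).sum() for p in model.parameters())
loss = original_loss + lam * l2_penalty

optimizer.zero_grad()
loss.backward()
optimizer.step()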

Interactive Exercise

Identify the Problem

After 10 epochs: Train accuracy = 99%, Validation accuracy = 72%. After 50 epochs: Train = 99.9%, Val = 65%. What's happening and what would you do?

Pro Tips
  • LLMs overfit less due to massive training data and regularization
  • Fine-tuning on small datasets carries a high overfitting risk → use a low learning rate
  • Monitor validation loss, not training loss, for model selection
  • Cross-validation gives more robust generalization estimates (see the sketch below)
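
A sketch of the cross-validation tip using scikit-learn, assuming it is installed; the dataset and classifier are chosen purely for illustration:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on four folds, evaluate on the held-out fold, rotate,
# and get five generalization estimates instead of a single train/validation split
scores = cross_val_score(model, X, y, cv=5)
print("accuracy per fold:", scores)
print(f"mean: {scores.mean():.3f}  std: {scores.std():.3f}")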

Related Terms