Training / Generalization

Overfitting

Beginner [2/5]
Memorization · High variance · Over-training

Definition

Overfitting occurs when a model learns the training data too well, including its noise and peculiarities, failing to generalize to new data. The model essentially memorizes training examples rather than learning underlying patterns.

Overfitting is one of the fundamental challenges in machine learning—balancing between learning enough and learning too much.
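
A minimal sketch of that memorization effect, assuming only NumPy (the data, noise level, and polynomial degrees below are invented for illustration): a very flexible model can drive training error to nearly zero yet do worse on held-out points than a simpler one.

import numpy as np

rng = np.random.default_rng(0)

# Ten noisy training samples of a simple underlying pattern (a sine curve)
x_train = np.linspace(0, 3, 10)
y_train = np.sin(x_train) + rng.normal(scale=0.2, size=x_train.shape)

# Held-out points from the same underlying pattern, without noise
x_val = np.linspace(0.1, 2.9, 50)
y_val = np.sin(x_val)

for degree in (2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, validation MSE {val_mse:.3f}")

# Typical outcome: the degree-9 polynomial threads the noise almost exactly
# (train MSE near 0) but does worse on the held-out points than the degree-2 fit.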

Key Concepts

  • Train vs validation gap: Low training loss but high validation loss
  • Model complexity: More parameters → higher overfitting risk
  • Data size: Less data → higher overfitting risk
  • Regularization: Techniques to prevent overfitting

Examples

Visualization
Recognizing Overfitting
OVERFITTING VISUALIZATION

[Chart: training loss vs. validation loss over epochs. Training loss keeps falling, while validation loss plateaus and then rises. The point where the validation curve turns upward is where overfitting starts.]

WHAT'S HAPPENING:
  • Model keeps improving on training data
  • But performance on new data gets WORSE
  • Model is memorizing, not learning

[Chart: three fits to the same data points: UNDERFITTING (too simple), GOOD FIT (just right), OVERFITTING (too complex).]
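
A sketch of how to draw this kind of curve for your own runs, assuming matplotlib is available; the per-epoch loss values below are made-up numbers that follow the pattern described, not real results:

import matplotlib.pyplot as plt

# Made-up per-epoch losses; replace with the values recorded from your own run
train_losses = [2.3, 1.6, 1.1, 0.8, 0.6, 0.45, 0.35, 0.28, 0.22, 0.18]
val_losses   = [2.4, 1.8, 1.4, 1.2, 1.1, 1.05, 1.08, 1.15, 1.25, 1.40]

epochs = range(1, len(train_losses) + 1)
plt.plot(epochs, train_losses, label="Training loss")
plt.plot(epochs, val_losses, label="Validation loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()
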
Solutions
Preventing Overfitting
METHODS TO PREVENT OVERFITTING:

1. MORE DATA
   - Simplest and most effective
   - Data augmentation if real data is limited

2. REGULARIZATION
   - L1/L2: penalize large weights
   - loss = original_loss + λ × ||weights||²

3. DROPOUT
   - Randomly zero neurons during training
   - Forces redundant representations
   - model.add(Dropout(0.5))

4. EARLY STOPPING
   - Stop when validation loss stops improving
   - Save the best checkpoint

5. REDUCE MODEL SIZE
   - Fewer layers/parameters
   - Smaller embedding dimensions

6. WEIGHT DECAY (AdamW)
   - optimizer = AdamW(params, weight_decay=0.01)

7. DATA AUGMENTATION
   - For text: back-translation, paraphrasing
   - For images: rotation, flipping, cropping

# Early stopping example (train, validate, save_checkpoint and model are assumed helpers)
best_val_loss = float('inf')
patience = 3             # stop after this many epochs without improvement
no_improve_count = 0

for epoch in range(100):
    train_loss = train()
    val_loss = validate()

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        save_checkpoint(model)   # keep the best model seen so far
        no_improve_count = 0
    else:
        no_improve_count += 1
        if no_improve_count >= patience:
            print("Early stopping!")
            break
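
As a sketch of how methods 2, 3, and 6 might look in PyTorch (assuming torch is installed; the toy model, batch, and λ value are invented, and in practice you would use either the explicit L2 penalty or the optimizer's weight_decay, not both):

import torch
import torch.nn as nn

# Toy model with dropout between layers (method 3)
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(0.5),        # randomly zeroes half the activations while training
    nn.Linear(64, 1),
)

# Weight decay built into the optimizer (method 6)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

criterion = nn.MSELoss()
x = torch.randn(32, 20)     # invented batch of 32 examples
y = torch.randn(32, 1)

# Explicit L2 penalty added to the loss (method 2)
lam = 1e-4                  # regularization strength λ
original_loss = criterion(model(x), y)
l2_penalty = sum((p ** 2).sum() for p in model.parameters())
loss = original_loss + lam * l2_penalty

optimizer.zero_grad()
loss.backward()
optimizer.step()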

Interactive Exercise

Identify the Problem

After 10 epochs: Train accuracy = 99%, Validation accuracy = 72%. After 50 epochs: Train = 99.9%, Val = 65%. What's happening and what would you do?

Pro Tips
  • LLMs overfit less due to massive training data and regularization
  • Fine-tuning on small datasets carries a high overfitting risk → use a low learning rate
  • Monitor validation loss, not training loss, for model selection
  • Cross-validation gives more robust generalization estimates (see the sketch below)
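
A sketch of the cross-validation tip using scikit-learn, assuming it is installed; the dataset and classifier are chosen purely for illustration:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on four folds, evaluate on the held-out fold, rotate,
# and get five generalization estimates instead of a single train/validation split
scores = cross_val_score(model, X, y, cv=5)
print("accuracy per fold:", scores)
print(f"mean: {scores.mean():.3f}  std: {scores.std():.3f}")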

Related Terms