Inference / Decoding Strategies

Beam Search

Intermediate [3/5]
Beam decoding K-best search

Definition

Beam search is a decoding algorithm that keeps track of the k most probable partial sequences (beams) at each generation step, rather than just the single best option. It balances between greedy search (k=1) and exhaustive search.

Beam search is commonly used for translation and summarization where output quality matters more than diversity.

Key Concepts

  • Beam width (k): Number of candidates kept at each step
  • Log probability: Sequences scored by sum of log probs
  • Length penalty: Prevents preference for short sequences
  • Early stopping: Stop when k beams reach end token
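The concepts above can be put together in a compact sketch. This is a minimal, illustrative implementation, not a production decoder; `step_fn` and its `(token, probability)` interface are hypothetical stand-ins for a real model's next-token distribution.

```python
import math

def beam_search(step_fn, start, k=2, max_len=10, eos="<eos>"):
    """Minimal beam search sketch.

    step_fn(tokens) -> list of (token, prob) continuations.
    Sequences are scored by the sum of token log-probabilities.
    """
    beams = [([start], 0.0)]                      # (tokens, total log prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos:                 # beam already ended
                finished.append((tokens, score))
                continue
            for token, prob in step_fn(tokens):
                candidates.append((tokens + [token], score + math.log(prob)))
        if not candidates:                        # all beams finished
            break
        # Prune: keep only the k highest-scoring partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
        if len(finished) >= k:                    # early stopping
            break
    return max(finished + beams, key=lambda b: b[1])

# Toy next-token table (made-up probabilities for illustration)
def step_fn(tokens):
    table = {
        ("The",): [("cat", 0.3), ("dog", 0.25), ("quick", 0.1)],
        ("The", "cat"): [("sat", 0.4), ("is", 0.3)],
        ("The", "dog"): [("ran", 0.35), ("is", 0.25)],
    }
    return table.get(tuple(tokens), [("<eos>", 1.0)])

tokens, score = beam_search(step_fn, "The", k=2, max_len=5)
print(" ".join(tokens))   # → The cat sat <eos>
```

Note that scores are summed in log space: multiplying many probabilities underflows quickly, while adding log-probabilities stays numerically stable.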

Examples

Algorithm
Beam Search Visualization (k=2)
BEAM SEARCH WITH WIDTH 2:

Start: "The"

Step 1: P(token | "The")
  "cat"   (0.3)  ←── keep
  "dog"   (0.25) ←── keep
  "quick" (0.1)  ──  drop

Beams: ["The cat", "The dog"]

Step 2: Expand BOTH beams:
  "The cat sat" (0.3 × 0.4   = 0.12)   ←── keep
  "The cat is"  (0.3 × 0.3   = 0.09)   ←── keep (2nd best)
  "The dog ran" (0.25 × 0.35 = 0.0875)
  "The dog is"  (0.25 × 0.25 = 0.0625)

Beams: ["The cat sat", "The cat is"]

Step 3: Continue expanding...
  ...until EOS or max length

GREEDY (k=1) vs BEAM (k>1):
┌─────────────┬──────────────────────────┐
│ Greedy      │ "The cat sat quietly."   │
│ Beam (k=2)  │ "The cat sat on the mat."│
│ Beam (k=5)  │ "The cat sat by the fire"│
└─────────────┴──────────────────────────┘

Beam search often finds higher-probability sequences than greedy decoding.
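As a sanity check, the step-2 expansion arithmetic can be computed directly: each beam's probability is multiplied by each continuation's conditional probability, and only the top k products survive pruning. The probabilities here are the toy values from the walkthrough, not real model outputs.

```python
# Beam probabilities after step 1 and their candidate continuations
beams = {"The cat": 0.3, "The dog": 0.25}
continuations = {
    "The cat": [("sat", 0.4), ("is", 0.3)],
    "The dog": [("ran", 0.35), ("is", 0.25)],
}

# Score every expansion: P(beam) * P(token | beam)
candidates = {
    f"{beam} {tok}": p_beam * p_tok
    for beam, p_beam in beams.items()
    for tok, p_tok in continuations[beam]
}

# Prune to the k=2 highest-probability sequences
top2 = sorted(candidates, key=candidates.get, reverse=True)[:2]
print(top2)   # → ['The cat sat', 'The cat is']
```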
Implementation
Beam Search Parameters
BEAM SEARCH CONFIGURATION:

# Basic beam search
output = model.generate(
    input_ids,
    num_beams=5,            # beam width
    early_stopping=True,    # stop when all beams finish
    max_length=100
)

LENGTH PENALTY:
  score = log_prob / length^α
  α = 0: No penalty (prefers short)
  α = 1: Linear penalty (neutral)
  α > 1: Prefers longer sequences

# With length penalty
output = model.generate(
    input_ids,
    num_beams=5,
    length_penalty=1.2,       # favor longer
    no_repeat_ngram_size=3    # avoid repetition
)

DIVERSITY (diverse beam search):
output = model.generate(
    input_ids,
    num_beams=6,
    num_beam_groups=3,        # 3 groups of 2
    diversity_penalty=0.5     # penalize similar beams
)

WHEN TO USE BEAM SEARCH:
✓ Machine translation
✓ Summarization
✓ Code generation (structured output)
✗ Creative writing (too deterministic)
✗ Chatbots (lacks diversity)
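The length-penalty formula can be illustrated numerically. The log-probabilities below are made-up values, not from a real model; the point is that because summed log-probabilities are negative, dividing by length^α raises (makes less negative) the score of multi-token sequences as α grows.

```python
import math

# Toy 3-token sequence with made-up token probabilities
logps = [math.log(0.3), math.log(0.4), math.log(0.5)]
raw = sum(logps)                          # total log prob (negative)

for alpha in (0.0, 1.0, 1.2):
    score = raw / len(logps) ** alpha     # length-normalized score
    print(f"alpha={alpha}: score={score:.3f}")
```

With α = 0 the sequence keeps its raw (most negative) score, so shorter outputs win ties; larger α progressively offsets that bias toward brevity.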

Interactive Exercise

Calculate Beam Candidates

With beam width k=3 and vocabulary size 50,000, how many candidates are evaluated at each step before pruning to k?
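The exercise reduces to one multiplication: every one of the k beams is expanded over the full vocabulary before pruning back to k. A one-line check:

```python
# Each of the k beams is scored against the whole vocabulary,
# so k * |V| candidates are evaluated per step before pruning.
k, vocab_size = 3, 50_000
print(k * vocab_size)  # → 150000
```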

Pro Tips
  • Beam width 4-5 is usually sufficient; larger rarely helps
  • Beam search can produce repetitive outputs without n-gram blocking
  • For chatbots, sampling methods often outperform beam search
  • Contrastive search combines beam search quality with diversity

Related Terms