Model Parameters / Core Concepts

Sampling

Beginner [2/5]
Token sampling · Stochastic decoding · Random sampling

Definition

Sampling is the process of selecting the next token from a probability distribution during text generation. Instead of always picking the most likely token (greedy), sampling introduces randomness based on the probability weights.

Different sampling strategies (top-k, top-p, temperature) control how this randomness is applied to balance creativity and coherence.
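As a minimal sketch of one of the strategies named above, top-k filtering zeroes out all but the k most likely tokens before sampling (top-p and temperature reshape the distribution analogously). The function name and probability values here are illustrative, not from any particular library:

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most likely tokens, zero the rest, renormalize.
    Illustrative sketch of top-k; sampling then draws from the result."""
    probs = np.asarray(probs, dtype=float)
    cutoff = np.sort(probs)[-k]           # k-th largest probability
    filtered = np.where(probs >= cutoff, probs, 0.0)
    return filtered / filtered.sum()      # renormalize to sum to 1

probs = [0.35, 0.25, 0.15, 0.10, 0.08, 0.05, 0.02]
print(top_k_filter(probs, 3))  # only the top 3 tokens keep probability mass
```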

Key Concepts

  • Probability distribution: Model outputs probabilities for all possible next tokens
  • Stochastic selection: Random choice weighted by probabilities
  • Diversity vs. quality: More randomness = more creative but potentially less coherent
  • Reproducibility: Same seed + parameters = same output
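The reproducibility point can be sketched with a seeded random generator: the same seed produces the same sequence of "random" draws. This uses NumPy's `Generator` as a stand-in; real APIs typically expose a `seed` parameter instead:

```python
import numpy as np

# Same seed + same parameters -> identical sampled token IDs.
probs = [0.35, 0.25, 0.15, 0.10, 0.08, 0.05, 0.02]

run1 = np.random.default_rng(seed=7).choice(7, size=5, p=probs)
run2 = np.random.default_rng(seed=7).choice(7, size=5, p=probs)
print(run1.tolist() == run2.tolist())  # True: the draws match exactly
```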

Examples

Concept
Sampling from Distribution
Token probability distribution:

  "the"  → 0.35  ████████████████
  "a"    → 0.25  ████████████
  "my"   → 0.15  ███████
  "your" → 0.10  █████
  "his"  → 0.08  ████
  "her"  → 0.05  ██
  "our"  → 0.02  █

GREEDY (no sampling): Always picks "the" (highest probability)
→ Deterministic, potentially repetitive

SAMPLING: Randomly selects based on weights
  Run 1: "a"   (25% chance)
  Run 2: "the" (35% chance)
  Run 3: "my"  (15% chance)
→ Varied, more natural text

Think of it like a weighted lottery: "the" has 35 tickets, "a" has 25, etc. Each generation draws one ticket randomly.
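The weighted-lottery analogy maps directly onto Python's standard library. This sketch (seeded so the draws repeat) uses the distribution above:

```python
import random

# Each token holds "tickets" in proportion to its probability.
tokens  = ["the", "a", "my", "your", "his", "her", "our"]
weights = [0.35, 0.25, 0.15, 0.10, 0.08, 0.05, 0.02]

random.seed(0)  # seeded so repeated runs draw the same tickets
draws = [random.choices(tokens, weights=weights, k=1)[0] for _ in range(3)]
for run, tok in enumerate(draws, 1):
    print(f"Run {run}: {tok}")
```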
Implementation
Basic Sampling in Python
import numpy as np

def sample_token(logits, temperature=1.0):
    """Sample next token from logits with temperature."""
    # Apply temperature scaling
    scaled_logits = logits / temperature

    # Convert to probabilities (softmax)
    exp_logits = np.exp(scaled_logits - np.max(scaled_logits))
    probs = exp_logits / np.sum(exp_logits)

    # Sample from distribution
    token_id = np.random.choice(len(probs), p=probs)
    return token_id

# With APIs:

# OpenAI
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    temperature=0.7  # Controls sampling randomness
)

# Claude
response = anthropic.messages.create(
    model="claude-3-opus",
    temperature=0.7,  # 0 = greedy, 1 = full sampling
    messages=[...]
)

Interactive Exercise

Predict Sampling Behavior

Given probabilities [A: 0.50, B: 0.30, C: 0.15, D: 0.05], if you run sampling 100 times, approximately how many times would each token be selected?
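One way to check your prediction is to simulate the 100 draws. This sketch uses a seeded NumPy generator; the exact counts vary by seed, but they should land near the expected 50/30/15/5 split:

```python
import numpy as np

# Simulate the exercise: 100 samples from [A, B, C, D]
# with probabilities [0.50, 0.30, 0.15, 0.05].
rng = np.random.default_rng(seed=42)
tokens = ["A", "B", "C", "D"]
probs = [0.50, 0.30, 0.15, 0.05]

draws = rng.choice(tokens, size=100, p=probs)
counts = {t: int(np.sum(draws == t)) for t in tokens}
print(counts)  # expect roughly A:50, B:30, C:15, D:5
```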

Pro Tips
  • Set temperature=0 to disable sampling (greedy decoding)
  • Use seeds for reproducible random sampling
  • Combine sampling with top-k or top-p for better control
  • Higher temperature = more uniform distribution = more randomness
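The last tip can be made concrete: dividing logits by a higher temperature flattens the softmax output toward uniform, while a lower temperature sharpens it. A minimal sketch with made-up logits:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by temperature."""
    scaled = np.asarray(logits) / temperature
    exp = np.exp(scaled - np.max(scaled))  # subtract max for stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, 0.1]

low = softmax_with_temperature(logits, 0.5)   # sharper: top token dominates
high = softmax_with_temperature(logits, 2.0)  # flatter: closer to uniform

# The top token's share shrinks as temperature rises.
print(round(float(low[0]), 2), round(float(high[0]), 2))
```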

Related Terms