Model Parameters / Core Concepts

Sampling

Beginner [2/5]
Token sampling · Stochastic decoding · Random sampling

Definition

Sampling is the process of selecting the next token from a probability distribution during text generation. Instead of always picking the most likely token (greedy), sampling introduces randomness based on the probability weights.

Different sampling strategies (top-k, top-p, temperature) control how this randomness is applied to balance creativity and coherence.
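As a minimal sketch of one of the strategies named above, top-k filtering zeroes out all but the k most likely tokens before sampling (top-p and temperature reshape the distribution analogously). The function name and probability values here are illustrative, not from any particular library:

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most likely tokens, zero the rest, renormalize.
    Illustrative sketch of top-k; sampling then draws from the result."""
    probs = np.asarray(probs, dtype=float)
    cutoff = np.sort(probs)[-k]           # k-th largest probability
    filtered = np.where(probs >= cutoff, probs, 0.0)
    return filtered / filtered.sum()      # renormalize to sum to 1

probs = [0.35, 0.25, 0.15, 0.10, 0.08, 0.05, 0.02]
print(top_k_filter(probs, 3))  # only the top 3 tokens keep probability mass
```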

Key Concepts

  • Probability distribution: Model outputs probabilities for all possible next tokens
  • Stochastic selection: Random choice weighted by probabilities
  • Diversity vs. quality: More randomness = more creative but potentially less coherent
  • Reproducibility: Same seed + parameters = same output
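The reproducibility point can be sketched with a seeded random generator: the same seed produces the same sequence of "random" draws. This uses NumPy's `Generator` as a stand-in; real APIs typically expose a `seed` parameter instead:

```python
import numpy as np

# Same seed + same parameters -> identical sampled token IDs.
probs = [0.35, 0.25, 0.15, 0.10, 0.08, 0.05, 0.02]

run1 = np.random.default_rng(seed=7).choice(7, size=5, p=probs)
run2 = np.random.default_rng(seed=7).choice(7, size=5, p=probs)
print(run1.tolist() == run2.tolist())  # True: the draws match exactly
```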

Examples

Concept
Sampling from Distribution
Token probability distribution:

  "the"  → 0.35  ████████████████
  "a"    → 0.25  ████████████
  "my"   → 0.15  ███████
  "your" → 0.10  █████
  "his"  → 0.08  ████
  "her"  → 0.05  ██
  "our"  → 0.02  █

GREEDY (no sampling): Always picks "the" (highest probability)
→ Deterministic, potentially repetitive

SAMPLING: Randomly selects based on weights
  Run 1: "a"   (25% chance)
  Run 2: "the" (35% chance)
  Run 3: "my"  (15% chance)
→ Varied, more natural text

Think of it like a weighted lottery: "the" has 35 tickets, "a" has 25, etc. Each generation draws one ticket randomly.
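The weighted-lottery analogy maps directly onto Python's standard library. This sketch (seeded so the draws repeat) uses the distribution above:

```python
import random

# Each token holds "tickets" in proportion to its probability.
tokens  = ["the", "a", "my", "your", "his", "her", "our"]
weights = [0.35, 0.25, 0.15, 0.10, 0.08, 0.05, 0.02]

random.seed(0)  # seeded so repeated runs draw the same tickets
draws = [random.choices(tokens, weights=weights, k=1)[0] for _ in range(3)]
for run, tok in enumerate(draws, 1):
    print(f"Run {run}: {tok}")
```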
Implementation
Basic Sampling in Python
import numpy as np

def sample_token(logits, temperature=1.0):
    """Sample next token from logits with temperature."""
    # Apply temperature scaling
    scaled_logits = logits / temperature

    # Convert to probabilities (softmax)
    exp_logits = np.exp(scaled_logits - np.max(scaled_logits))
    probs = exp_logits / np.sum(exp_logits)

    # Sample from distribution
    token_id = np.random.choice(len(probs), p=probs)
    return token_id

# With APIs:

# OpenAI
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    temperature=0.7  # Controls sampling randomness
)

# Claude
response = anthropic.messages.create(
    model="claude-3-opus",
    temperature=0.7,  # 0 = greedy, 1 = full sampling
    messages=[...]
)

Interactive Exercise

Predict Sampling Behavior

Given probabilities [A: 0.50, B: 0.30, C: 0.15, D: 0.05], if you run sampling 100 times, approximately how many times would each token be selected?
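One way to check your prediction is to simulate the 100 draws. This sketch uses a seeded NumPy generator; the exact counts vary by seed, but they should land near the expected 50/30/15/5 split:

```python
import numpy as np

# Simulate the exercise: 100 samples from [A, B, C, D]
# with probabilities [0.50, 0.30, 0.15, 0.05].
rng = np.random.default_rng(seed=42)
tokens = ["A", "B", "C", "D"]
probs = [0.50, 0.30, 0.15, 0.05]

draws = rng.choice(tokens, size=100, p=probs)
counts = {t: int(np.sum(draws == t)) for t in tokens}
print(counts)  # expect roughly A:50, B:30, C:15, D:5
```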

Pro Tips
  • Set temperature=0 to disable sampling (greedy decoding)
  • Use seeds for reproducible random sampling
  • Combine sampling with top-k or top-p for better control
  • Higher temperature = more uniform distribution = more randomness
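The last tip can be made concrete: dividing logits by a higher temperature flattens the softmax output toward uniform, while a lower temperature sharpens it. A minimal sketch with made-up logits:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by temperature."""
    scaled = np.asarray(logits) / temperature
    exp = np.exp(scaled - np.max(scaled))  # subtract max for stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, 0.1]

low = softmax_with_temperature(logits, 0.5)   # sharper: top token dominates
high = softmax_with_temperature(logits, 2.0)  # flatter: closer to uniform

# The top token's share shrinks as temperature rises.
print(round(float(low[0]), 2), round(float(high[0]), 2))
```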

Related Terms