Model Parameters / Sampling Strategies

Top-K Sampling

Beginner [2/5]
K-truncation · Top-k filtering

Definition

Top-K sampling restricts the model's next-token selection to only the K most probable tokens. All other tokens are excluded before sampling, preventing the selection of unlikely (potentially nonsensical) tokens.

Lower K values produce more focused, predictable text; higher K values allow more diversity and creativity.

Key Concepts

  • K parameter: Number of top tokens to consider (e.g., 40, 50)
  • Probability renormalization: The kept tokens' probabilities are rescaled to sum to 1
  • Fixed cutoff: Always exactly K options, regardless of distribution
  • Diversity control: Balances creativity vs. coherence
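The filter-and-renormalize mechanism described above can be sketched in a few lines of pure Python (the helper name `top_k_sample` is made up for illustration):

```python
import random

def top_k_sample(probs, k):
    """Sample a token id from `probs`, keeping only the k most probable tokens."""
    # Rank token ids by probability, highest first, and keep the top k.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the surviving probabilities so they sum to 1 again.
    total = sum(probs[i] for i in top)
    weights = [probs[i] / total for i in top]
    # Sample one token id from the truncated, renormalized distribution.
    return random.choices(top, weights=weights, k=1)[0]

probs = [0.35, 0.25, 0.15, 0.10, 0.05, 0.04, 0.02, 0.01, 0.03]
token = top_k_sample(probs, k=4)  # only ids 0-3 can ever be returned
```

With k=1 this degenerates to greedy decoding; with k equal to the vocabulary size it is ordinary sampling.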

Examples

Visualization
How Top-K Works
Context: "The cat sat on the ___"

Token probabilities (before top-K):

  mat    ████████████████ 35%
  floor  ████████████     25%
  couch  ████████         15%
  bed    ██████           10%
  roof   ████              5%
  table  ███               4%
  hat    ██                2%
  moon   █                 1%
  pizza  ▏                 0.5%
  ...    ▏                 ...

With K=4 (only top 4 tokens):

  mat    ████████████████████ 41% (35/85)
  floor  ███████████████      29% (25/85)
  couch  ██████████           18% (15/85)
  bed    ████████             12% (10/85)
  [all others = 0%]

Probabilities are renormalized to sum to 100% among only the top 4 candidates!
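The renormalization arithmetic behind those percentages is just a division by the surviving mass; a quick check of the "cat sat on the mat" numbers:

```python
# Top-4 probabilities from the example ("The cat sat on the ___").
top4 = {"mat": 0.35, "floor": 0.25, "couch": 0.15, "bed": 0.10}

# The four survivors hold 85% of the original mass...
total = sum(top4.values())  # 0.85

# ...so each one is divided by 0.85 to make the truncated set sum to 1.
renorm = {tok: p / total for tok, p in top4.items()}
# mat ≈ 0.41, floor ≈ 0.29, couch ≈ 0.18, bed ≈ 0.12
```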
API Usage
Setting Top-K
# OpenAI: top_k is not exposed directly (only logit_bias is available)

# Claude API
response = anthropic.messages.create(
    model="claude-3-opus",
    max_tokens=100,
    top_k=40,  # Consider top 40 tokens
    messages=[{"role": "user", "content": "Write a poem"}],
)

# HuggingFace
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
output = generator(
    "The future of AI",
    top_k=50,  # Top 50 tokens
    do_sample=True,
    max_length=100,
)

# Common top_k values:
# K=1:      Greedy (deterministic)
# K=10-20:  Very focused
# K=40-50:  Balanced (common default)
# K=100+:   More diverse/creative

Interactive Exercise

Choose the Right K

What K value would you use for each scenario?

1. Generating legal contract text
2. Creative story writing
3. Code completion

Pro Tips
  • Top-K is often combined with temperature for finer control
  • Consider top-P (nucleus) sampling as an adaptive alternative
  • K=1 is equivalent to greedy decoding (always pick best)
  • Very high K with low temperature can behave much like a lower K with higher temperature — tune one knob at a time

Related Terms