Inference / Decoding Strategies

Repetition Penalty

Foundational [2/5]
Frequency penalty · Presence penalty · No-repeat penalty

Definition

Repetition penalty reduces the probability of tokens that have already appeared in the generated text, discouraging the model from repeating words or phrases. This addresses a common failure mode where LLMs get stuck in repetitive loops.

Different APIs implement variations: frequency penalty (scales with the token's occurrence count), presence penalty (a fixed penalty once a token has appeared), and n-gram blocking (a hard constraint on repeated sequences).

Key Concepts

  • Frequency penalty: Penalty grows with the token's occurrence count (see the sketch after this list)
  • Presence penalty: Fixed penalty once a token has appeared at all
  • N-gram blocking: Hard constraint that prevents any n-token sequence from repeating
  • Context window: How far back in the generated text to check for repetition
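
To make the first two mechanisms concrete, here is a minimal sketch of frequency and presence penalties applied to raw logits. It is illustrative only: the function name, penalty values, and toy vocabulary are assumptions, not any provider's actual implementation.

import numpy as np

def apply_penalties(logits, counts, frequency_penalty=0.5, presence_penalty=0.5):
    # counts[i] = how many times token i has appeared in the generated text so far
    logits = logits - frequency_penalty * counts       # scales with occurrence count
    logits = logits - presence_penalty * (counts > 0)  # flat penalty once a token appears
    return logits

logits = np.array([3.0, 1.5, 0.2])      # next-token logits for a toy 3-token vocabulary
counts = np.array([5, 1, 0])            # token 0 ("AI") has already appeared 5 times
print(apply_penalties(logits, counts))  # [0.  0.5 0.2]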

Examples

Problem
Without Repetition Penalty
THE REPETITION PROBLEM:

Prompt: "Write about AI"

Without penalty (can get stuck):
"AI is transforming the world. AI is changing how we work. AI is revolutionizing healthcare. AI is making things better. AI is AI is AI is AI is AI is..."

WHY THIS HAPPENS:
1. "AI" has high probability given the context
2. Each occurrence of "AI" reinforces the pattern
3. The model enters a degenerate loop
4. Especially common with:
   - Long generations
   - Greedy or low-temperature decoding
   - Beam search
   - Certain topics/patterns

TYPES OF REPETITION:
- Word-level: "the the the"
- Phrase-level: "in order to... in order to..."
- Sentence-level: repeating whole sentences
- Pattern-level: alternating A-B-A-B-A-B
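
The loop is easy to reproduce and to suppress. A minimal sketch, assuming the HuggingFace transformers library and the small gpt2 checkpoint (exact outputs will vary by model and prompt):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Write about AI:", return_tensors="pt")

# Greedy decoding, no penalty: prone to the degenerate loop shown above.
plain = model.generate(**inputs, max_new_tokens=60, do_sample=False)

# Same decoding with a mild repetition penalty on previously seen tokens.
penalized = model.generate(**inputs, max_new_tokens=60, do_sample=False,
                           repetition_penalty=1.2)

print(tokenizer.decode(plain[0], skip_special_tokens=True))
print(tokenizer.decode(penalized[0], skip_special_tokens=True))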
Implementation
Penalty Mechanisms
FREQUENCY PENALTY (OpenAI):

logit_new = logit - frequency_penalty × count(token)

Token "AI" appeared 5 times:
  Original logit: 3.0
  With freq_penalty=0.5: 3.0 - 0.5 × 5 = 0.5

PRESENCE PENALTY (OpenAI):

logit_new = logit - presence_penalty × (count > 0 ? 1 : 0)

Token "AI" appeared at all:
  Original logit: 3.0
  With pres_penalty=1.0: 3.0 - 1.0 = 2.0

REPETITION PENALTY (HuggingFace):

if token in previous_tokens:
    if logit > 0: logit = logit / repetition_penalty
    else:         logit = logit × repetition_penalty

rep_penalty=1.2: logit 3.0 → 2.5

N-GRAM BLOCKING:

no_repeat_ngram_size=3 prevents ANY 3-token sequence from repeating:
"the big dog" appeared → "the big dog" is blocked from being generated again

API PARAMETERS:

# OpenAI
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[...],
    frequency_penalty=0.5,  # range: -2.0 to 2.0
    presence_penalty=0.5    # range: -2.0 to 2.0
)

# HuggingFace
output = model.generate(
    input_ids,
    repetition_penalty=1.2,   # 1.0 = off
    no_repeat_ngram_size=3    # block repeated 3-grams
)
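
For intuition, the HuggingFace-style multiplicative rule and n-gram blocking can be re-implemented in a few lines of plain Python. This is an illustrative sketch, not the transformers source, and the function names are hypothetical:

def repetition_penalty(logit, seen, penalty=1.2):
    # Multiplicative rule: divide positive logits, multiply negative ones,
    # so both directions move the token toward "less likely".
    if not seen:
        return logit
    return logit / penalty if logit > 0 else logit * penalty

def banned_next_tokens(generated, n=3):
    # no_repeat_ngram_size-style blocking: if the last n-1 tokens have been
    # seen before, ban whatever token completed that n-gram previously.
    prefix = tuple(generated[-(n - 1):])
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned

print(repetition_penalty(3.0, seen=True))                              # 2.5, as above
print(banned_next_tokens(["the", "big", "dog", "ran", "the", "big"]))  # {'dog'}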

Interactive Exercise

Calculate Penalized Logit

Token "the" has logit 4.0 and has appeared 3 times. Calculate the new logit with frequency_penalty=0.8.

Pro Tips
  • Start with frequency_penalty=0.3-0.5 for natural text
  • Use presence_penalty for topic diversity, frequency for word diversity
  • Too high a penalty forces unnatural word choices
  • Combine with temperature for best results (see the sketch below)
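
As a sketch of the last tip, reusing model and input_ids from the HuggingFace snippet above (the parameter values are illustrative assumptions, not recommendations from any library):

output = model.generate(
    input_ids,
    do_sample=True,           # sample instead of greedy decoding
    temperature=0.8,          # mild randomness helps break loops
    repetition_penalty=1.1,   # gentle soft penalty
    no_repeat_ngram_size=3,   # hard backstop against verbatim 3-gram repeats
    max_new_tokens=100,
)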
