
Frequency Penalty

Beginner [2/5]
Repetition penalty, Token frequency control, Anti-repetition

Definition

Frequency Penalty is a parameter that reduces the likelihood of generating tokens that have already appeared in the output, with the penalty increasing based on how many times the token has appeared. This helps prevent repetitive text generation.

Unlike presence penalty (which applies equally regardless of count), frequency penalty scales with the number of occurrences, more strongly discouraging tokens that appear many times.

Key Concepts

  • Proportional penalty: More occurrences = stronger penalty
  • Logit modification: Applied to the logits before softmax and sampling
  • Range: Typically 0 to 2 (0 = no penalty); the OpenAI API accepts -2.0 to 2.0, where negative values encourage repetition
  • Repetition reduction: Encourages vocabulary diversity
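
The logit modification above can be sketched as follows. This is a minimal illustration, assuming a toy three-token vocabulary with made-up logits and counts; `softmax` and `apply_frequency_penalty` are illustrative names, not functions from any library.

```python
import math

def softmax(logits):
    """Convert a dict of token -> logit into token -> probability."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def apply_frequency_penalty(logits, counts, frequency_penalty):
    """Subtract frequency_penalty * count from each token's logit."""
    return {tok: val - frequency_penalty * counts.get(tok, 0)
            for tok, val in logits.items()}

# Hypothetical logits and occurrence counts, chosen only for illustration.
logits = {"sleep": 2.0, "rest": 1.0, "slumber": 0.5}
counts = {"sleep": 4}  # "sleep" has already been generated four times

before = softmax(logits)
after = softmax(apply_frequency_penalty(logits, counts, frequency_penalty=0.5))
print(f"P(sleep) before: {before['sleep']:.3f}, after: {after['sleep']:.3f}")
```

Because the penalty is subtracted before softmax, lowering one token's logit also raises the probability of every other token.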

Examples

Comparison
Frequency vs Presence Penalty
FREQUENCY PENALTY vs PRESENCE PENALTY:

Token "the" has appeared 5 times in output:

PRESENCE PENALTY (binary):
  penalty = presence_penalty × 1 (just "is present")
  → Same penalty whether appeared 1 or 100 times

FREQUENCY PENALTY (proportional):
  penalty = frequency_penalty × 5 (times count)
  → Penalty grows with each occurrence

---

FORMULA:
  logit_adjusted = logit - (freq_penalty × count)

  Where:
  - logit = original score for token
  - freq_penalty = parameter (0 to 2)
  - count = times token appeared so far

EXAMPLE:
  Token "amazing" has appeared 3 times
  Original logit: 2.5
  Frequency penalty: 0.5
  Adjusted logit = 2.5 - (0.5 × 3) = 1.0
  Token is now less likely to be selected!

---

VISUAL COMPARISON:

  Occurrences:  1           2           3           4           5
                │           │           │           │           │
  Presence:     ████        ████        ████        ████        ████        (constant)
  Frequency:    ██          ████        ██████      ████████    ██████████  (grows with count)

---

COMBINED FORMULA (OpenAI-style):
  logit -= presence_penalty × (1 if count > 0 else 0)
  logit -= frequency_penalty × count

Example with both (presence=0.5, frequency=0.3):
  Token appeared 4 times:
  - Presence penalty: 0.5 × 1 = 0.5
  - Frequency penalty: 0.3 × 4 = 1.2
  - Total reduction: 1.7

Both can be used together for nuanced control.
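
The combined formula can be checked with a few lines of Python. This is a sketch rather than any provider's actual implementation: `penalized_logit` is a hypothetical helper, and the `generated` token list is invented so that "amazing" appears three times, matching the worked example.

```python
from collections import Counter

def penalized_logit(logit, count, frequency_penalty=0.0, presence_penalty=0.0):
    """Apply both penalties to one token's logit, following the combined formula."""
    presence_term = presence_penalty * (1 if count > 0 else 0)
    frequency_term = frequency_penalty * count
    return logit - presence_term - frequency_term

# Hypothetical output so far; "amazing" appears 3 times.
generated = ["amazing", "view", "amazing", "food", "amazing"]
counts = Counter(generated)

# Frequency penalty only: 2.5 - (0.5 * 3)
adjusted = penalized_logit(2.5, counts["amazing"], frequency_penalty=0.5)
print(adjusted)

# Both penalties (presence=0.5, frequency=0.3) on a token seen 4 times.
# Starting from a logit of 0.0 makes the total reduction easy to read off.
reduction = 0.0 - penalized_logit(0.0, 4, frequency_penalty=0.3, presence_penalty=0.5)
print(round(reduction, 2))
```

A token that has not appeared yet (count of 0) passes through unchanged, since both terms evaluate to zero.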
Effects
Frequency Penalty Impact on Output
EFFECT OF DIFFERENT FREQUENCY PENALTY VALUES:

Prompt: "Write about the importance of sleep"

FREQUENCY PENALTY = 0 (no penalty):
  "Sleep is important for health. Sleep helps the body recover. Sleep is essential for memory. Sleep affects mood. Sleep is necessary for..."
  → Very repetitive, same words over and over

FREQUENCY PENALTY = 0.5 (light):
  "Sleep is important for health. Rest helps the body recover. Slumber is essential for memory consolidation. Quality rest affects mood and cognitive function. Adequate sleep is..."
  → Some variety, occasional repetition

FREQUENCY PENALTY = 1.0 (moderate):
  "Sleep is crucial for health. Rest enables physical recovery. Slumber consolidates memories effectively. Quality downtime affects mood, cognition, and emotional regulation. Adequate nightly rest proves necessary for..."
  → Good variety, uses synonyms naturally

FREQUENCY PENALTY = 2.0 (high):
  "Sleep proves crucial healthwise. Resting enables physical recuperation. Slumbering consolidates memories effectively. Qualitative downtime impacts moods, cognition, emotional stability differently. Nocturnal unconsciousness reveals necessities..."
  → Forced variety, may sound unnatural

---

RECOMMENDED VALUES BY USE CASE:

┌─────────────────────────┬───────────────────────┐
│ Use Case                │ Frequency Penalty     │
├─────────────────────────┼───────────────────────┤
│ Code generation         │ 0.0 - 0.3 (low)       │
│ Technical writing       │ 0.3 - 0.5 (low-med)   │
│ Creative writing        │ 0.5 - 0.8 (moderate)  │
│ Poetry/lyrics           │ 0.3 - 0.6 (moderate)  │
│ Brainstorming           │ 0.8 - 1.2 (higher)    │
│ Diverse idea generation │ 1.0 - 1.5 (high)      │
└─────────────────────────┴───────────────────────┘

CODE GENERATION NOTE:
Keep frequency_penalty LOW for code! Repetition is often necessary:
  - Variable names must repeat
  - Keywords (if, for, return) repeat naturally
  - Function calls may repeat
High penalty → broken code with random synonyms!
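
The trend in these outputs can be reproduced with a toy decoder. The sketch below assumes fixed, made-up logits and greedy (argmax) selection, whereas a real model recomputes its logits at every step. It therefore shows only how the penalty spreads choices across tokens, not the loss of fluency at high values. `greedy_decode` is an illustrative name.

```python
from collections import Counter

def greedy_decode(base_logits, steps, frequency_penalty):
    """Pick the highest adjusted logit at each step (toy, context-free model)."""
    counts = Counter()
    output = []
    for _ in range(steps):
        # Re-apply the penalty using the counts accumulated so far.
        adjusted = {tok: val - frequency_penalty * counts[tok]
                    for tok, val in base_logits.items()}
        choice = max(adjusted, key=adjusted.get)
        counts[choice] += 1
        output.append(choice)
    return output

# Hypothetical fixed logits, chosen only for illustration.
base_logits = {"sleep": 3.0, "rest": 2.0, "slumber": 1.0}

for penalty in (0.0, 0.5, 2.0):
    tokens = greedy_decode(base_logits, steps=8, frequency_penalty=penalty)
    print(penalty, tokens, "distinct:", len(set(tokens)))
```

With no penalty the decoder repeats its top token indefinitely; as the penalty grows, lower-ranked tokens are selected sooner and more often.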

Interactive Exercise

Choose the Right Setting

You're building three different applications. What frequency_penalty would you set for each?

1. A Python code assistant
2. A marketing slogan generator (needs variety)
3. A legal document summarizer (precision needed)

Pro Tips
  • Start with 0.5 and adjust based on output quality
  • Keep low (0-0.3) for code; technical writing tolerates slightly more (0.3-0.5)
  • Combine with presence_penalty for finer control
  • Higher values may cause incoherent text - test thoroughly
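
One way to encode these tips as starting settings is sketched below. `penalty_settings` and the task labels are hypothetical, and the value 0.2 for code is an assumption picked from inside the 0-0.3 band; the resulting dict uses the `frequency_penalty` and `presence_penalty` names and could be merged into a request for an API that accepts them.

```python
# Starting values follow the tips above; the exact numbers are assumptions to tune.
DEFAULT_FREQUENCY_PENALTY = 0.5
TASK_OVERRIDES = {"code": 0.2}  # repetition is expected in code, so stay low

def penalty_settings(task, presence_penalty=0.0):
    """Return starting penalty settings for a task label (hypothetical helper)."""
    frequency_penalty = TASK_OVERRIDES.get(task, DEFAULT_FREQUENCY_PENALTY)
    return {
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }

print(penalty_settings("code"))
print(penalty_settings("brainstorming", presence_penalty=0.3))
```

Treat the returned values as a first guess and adjust them after reviewing real outputs.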

Related Terms