Repetition penalty reduces the probability of tokens that have already appeared in the generated text, discouraging the model from repeating words or phrases. This addresses a common failure mode where LLMs get stuck in repetitive loops.
Different APIs implement variations on this idea: a frequency penalty, which grows with how many times a token has already appeared; a presence penalty, which applies a single flat penalty once a token has appeared at all; and n-gram blocking, which forbids regenerating any n-gram already present in the output.
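A minimal sketch of these variants, assuming an OpenAI-style additive formulation (logit minus presence penalty minus frequency penalty times occurrence count) and a toy dict-of-logits representation; the function names and default values here are illustrative, not any particular API's:

```python
from collections import Counter

def apply_penalties(logits, generated,
                    presence_penalty=0.5, frequency_penalty=0.3):
    """Subtract penalties from the logits of already-generated tokens.

    presence penalty: flat subtraction once a token has appeared at all.
    frequency penalty: scales linearly with the number of occurrences.
    """
    counts = Counter(generated)
    penalized = dict(logits)
    for tok, count in counts.items():
        if tok in penalized:
            penalized[tok] -= presence_penalty + frequency_penalty * count
    return penalized

def blocked_tokens(generated, n=3):
    """No-repeat n-gram blocking: return tokens that, if generated next,
    would complete an n-gram already present in the output."""
    if len(generated) < n - 1:
        return set()
    prefix = tuple(generated[-(n - 1):])  # last n-1 tokens of the context
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned

logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
out = apply_penalties(logits, ["the", "the", "cat"])
# "the" appeared twice: 2.0 - 0.5 - 0.3 * 2 = 0.9
# "cat" appeared once:  1.5 - 0.5 - 0.3     = 0.7
# "sat" never appeared: unchanged at 1.0

# With n=3, the context ["a", "b", "c", "a", "b"] ends in ("a", "b"),
# which previously continued with "c", so "c" is banned next.
banned = blocked_tokens(["a", "b", "c", "a", "b"], n=3)
```

In practice the penalties would be applied to the full logit vector before sampling, and banned tokens would have their logits set to negative infinity.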