Top-K sampling restricts the model's next-token selection to the K most probable tokens. All other tokens are excluded and the remaining probabilities are renormalized before sampling, which prevents the selection of unlikely (and potentially nonsensical) tokens.
Lower K values produce more focused, predictable text; at K = 1 the method reduces to greedy decoding, which always picks the single most probable token. Higher K values allow more diversity and creativity.
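The procedure can be illustrated with a minimal Python sketch. It assumes the model's output is available as a plain list of logits (one per vocabulary token); the function names `top_k_probs` and `sample_top_k` are hypothetical and chosen only for this example, not taken from any particular library.

```python
import math
import random


def top_k_probs(logits, k):
    """Return probabilities where only the k highest-logit tokens are nonzero.

    Hypothetical helper: assumes `logits` is a list of floats, one per token.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the vocabulary size")
    # Indices of the k largest logits.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only; subtract the max for numerical stability.
    max_logit = max(logits[i] for i in keep)
    weights = {i: math.exp(logits[i] - max_logit) for i in keep}
    total = sum(weights.values())
    # Excluded tokens get probability 0; kept tokens are renormalized to sum to 1.
    return [weights.get(i, 0.0) / total for i in range(len(logits))]


def sample_top_k(logits, k, rng=random):
    """Draw one token index from the top-k distribution."""
    probs = top_k_probs(logits, k)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # made-up values for illustration
    print(top_k_probs(example_logits, k=2))   # only the first two entries are nonzero
    print(sample_top_k(example_logits, k=1))  # k=1 always selects token 0 here
```

With `k=2`, only the two highest-scoring tokens can ever be drawn; raising `k` toward the vocabulary size brings the result closer to sampling from the full distribution.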