The KV cache stores the key (K) and value (V) projections of previous tokens during autoregressive generation, avoiding redundant computation: instead of reprocessing the entire sequence at each generation step, the model computes K and V only for the new token and appends them to the cache.
KV caching is essential for fast LLM inference: the attention cost of each generation step drops from O(n²) (recomputing attention over all n previous tokens from scratch) to O(n) (the new token's single query attending to n cached keys).
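A minimal sketch of this idea, assuming a toy single-head attention implemented in NumPy; the class and weight names (`CachedAttention`, `W_q`, `W_k`, `W_v`) are illustrative, not any particular library's API:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class CachedAttention:
    """Toy single-head attention that caches K/V across generation steps."""

    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        self.W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.W_k = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.k_cache = []  # one (d_model,) key per past token
        self.v_cache = []  # one (d_model,) value per past token

    def step(self, x_new):
        """Process one new token embedding, reusing cached K/V for the rest."""
        q = x_new @ self.W_q
        # Only the NEW token's K and V are computed; past tokens are cached.
        self.k_cache.append(x_new @ self.W_k)
        self.v_cache.append(x_new @ self.W_v)
        K = np.stack(self.k_cache)  # (t, d_model)
        V = np.stack(self.v_cache)  # (t, d_model)
        # One query against t cached keys: O(t) per step, instead of
        # recomputing the full O(t^2) attention over all previous tokens.
        scores = softmax(q @ K.T / np.sqrt(q.shape[-1]))
        return scores @ V

# Usage: feed token embeddings one at a time, as in autoregressive decoding.
attn = CachedAttention(d_model=16)
for x in np.random.default_rng(1).standard_normal((5, 16)):
    out = attn.step(x)  # one forward step per generated token
```

In a real transformer the cache holds one K and V tensor per layer and per head, and memory grows linearly with sequence length, which is why long-context serving systems focus heavily on managing this cache.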