Prompt caching stores frequently reused prompt segments so they don't have to be reprocessed on every request. When a large static context (such as documentation or a system prompt) is shared across many requests, the provider can cache the processed prefix and let the model skip recomputing it.
Because identical prompt prefixes are only computed once, this technique reduces both latency and cost.
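The mechanism can be illustrated with a minimal sketch: a cache keyed by a hash of the static prefix, where the expensive prefix computation runs only on a miss and is reused on every subsequent hit. The `PromptCache` class and its tagged-string "processed" representation are hypothetical stand-ins for what a real provider does internally (e.g. caching attention key/value states), not an actual API.

```python
import hashlib

class PromptCache:
    """Illustrative prefix cache (hypothetical, not a provider API):
    maps a hash of a static prompt prefix to its processed form."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prefix: str) -> str:
        return hashlib.sha256(prefix.encode("utf-8")).hexdigest()

    def process(self, prefix: str, suffix: str) -> str:
        key = self._key(prefix)
        if key in self._store:
            self.hits += 1
            processed_prefix = self._store[key]
        else:
            self.misses += 1
            # Stand-in for the expensive prefix computation; a real
            # system would cache model state here, not a string.
            processed_prefix = f"<processed:{len(prefix)} chars>"
            self._store[key] = processed_prefix
        # On a hit, only the per-request suffix needs fresh work.
        return f"{processed_prefix} + {suffix}"

system_prompt = "You are a helpful assistant. " * 100  # large static context
cache = PromptCache()
cache.process(system_prompt, "Question 1")  # miss: prefix processed once
cache.process(system_prompt, "Question 2")  # hit: prefix reused
```

After the second call, the cache reports one miss and one hit: the static prefix was processed a single time, and every later request with the same prefix pays only for its suffix.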