Throughput measures the volume of work an LLM system can process over time, typically expressed as tokens per second or requests per second. High throughput is essential for serving many users efficiently and keeping infrastructure costs manageable.
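As a minimal sketch, both metrics can be computed from token counts and wall-clock time. The `generate` callable here is a hypothetical stand-in for a real client call; it returns the number of output tokens produced for one prompt.

```python
import time

def measure_throughput(generate, prompts):
    """Measure aggregate throughput over a batch of prompts.

    `generate` is a hypothetical callable returning the number of
    output tokens produced for one prompt; swap in a real client.
    """
    start = time.perf_counter()
    total_tokens = sum(generate(p) for p in prompts)
    elapsed = time.perf_counter() - start
    return {
        "tokens_per_second": total_tokens / elapsed,
        "requests_per_second": len(prompts) / elapsed,
    }

# Example with a stand-in generator that "produces" 50 tokens per prompt.
stats = measure_throughput(lambda p: 50, ["a", "b", "c", "d"])
```

In practice you would average over a sustained load rather than a single batch, since warm-up and queueing effects dominate short runs.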
Throughput and latency often trade off against each other: optimizing for one can hurt the other. For example, batching more requests together raises aggregate token throughput, but each individual request may wait longer for its batch to fill and for every shared decoding step to complete.
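The tradeoff can be illustrated with a toy cost model. The numbers below are illustrative assumptions, not measurements from any real system: each decoding step emits one token per request in the batch, and per-step time grows mildly with batch size.

```python
def batch_tradeoff(batch_size, per_token_ms=10.0, batch_overhead_ms=0.5, tokens=100):
    # Toy model (illustrative numbers, not from any real system):
    # one step emits a token for every request in the batch, and
    # per-step time grows mildly with batch size.
    step_ms = per_token_ms + batch_overhead_ms * batch_size
    latency_ms = tokens * step_ms                            # time one request waits
    throughput = batch_size * tokens / (latency_ms / 1000)   # tokens per second
    return latency_ms, throughput

for b in (1, 8, 32):
    latency, tput = batch_tradeoff(b)
    print(f"batch={b:2d}  latency={latency:7.0f} ms  throughput={tput:6.0f} tok/s")
```

Under these assumptions, growing the batch from 1 to 32 multiplies aggregate throughput roughly 13x while more than doubling per-request latency, which is the shape of the tradeoff the text describes.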