Perplexity measures how "surprised" a language model is by a sequence of text. Lower perplexity means the model assigns higher probability to the text, i.e., it predicts each token more confidently. It's the exponential of the average negative log-likelihood per token.
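Written out for a tokenized sequence $x_1, \dots, x_N$ (this is the standard definition implied above, with $p_\theta$ denoting the model's conditional token probabilities):

$$\mathrm{PPL}(x) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i})\right)$$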
Intuitively, perplexity represents the effective number of equally likely choices the model is considering at each step.
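As a minimal sketch of how this works in practice (assuming you already have the natural-log probabilities the model assigned to each observed token), perplexity is just the exponentiated average negative log-likelihood:

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities
    assigned by the model to the observed tokens."""
    avg_nll = -sum(log_probs) / len(log_probs)  # average negative log-likelihood
    return math.exp(avg_nll)

# A model that is perfectly uncertain over 50 equally likely tokens
# assigns probability 1/50 everywhere, so perplexity is exactly 50 --
# the "effective number of equally likely choices" at each step.
uniform = [math.log(1 / 50)] * 10
print(perplexity(uniform))   # 50.0

# A confident model assigning probability 0.9 to every observed token
# gets perplexity 1/0.9, roughly 1.11.
confident = [math.log(0.9)] * 10
print(perplexity(confident))  # ~1.11
```

The uniform case makes the intuition concrete: when every continuation is equally likely, the perplexity equals the number of choices the model is effectively weighing.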