Cross-entropy loss measures the difference between two probability distributions: the model's predicted distribution and the true distribution. It is the standard loss function for classification tasks and language modeling. With a one-hot true distribution, it reduces to the negative log-probability the model assigns to the correct class, so a confident wrong prediction (probability near zero on the true class) incurs a very large loss.
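A minimal sketch in plain NumPy, with hypothetical probability values, showing how the loss grows as confidence shifts away from the true class:

```python
import numpy as np

def cross_entropy(pred_probs: np.ndarray, true_class: int) -> float:
    """Cross-entropy with a one-hot target: -log p(true class)."""
    return -np.log(pred_probs[true_class])

probs = np.array([0.90, 0.09, 0.01])  # model's predicted distribution
print(cross_entropy(probs, 0))  # right and confident:      ~0.105
print(cross_entropy(probs, 1))  # wrong, mild confidence:   ~2.408
print(cross_entropy(probs, 2))  # wrong, very confident:    ~4.605
```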
Virtually every LLM is pretrained with cross-entropy loss on next-token prediction: at each position, the loss is the negative log-probability the model assigns to the token that actually comes next.
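A sketch of the standard next-token setup in PyTorch, using random logits and token IDs as stand-ins for a real model and dataset: logits at position t predict the token at position t+1, so the sequences are shifted by one before computing the loss.

```python
import torch
import torch.nn.functional as F

vocab_size = 50_000
batch, seq_len = 2, 8

# Toy stand-ins: random logits as if from a model, random token IDs as data.
logits = torch.randn(batch, seq_len, vocab_size)
tokens = torch.randint(0, vocab_size, (batch, seq_len))

# Shift: position t's logits predict token t+1, then flatten for the loss.
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_targets = tokens[:, 1:].reshape(-1)

loss = F.cross_entropy(shift_logits, shift_targets)
print(loss.item())  # ~ln(vocab_size) ≈ 10.8 for random (untrained) logits
```

A useful sanity check falls out of this: an untrained model assigns roughly uniform probability to all tokens, so its loss starts near ln(vocab_size) and decreases from there as training progresses.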