Softmax is a mathematical function that converts a vector of raw scores (logits) into a probability distribution. It exponentiates each value and normalizes so all outputs sum to 1, making them interpretable as probabilities.
In language models, softmax converts the final-layer logits into a probability distribution over the vocabulary, from which the next token is sampled.
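The definition is softmax(z)_i = exp(z_i) / Σ_j exp(z_j). A minimal sketch in plain Python, using the standard max-subtraction trick for numerical stability (subtracting a constant from every logit leaves the result unchanged but avoids overflow in exp):

```python
import math

def softmax(logits):
    # Subtract the max logit: softmax(z) == softmax(z - c) for any constant c,
    # and this keeps exp() from overflowing on large scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Raw scores (logits) -> probabilities that sum to 1,
# preserving the ranking of the original scores.
probs = softmax([2.0, 1.0, 0.1])
```

Note that softmax preserves order (the largest logit gets the largest probability) but the gaps are rescaled: differences between logits matter only through their exponentials.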