Model Internals / Core Concepts

Logits

Intermediate [3/5]
Also known as: raw scores, unnormalized log-probabilities, pre-softmax values

Definition

Logits are the raw, unnormalized output scores from a neural network before applying softmax. In language models, logits represent the model's "confidence" for each possible next token—higher logits mean the model considers that token more likely.

Logits are converted to probabilities via the softmax function for sampling or loss calculation.

Key Concepts

  • Unnormalized: Logits can be any real number (positive or negative)
  • Relative values: Only differences between logits matter
  • Softmax conversion: logits → probabilities that sum to 1
  • Temperature scaling: Divide logits by temperature before softmax
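The key concepts above can be sketched in a few lines of plain Python. This is a minimal, framework-free softmax with temperature scaling (the function name and max-subtraction trick are illustrative choices, not from the original text):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution.

    Temperature < 1 sharpens the distribution; temperature > 1 flattens it.
    Subtracting the max logit before exp() avoids overflow and does not
    change the result, since only differences between logits matter.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([3.2, 2.5, 1.8, 0.5, -2.0])
```

Note that adding a constant to every logit leaves `probs` unchanged, which is exactly the "relative values" point above.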

Examples

Transformation
Logits to Probabilities
Model output for "The cat sat on the ___"

LOGITS (raw scores):
  "mat"   →  3.2
  "floor" →  2.5
  "couch" →  1.8
  "roof"  →  0.5
  "moon"  → -2.0

SOFTMAX CONVERSION: P(token) = exp(logit) / Σ exp(all logits)
  "mat"   → exp(3.2)  / sum = 0.55  (55%)
  "floor" → exp(2.5)  / sum = 0.27  (27%)
  "couch" → exp(1.8)  / sum = 0.14  (14%)
  "roof"  → exp(0.5)  / sum = 0.04  (4%)
  "moon"  → exp(-2.0) / sum = 0.003 (0.3%)
                              ─────
                       Total: 1.0   (100%)

Note: Higher logit → higher probability. The exponential amplifies differences!
API Access
Getting Logits from APIs
# OpenAI - get log probabilities
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,    # Enable log prob output
    top_logprobs=5    # Return top 5 alternatives
)

# Access log probabilities
for token in response.choices[0].logprobs.content:
    print(f"Token: {token.token}")
    print(f"Log prob: {token.logprob}")
    for alt in token.top_logprobs:
        print(f"  Alt: {alt.token} = {alt.logprob}")

# Note: APIs return log-probabilities (log of softmax output),
# not raw logits, for numerical stability:
#   log_prob = log(softmax(logit))

# To manipulate logits directly, use logit_bias:
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    logit_bias={
        "1234": 10,    # Boost token 1234
        "5678": -100   # Suppress token 5678
    }
)

Interactive Exercise

Understand Logit Relationships

If token A has logit 5.0 and token B has logit 3.0, which has higher probability and roughly by how much?
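Because the softmax normalizer is shared by both tokens, it cancels in the ratio, so the answer can be checked in one line of plain Python:

```python
import math

# Under softmax, P(A) / P(B) = exp(logit_A - logit_B):
# the shared denominator Σ exp(all logits) cancels.
ratio = math.exp(5.0 - 3.0)
print(f"Token A is ~{ratio:.1f}x more likely than token B")  # ~7.4x
```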

Pro Tips
  • A logit difference of ~2.3 means a ~10x probability difference (since e^2.3 ≈ 10)
  • Use logit_bias to steer generation without fine-tuning
  • Log-probs are more numerically stable than raw probs
  • Temperature divides logits, flattening or sharpening distribution
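The numerical-stability tip above is why APIs work in the log domain. A minimal sketch of a stable log-softmax using the log-sum-exp trick (the helper name is an illustrative choice):

```python
import math

def log_softmax(logits):
    """Log-probabilities computed the numerically stable way.

    log P(i) = logit_i - logsumexp(logits). Factoring out the max
    logit keeps exp() from overflowing even for very large logits.
    """
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

logprobs = log_softmax([3.2, 2.5, 1.8, 0.5, -2.0])
```

Exponentiating these values recovers the probabilities from the example above; working with the logs directly avoids underflow when probabilities are tiny.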

Related Terms