Model Internals / Core Concepts

Logits

Intermediate [3/5]
Also known as: raw scores, unnormalized log-probabilities, pre-softmax values

Definition

Logits are the raw, unnormalized output scores from a neural network before applying softmax. In language models, logits represent the model's "confidence" for each possible next token—higher logits mean the model considers that token more likely.

Logits are converted to probabilities via the softmax function for sampling or loss calculation.

Key Concepts

  • Unnormalized: Logits can be any real number (positive or negative)
  • Relative values: Only differences between logits matter
  • Softmax conversion: logits → probabilities that sum to 1
  • Temperature scaling: Divide logits by temperature before softmax
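The key concepts above can be sketched in a few lines of plain Python. This is a minimal, framework-free softmax with temperature scaling (the function name and max-subtraction trick are illustrative choices, not from the original text):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution.

    Temperature < 1 sharpens the distribution; temperature > 1 flattens it.
    Subtracting the max logit before exp() avoids overflow and does not
    change the result, since only differences between logits matter.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([3.2, 2.5, 1.8, 0.5, -2.0])
```

Note that adding a constant to every logit leaves `probs` unchanged, which is exactly the "relative values" point above.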

Examples

Transformation
Logits to Probabilities
Model output for "The cat sat on the ___"

LOGITS (raw scores):
  "mat"   →  3.2
  "floor" →  2.5
  "couch" →  1.8
  "roof"  →  0.5
  "moon"  → -2.0

SOFTMAX CONVERSION: P(token) = exp(logit) / Σ exp(all logits)
  "mat"   → exp(3.2)  / sum = 0.55  (55%)
  "floor" → exp(2.5)  / sum = 0.27  (27%)
  "couch" → exp(1.8)  / sum = 0.14  (14%)
  "roof"  → exp(0.5)  / sum = 0.04  (4%)
  "moon"  → exp(-2.0) / sum = 0.003 (0.3%)
                              ─────
                       Total: 1.0   (100%)

Note: Higher logit → higher probability. The exponential amplifies differences!
API Access
Getting Logits from APIs
# OpenAI - get log probabilities
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,    # Enable log prob output
    top_logprobs=5    # Return top 5 alternatives
)

# Access log probabilities
for token in response.choices[0].logprobs.content:
    print(f"Token: {token.token}")
    print(f"Log prob: {token.logprob}")
    for alt in token.top_logprobs:
        print(f"  Alt: {alt.token} = {alt.logprob}")

# Note: APIs return log-probabilities (log of softmax output),
# not raw logits, for numerical stability:
#   log_prob = log(softmax(logit))

# To manipulate logits directly, use logit_bias:
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    logit_bias={
        "1234": 10,    # Boost token 1234
        "5678": -100   # Suppress token 5678
    }
)

Interactive Exercise

Understand Logit Relationships

If token A has logit 5.0 and token B has logit 3.0, which has higher probability and roughly by how much?
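Because the softmax normalizer is shared by both tokens, it cancels in the ratio, so the answer can be checked in one line of plain Python:

```python
import math

# Under softmax, P(A) / P(B) = exp(logit_A - logit_B):
# the shared denominator Σ exp(all logits) cancels.
ratio = math.exp(5.0 - 3.0)
print(f"Token A is ~{ratio:.1f}x more likely than token B")  # ~7.4x
```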

Pro Tips
  • A logit difference of ~2.3 means a ~10x probability difference (since e^2.3 ≈ 10)
  • Use logit_bias to steer generation without fine-tuning
  • Log-probs are more numerically stable than raw probs
  • Temperature divides logits, flattening or sharpening distribution
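The numerical-stability tip above is why APIs work in the log domain. A minimal sketch of a stable log-softmax using the log-sum-exp trick (the helper name is an illustrative choice):

```python
import math

def log_softmax(logits):
    """Log-probabilities computed the numerically stable way.

    log P(i) = logit_i - logsumexp(logits). Factoring out the max
    logit keeps exp() from overflowing even for very large logits.
    """
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

logprobs = log_softmax([3.2, 2.5, 1.8, 0.5, -2.0])
```

Exponentiating these values recovers the probabilities from the example above; working with the logs directly avoids underflow when probabilities are tiny.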

Related Terms