Model Parameters & Configuration / Context & Memory

Context Window

Beginner [2/5]
Also known as: context length, context size, token limit

Definition

The context window is the maximum amount of text (measured in tokens) that an LLM can process at once. It includes both your input prompt and the model's response. Think of it as the model's "working memory"—everything outside this window is invisible to the model.

Modern models range from 4K to over 1M tokens, enabling processing of entire books or codebases in a single request.

Key Concepts

  • Token limit: Maximum tokens for input + output combined
  • Context = memory: Model only "remembers" what's in the window
  • Truncation: Older content gets cut off when limit is exceeded
  • Cost implications: Larger contexts often cost more to process

Examples

Context Window Sizes
Model Comparison
  • GPT-3.5: 4,096 tokens (~3,000 words)
  • GPT-4: 8,192 tokens (~6,000 words)
  • GPT-4-32k: 32,768 tokens (~24,000 words)
  • Claude 2: 100,000 tokens (~75,000 words)
  • Claude 3: 200,000 tokens (~150,000 words)
  • Gemini 1.5: 1,000,000 tokens (~750,000 words)

Rule of thumb: 1 token ≈ 0.75 words (English average).
Different models offer vastly different context sizes for different use cases.
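The ~0.75 words-per-token rule of thumb above can be turned into a quick back-of-envelope estimator. This is a minimal sketch using that heuristic only; real tokenizers (such as OpenAI's tiktoken) give exact counts and should be used for anything that matters.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from the ~0.75 words/token English average.

    Only for quick budgeting; an actual tokenizer gives exact counts.
    """
    word_count = len(text.split())
    return round(word_count / words_per_token)

# An 8-word sentence comes out to roughly 11 tokens.
print(estimate_tokens("The context window is the model's working memory."))
```

The ratio varies by language and content (code and non-English text often use more tokens per word), so treat the result as an estimate only.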
Context Usage
What Fits in Different Windows
  • 4K tokens: ✓ Short conversations ✓ Brief documents ✗ Long articles
  • 32K tokens: ✓ Research papers ✓ Long conversations ✓ Multiple documents
  • 100K+ tokens: ✓ Entire books ✓ Full codebases ✓ Extensive documentation
Choose your model based on how much context you need to process.
Context Management
Handling Long Conversations
Problem: the conversation exceeds the context window. Solutions:

1. Summarization: summarize older messages, keep recent ones

2. Sliding window: keep only the most recent N messages

3. Smart retrieval (RAG): store history externally, retrieve relevant parts

4. Compression: remove redundant information, keep key facts
Multiple strategies exist for managing conversations that exceed context limits.
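The sliding-window strategy above can be sketched in a few lines. This is a minimal illustration, not a production API: `count_tokens` stands in for whatever tokenizer your model uses, and the system prompt is always kept because it must survive truncation.

```python
def sliding_window(messages, max_tokens, count_tokens, system_prompt=None):
    """Keep the most recent messages that fit within max_tokens.

    `messages` is a list of strings, oldest first. `count_tokens` is any
    token-counting function. The system prompt (if given) is always kept.
    """
    budget = max_tokens - (count_tokens(system_prompt) if system_prompt else 0)
    kept = []
    for msg in reversed(messages):   # walk newest -> oldest
        cost = count_tokens(msg)
        if cost > budget:
            break                    # everything older is dropped
        kept.append(msg)
        budget -= cost
    kept.reverse()                   # restore chronological order
    return ([system_prompt] if system_prompt else []) + kept
```

Summarization and RAG follow the same shape: instead of dropping the older messages, you would replace them with a summary or push them to an external store and retrieve only the relevant parts.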

Interactive Exercise

Calculate Context Usage

You're building a chatbot with an 8K token context window. Calculate whether the following will fit:

1. System prompt: ~500 tokens

2. User conversation history: ~2,000 tokens

3. Retrieved documents (RAG): ~3,000 tokens

4. Reserved for response: ~1,000 tokens

Will everything fit? What would you do if it doesn't?
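One way to check the arithmetic is to add up the budget components and compare against the window. The sketch below uses the numbers from the exercise (variable names are illustrative):

```python
# Budget check for an 8K-token context window (8,192 tokens).
CONTEXT_WINDOW = 8_192

budget = {
    "system_prompt": 500,
    "conversation_history": 2_000,
    "rag_documents": 3_000,
    "reserved_response": 1_000,
}

used = sum(budget.values())
headroom = CONTEXT_WINDOW - used
print(f"used {used} of {CONTEXT_WINDOW} tokens, headroom {headroom}")
```

Here everything fits with room to spare; if it didn't, the usual levers are trimming the retrieved documents or summarizing older history, since the system prompt and response reservation are harder to shrink.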

Pro Tips
  • Always reserve tokens for the model's response
  • System prompts count against your context budget
  • Use summarization to preserve important context
  • Consider cost vs. capability when choosing context size

Related Terms