Model Parameters & Configuration / Context & Memory

Context Window

Beginner [2/5]
Also known as: context length, context size, token limit

Definition

The context window is the maximum amount of text (measured in tokens) that an LLM can process at once. It includes both your input prompt and the model's response. Think of it as the model's "working memory"—everything outside this window is invisible to the model.

Modern models range from 4K to over 1M tokens, enabling processing of entire books or codebases in a single request.

Key Concepts

  • Token limit: Maximum tokens for input + output combined
  • Context = memory: Model only "remembers" what's in the window
  • Truncation: Older content gets cut off when limit is exceeded
  • Cost implications: Larger contexts often cost more to process

Examples

Context Window Sizes
Model Comparison
  • GPT-3.5: 4,096 tokens (~3,000 words)
  • GPT-4: 8,192 tokens (~6,000 words)
  • GPT-4-32k: 32,768 tokens (~24,000 words)
  • Claude 2: 100,000 tokens (~75,000 words)
  • Claude 3: 200,000 tokens (~150,000 words)
  • Gemini 1.5: 1,000,000 tokens (~750,000 words)

Rule of thumb: 1 token ≈ 0.75 words (English average).
Different models offer vastly different context sizes for different use cases.
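The ~0.75 words-per-token rule of thumb above can be turned into a quick back-of-envelope estimator. This is a minimal sketch using that heuristic only; real tokenizers (such as OpenAI's tiktoken) give exact counts and should be used for anything that matters.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from the ~0.75 words/token English average.

    Only for quick budgeting; an actual tokenizer gives exact counts.
    """
    word_count = len(text.split())
    return round(word_count / words_per_token)

# An 8-word sentence comes out to roughly 11 tokens.
print(estimate_tokens("The context window is the model's working memory."))
```

The ratio varies by language and content (code and non-English text often use more tokens per word), so treat the result as an estimate only.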
Context Usage
What Fits in Different Windows
  • 4K tokens: ✓ Short conversations ✓ Brief documents ✗ Long articles
  • 32K tokens: ✓ Research papers ✓ Long conversations ✓ Multiple documents
  • 100K+ tokens: ✓ Entire books ✓ Full codebases ✓ Extensive documentation
Choose your model based on how much context you need to process.
Context Management
Handling Long Conversations
Problem: the conversation exceeds the context window. Solutions:

1. Summarization: summarize older messages, keep recent ones

2. Sliding window: keep only the most recent N messages

3. Smart retrieval (RAG): store history externally, retrieve relevant parts

4. Compression: remove redundant information, keep key facts
Multiple strategies exist for managing conversations that exceed context limits.
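The sliding-window strategy above can be sketched in a few lines. This is a minimal illustration, not a production API: `count_tokens` stands in for whatever tokenizer your model uses, and the system prompt is always kept because it must survive truncation.

```python
def sliding_window(messages, max_tokens, count_tokens, system_prompt=None):
    """Keep the most recent messages that fit within max_tokens.

    `messages` is a list of strings, oldest first. `count_tokens` is any
    token-counting function. The system prompt (if given) is always kept.
    """
    budget = max_tokens - (count_tokens(system_prompt) if system_prompt else 0)
    kept = []
    for msg in reversed(messages):   # walk newest -> oldest
        cost = count_tokens(msg)
        if cost > budget:
            break                    # everything older is dropped
        kept.append(msg)
        budget -= cost
    kept.reverse()                   # restore chronological order
    return ([system_prompt] if system_prompt else []) + kept
```

Summarization and RAG follow the same shape: instead of dropping the older messages, you would replace them with a summary or push them to an external store and retrieve only the relevant parts.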

Interactive Exercise

Calculate Context Usage

You're building a chatbot with an 8K token context window. Calculate whether the following will fit:

1. System prompt: ~500 tokens

2. User conversation history: ~2,000 tokens

3. Retrieved documents (RAG): ~3,000 tokens

4. Reserved for response: ~1,000 tokens

Will everything fit? What would you do if it doesn't?
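One way to check the arithmetic is to add up the budget components and compare against the window. The sketch below uses the numbers from the exercise (variable names are illustrative):

```python
# Budget check for an 8K-token context window (8,192 tokens).
CONTEXT_WINDOW = 8_192

budget = {
    "system_prompt": 500,
    "conversation_history": 2_000,
    "rag_documents": 3_000,
    "reserved_response": 1_000,
}

used = sum(budget.values())
headroom = CONTEXT_WINDOW - used
print(f"used {used} of {CONTEXT_WINDOW} tokens, headroom {headroom}")
```

Here everything fits with room to spare; if it didn't, the usual levers are trimming the retrieved documents or summarizing older history, since the system prompt and response reservation are harder to shrink.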

Pro Tips
  • Always reserve tokens for the model's response
  • System prompts count against your context budget
  • Use summarization to preserve important context
  • Consider cost vs. capability when choosing context size

Related Terms