
Context Management

Tags: Context engineering, Context optimization, Prompt state management

Definition

Context Management is the practice of strategically organizing, prioritizing, and maintaining the information provided to LLMs within their context window. Effective context management maximizes model performance while working within token limits.

This includes decisions about what to include, where to place it, when to summarize or remove information, and how to structure multi-turn conversations.

Key Concepts

  • Context window: The maximum number of tokens the model can process at once (see the token-counting sketch after this list)
  • Recency bias: Models attend more heavily to the most recent content
  • Primacy effect: Content at the very start also receives attention priority
  • Lost in the middle: Content in the middle of the window often receives the least attention
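
A practical corollary: measure token usage before each call so you know which zone of the window you are filling. Below is a minimal sketch using the tiktoken tokenizer; the cl100k_base encoding, the 16K limit, and the 15% reserve are illustrative assumptions, not tied to any particular model.

import tiktoken

CONTEXT_LIMIT = 16_000   # assumed window size
RESPONSE_RESERVE = 0.15  # keep at least 15% free for the model's reply

enc = tiktoken.get_encoding("cl100k_base")

def fits_in_window(prompt: str) -> bool:
    # Count prompt tokens and compare against the usable budget.
    used = len(enc.encode(prompt))
    budget = int(CONTEXT_LIMIT * (1 - RESPONSE_RESERVE))
    return used <= budget

print(fits_in_window("How do decorators work?"))  # True for short prompts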

Examples

Strategy
Context Window Management Strategies
CONTEXT WINDOW ANATOMY:

┌─────────────────────────────────────────────────┐
│ SYSTEM PROMPT (High attention - first position) │
│ - Role definition                               │
│ - Core instructions                             │
│ - Output format requirements                    │
├─────────────────────────────────────────────────┤
│ REFERENCE MATERIAL (Medium attention)           │
│ - Retrieved documents (RAG)                     │
│ - Examples (few-shot)                           │
│ - Background information                        │
├─────────────────────────────────────────────────┤
│ ⚠️ "LOST IN THE MIDDLE" ZONE                     │
│ - Older conversation history                    │
│ - Less critical context                         │
│ - May receive less attention                    │
├─────────────────────────────────────────────────┤
│ RECENT CONTEXT (High attention - recency)       │
│ - Recent conversation turns                     │
│ - Current task specifics                        │
├─────────────────────────────────────────────────┤
│ CURRENT QUERY (Highest attention)               │
│ - User's current request                        │
└─────────────────────────────────────────────────┘

ATTENTION PRIORITY:
Start   ████████████       High (primacy)
Middle  █████              Lower (lost in middle)
End     ████████████████   Highest (recency)

PLACEMENT STRATEGIES:
1. CRITICAL INFO: Place at start OR end, not middle
2. EXAMPLES: Near the query for format consistency
3. LONG DOCS: Summarize middle, keep key parts at ends
4. CONVERSATION: Summarize old turns, keep recent full
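
These placement rules map directly onto how the prompt string is assembled. The following is a minimal sketch of zone-based assembly; build_prompt and its parameter names are illustrative, not from any library.

def build_prompt(system, critical_instructions, documents,
                 history_summary, recent_turns, query):
    # Zones ordered so critical content sits at the high-attention
    # start and end; bulky reference material absorbs the middle.
    parts = [
        system,                   # start: primacy effect
        critical_instructions,    # critical info at the start...
        "\n".join(documents),     # middle: RAG / background material
        history_summary,          # middle: compressed older turns
        "\n".join(recent_turns),  # end: recency effect
        critical_instructions,    # ...repeated at the end
        query,                    # final position: highest attention
    ]
    return "\n\n".join(p for p in parts if p)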
Implementation
Multi-Turn Conversation Management
CONVERSATION CONTEXT STRATEGIES:

1. SLIDING WINDOW: Keep last N turns, drop oldest
   Turn 1: [Dropped]
   Turn 2: [Dropped]
   Turn 3: [Dropped]
   Turn 4: [Kept] User: "Let's discuss Python"
   Turn 5: [Kept] Assistant: "Sure, what aspect?"
   Turn 6: [Kept] User: "How do decorators work?"
   Turn 7: [Current response]
   Pros: Simple, preserves recent context
   Cons: Loses important early context

2. SUMMARIZATION BUFFER: Summarize old turns, keep recent full
   [Summary of turns 1-10]: "User asked about Python decorators.
   Discussed @property, @staticmethod. User understood basic syntax."
   Turn 11: [Full] User: "What about @classmethod?"
   Turn 12: [Full] Assistant: [response]
   Turn 13: [Current]
   Pros: Preserves key info from history
   Cons: May lose nuance in summary

3. HIERARCHICAL CONTEXT:
   ┌─────────────────────────────────────────┐
   │ Level 1: Session summary (always kept)  │
   │ "Discussing Python OOP concepts"        │
   ├─────────────────────────────────────────┤
   │ Level 2: Topic summaries (compressed)   │
   │ "Covered: classes, inheritance, magic   │
   │ methods"                                │
   ├─────────────────────────────────────────┤
   │ Level 3: Recent turns (full detail)     │
   │ Last 5 turns with complete content      │
   └─────────────────────────────────────────┘

4. SELECTIVE RETENTION: Keep turns based on relevance to current query
   def select_context(history, current_query):
       relevant = []
       for turn in history:
           if is_relevant(turn, current_query):
               relevant.append(turn)
       return summarize_old(relevant[:-5]) + relevant[-5:]

TOKEN BUDGET ALLOCATION:
┌────────────────────┬──────────┬───────────┐
│ Component          │ Priority │ % Budget  │
├────────────────────┼──────────┼───────────┤
│ System prompt      │ High     │ 5-10%     │
│ Current query      │ Critical │ 5-10%     │
│ Recent turns       │ High     │ 20-30%    │
│ Retrieved docs     │ Medium   │ 30-40%    │
│ History summary    │ Low      │ 10-20%    │
│ Response reserve   │ Critical │ 15-25%    │
└────────────────────┴──────────┴───────────┘
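
The select_context pseudocode in strategy 4 leaves is_relevant and summarize_old undefined. Here is a self-contained sketch with a naive keyword-overlap test and a stub summarizer standing in for embedding similarity and an LLM summarizer; all names are assumptions, not part of any library.

def is_relevant(turn: str, query: str) -> bool:
    # Naive keyword overlap; a real system would use embedding similarity.
    return bool(set(turn.lower().split()) & set(query.lower().split()))

def summarize_old(turns: list[str]) -> list[str]:
    # Stub: in practice, ask an LLM to compress these turns into a summary.
    return [f"[Summary of {len(turns)} earlier turns]"] if turns else []

def select_context(history: list[str], query: str,
                   keep_recent: int = 5) -> list[str]:
    # Filter for relevance, summarize all but the last few relevant turns.
    relevant = [t for t in history if is_relevant(t, query)]
    return summarize_old(relevant[:-keep_recent]) + relevant[-keep_recent:]

turns = [
    "User: Let's discuss Python",
    "Assistant: Sure, what aspect?",
    "User: How do decorators work?",
]
print(select_context(turns, "python decorators"))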

Interactive Exercise

Design Context Strategy

You're building a customer support chatbot with a 16K token context window. Conversations can last 50+ turns. How would you manage context to maintain quality throughout long conversations?

Pro Tips
  • Always reserve tokens for the model's response, at least 15% of the window (see the budget sketch after these tips)
  • Put critical instructions at the start AND repeat them at the end
  • Use structured formats (JSON, bullets) to pack information more densely
  • Monitor long conversations for signs of context degradation
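
One way to enforce the reserve rule is to turn the token budget table above into hard per-component allowances before assembling each request. A minimal sketch follows; the shares track the allocation table, and the 16K limit is an assumed example value.

BUDGET_SHARES = {
    "system_prompt": 0.10,
    "current_query": 0.10,
    "recent_turns": 0.30,
    "retrieved_docs": 0.30,
    "history_summary": 0.05,
    "response_reserve": 0.15,  # never hand these tokens to the prompt
}

def allocate(context_limit: int) -> dict[str, int]:
    # Convert fractional shares into absolute token allowances.
    return {part: int(context_limit * share)
            for part, share in BUDGET_SHARES.items()}

print(allocate(16_000))
# {'system_prompt': 1600, 'current_query': 1600, 'recent_turns': 4800, ...}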

Related Terms