Context Engineering / Information Processing

Compression

Intermediate [3/5]
Context compression · Prompt compression · Token reduction

Definition

Compression refers to techniques that reduce the token count of prompts while preserving essential information. This is crucial for working within context window limits, reducing costs, and improving inference speed.

Compression can be lossy (some information removed) or lossless (all information preserved in fewer tokens), and can be performed at the lexical, semantic, or learned representation level.

Key Concepts

  • Token reduction: Fewer tokens = lower cost and faster inference
  • Information density: More meaning per token
  • Lossy vs lossless: Trade-off between compression ratio and fidelity
  • Semantic preservation: Keeping meaning while reducing text
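
The cost and speed concepts above can be made concrete with a small sketch. This is an illustration only: it uses word count as a crude proxy for tokens (a real tokenizer such as `tiktoken` gives exact counts), and `estimate_savings` and the `cost_per_1k` figure are our own illustrative names, not a library API.

```python
def estimate_savings(original: str, compressed: str, cost_per_1k: float = 0.01) -> dict:
    """Compare approximate token counts and per-call cost of two prompts.

    Word count stands in for token count; swap in a real tokenizer
    for production use.
    """
    orig_tokens = len(original.split())
    comp_tokens = len(compressed.split())
    reduction = 1 - comp_tokens / orig_tokens
    return {
        "original_tokens": orig_tokens,
        "compressed_tokens": comp_tokens,
        "reduction_pct": round(reduction * 100, 1),
        # cost saved on a single call, at cost_per_1k dollars per 1k tokens
        "cost_saved_per_call": round((orig_tokens - comp_tokens) / 1000 * cost_per_1k, 6),
    }

verbose = "I would really like you to please help me sort this array in ascending order."
terse = "Sort this array ascending."
print(estimate_savings(verbose, terse))
```

Even this rough proxy shows why compression compounds: a 70% reduction applied to every call in a high-volume pipeline translates directly into proportional cost and latency savings.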

Examples

Techniques
Compression Methods Comparison
COMPRESSION TECHNIQUES:

1. LEXICAL COMPRESSION (Simple):
   - Remove filler words and redundancy
   - Abbreviate common phrases
   - Use shorter synonyms

   Before (45 tokens):
   "I would really like you to please help me understand how I can go about
   implementing a function that will sort the array in ascending order from
   smallest to largest."

   After (12 tokens):
   "Help implement ascending sort function for array."

   Compression: 73% reduction

2. SEMANTIC COMPRESSION (Summarization):
   - Condense meaning into key points
   - Remove examples, keep concepts
   - Extract essential information

   Before (long article, ~2000 tokens):
   [Full research paper text...]

   After (~200 tokens):
   "Key findings: 1) X improves Y by 40%, 2) Method works best with Z,
   3) Limitations include A and B. Main contribution: Novel approach to
   solving problem P."

   Compression: 90% reduction

3. LEARNED COMPRESSION:
   - Soft prompts / prompt tuning
   - Compress into continuous embeddings
   - Model learns optimal compression

   Prompt: "You are a helpful assistant that..."
   → [Learned 5-token embedding]

COMPRESSION RATIOS BY METHOD:
┌─────────────────────┬───────────┬────────────────┐
│ Method              │ Reduction │ Info Loss      │
├─────────────────────┼───────────┼────────────────┤
│ Lexical cleanup     │ 20-40%    │ Minimal        │
│ Summarization       │ 60-90%    │ Low-Medium     │
│ Selective pruning   │ 40-70%    │ Variable       │
│ Learned compression │ 80-95%    │ Task-dependent │
└─────────────────────┴───────────┴────────────────┘
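
Lexical compression (technique 1 above) can be sketched with simple pattern removal. The filler list here is illustrative and hand-picked for the example sentence, not exhaustive; production tools (e.g. LLMLingua) learn which tokens are safe to drop.

```python
import re

# Hand-picked filler phrases for this example; a real system would use
# a much larger learned or curated list.
FILLERS = [
    r"\bI would really like you to\b",
    r"\bplease\b",
    r"\bhelp me understand how\b",
    r"\bI can go about\b",
]

def lexical_compress(text: str) -> str:
    """Strip filler phrases, then collapse the whitespace left behind."""
    for pattern in FILLERS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

prompt = ("I would really like you to please help me understand how "
          "I can go about implementing a sort function.")
print(lexical_compress(prompt))  # → "implementing a sort function."
```

Note the limitation: pure deletion can leave a fragment rather than a well-formed instruction, which is why lexical cleanup is usually paired with a light rewrite step.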
Implementation
Practical Compression Strategies
PRACTICAL COMPRESSION STRATEGIES:

1. SYSTEM PROMPT COMPRESSION:

   Before (verbose):
   """
   You are a helpful AI assistant. Your goal is to help users with their
   questions. You should always be polite and professional. When you don't
   know something, say so. Try to be concise in your responses while still
   being thorough.
   """

   After (compressed):
   """
   Helpful AI. Be polite, professional, concise. Admit uncertainty.
   Answer thoroughly.
   """

   Tokens: 58 → 14 (76% reduction)

2. CONTEXT COMPRESSION FOR RAG:

   Retrieved chunks (verbose):
   """
   Document 1: The company was founded in 2015 by John Smith. John Smith
   had previously worked at Google. The founding year 2015 was significant
   because the AI boom was starting...

   Document 2: Revenue in 2023 was $50 million, which represents a 40%
   increase from 2022's revenue of approximately $35.7 million...
   """

   Compressed:
   """
   - Founded: 2015, by John Smith (ex-Google)
   - Revenue: $50M (2023), +40% YoY
   """

3. CONVERSATION HISTORY COMPRESSION:

   Strategy: summarize older turns, keep recent ones in full.

   Turn 1-5: [Summarized: "User asked about Python sorting. Discussed
             bubble sort and quicksort trade-offs."]
   Turn 6:   [Full: user's current question]
   Turn 7:   [Full: your current response]

COMPRESSION DECISION TREE:

Input too long for context?
├─ Yes → Compression needed
│   ├─ Is full fidelity required?
│   │   ├─ Yes → Chunking + multiple calls
│   │   └─ No → Summarization
│   │       ├─ Task-critical info?
│   │       │   ├─ Yes → Extractive compression
│   │       │   └─ No → Abstractive compression
│   └─ Monitor for quality degradation
└─ No → No compression needed
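
Strategy 3 (conversation history compression) can be sketched as follows. The `summarize` stub is a placeholder of our own: in practice it would be an LLM call that condenses the older turns.

```python
def summarize(turns: list[str]) -> str:
    # Placeholder: a real implementation would call a model here.
    return f"[Summary of {len(turns)} earlier turns]"

def compress_history(turns: list[str], keep_recent: int = 2) -> list[str]:
    """Replace all but the last `keep_recent` turns with one summary entry."""
    if len(turns) <= keep_recent:
        return turns  # nothing old enough to summarize
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(older)] + recent

history = ["Q: sort in Python?", "A: use sorted().", "Q: stable?", "A: yes, Timsort."]
print(compress_history(history))
# → ["[Summary of 2 earlier turns]", "Q: stable?", "A: yes, Timsort."]
```

Keeping the most recent turns verbatim matters because they carry the immediate task state; the summary only needs to preserve decisions and facts that later turns may refer back to.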

Interactive Exercise

Compress This Prompt

Compress the following prompt while preserving essential information:

"I would really appreciate it if you could please take a look at this piece of code and let me know if there are any potential issues, bugs, or problems that you can identify. Also, if you have any suggestions for how I might be able to improve the code to make it better, cleaner, or more efficient, I would love to hear them."

Pro Tips
  • Test compressed prompts - ensure quality doesn't degrade
  • Prioritize: Keep task-critical info, remove pleasantries
  • Use bullet points over prose for higher density
  • For RAG, compress retrieved context, not the query
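
The first tip (test compressed prompts) needs at least a cheap automated check. One option, sketched below under our own names (`keyword_coverage`, the term list), is to verify that task-critical terms survive compression; a fuller test would compare actual model outputs on the original and compressed prompts side by side.

```python
def keyword_coverage(compressed: str, required_terms: list[str]) -> float:
    """Fraction of required terms still present in the compressed prompt."""
    lower = compressed.lower()
    hits = sum(term.lower() in lower for term in required_terms)
    return hits / len(required_terms)

compressed = "Review code: find bugs, suggest efficiency improvements."
terms = ["bugs", "efficiency", "review"]
print(keyword_coverage(compressed, terms))  # → 1.0, all critical terms kept
```

A coverage score below 1.0 is a signal to loosen the compression, not a guarantee of failure; conversely, full coverage does not prove quality, so spot-check outputs periodically.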

Related Terms