Compression refers to techniques that reduce the token count of prompts while preserving essential information. This is crucial for working within context window limits, reducing costs, and improving inference speed.
Compression can be lossy (some information is removed) or lossless (all information is preserved in fewer tokens), and can operate at the lexical level (deleting redundant or low-information words), the semantic level (paraphrasing or summarizing), or the level of learned representations (e.g., soft prompts or trained compression models).
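As a minimal illustration of lossy, lexical-level compression, the sketch below drops common filler words and collapses whitespace. The filler-word list, function names, and the whitespace-based token count are illustrative assumptions, not a standard API; real systems would use a model's actual tokenizer and more principled selection criteria.

```python
import re

# Illustrative (not exhaustive) set of filler words whose removal
# rarely changes the meaning of an instruction-style prompt.
FILLER_WORDS = {"please", "kindly", "basically", "actually",
                "really", "very", "just", "simply"}

def compress_prompt(prompt: str) -> str:
    """Lossy lexical compression: drop filler words, collapse whitespace."""
    words = re.split(r"\s+", prompt.strip())
    kept = [w for w in words if w.lower().strip(".,!?") not in FILLER_WORDS]
    return " ".join(kept)

def token_count(text: str) -> int:
    """Whitespace word count as a rough proxy for model token count."""
    return len(text.split())

original = ("Could you please just summarize this really long article, "
            "basically focusing on the very main points?")
compressed = compress_prompt(original)

print(token_count(original), "->", token_count(compressed))
print(compressed)
```

Lexical filtering like this is cheap but crude; semantic compression (summarization) or learned approaches can achieve much higher compression ratios at the cost of more computation.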