Context Engineering / Advanced Techniques

System 2 Attention

Advanced [4/5]
Tags: S2A · Deliberate attention · Context refinement

Definition

System 2 Attention (S2A), introduced by Weston and Sukhbaatar (2023), is a prompting technique in which the LLM first regenerates or refines the context, keeping only the information relevant to the question, before answering. It is named after Kahneman's System 2 (slow, deliberate, analytical thinking) and combats the model's tendency to be distracted by irrelevant or misleading context.

S2A addresses the "distracted by context" problem: LLMs may give different answers to the same question depending on irrelevant information included in the prompt.

Key Concepts

  • System 1 vs. System 2: Fast/intuitive vs. slow/deliberate processing
  • Context distraction: Irrelevant information skewing answers
  • Context regeneration: Rewriting the context to keep only relevant information
  • Sycophancy reduction: Answers less swayed by the user's stated opinions

Examples

Problem
The Context Distraction Problem
Example 1 - Irrelevant information:

  Question WITHOUT distraction:
  "Mary has 3 apples. She buys 2 more. How many apples does Mary have?"
  Answer: 5 ✓

  Question WITH distraction:
  "Mary has 3 apples. She loves the color blue. Her favorite number is 7. She buys 2 more apples. How many apples does Mary have?"
  Answer: Sometimes 7 ✗ (distracted by "favorite number")

Example 2 - Sycophancy:

  Without opinion:
  "Is 17 a prime number?"
  Answer: "Yes, 17 is prime." ✓

  With opinion:
  "I think 17 is not a prime number because it seems divisible by something. Is 17 prime?"
  Answer: "You raise a good point. Let me check... Actually, you might be thinking of..." ✗ (influenced by the user's incorrect opinion)

Example 3 - Anchoring:

  Without anchor:
  "How tall is the Eiffel Tower?"
  Answer: "324 meters" ✓

  With anchor:
  "I heard the Eiffel Tower is about 500 meters. How tall is it actually?"
  Answer: "It's actually around 400 meters..." ✗ (pulled toward the anchor)

Why this happens: LLMs process context holistically - all tokens influence the output, including irrelevant ones. This is a feature (context understanding) that becomes a bug (context distraction).
Solution
System 2 Attention Implementation
The S2A process has two steps.

Step 1: Context regeneration

  prompt_s2a = """
  Given the following context and question, rewrite the context to include
  ONLY information that is directly relevant to answering the question.
  Remove any opinions, irrelevant details, or potentially misleading
  information.

  Original context: {context}
  Question: {question}

  Relevant context only:
  """

Step 2: Answer with the cleaned context

  prompt_answer = """
  Context: {cleaned_context}
  Question: {question}

  Answer:
  """

Example application:

  Original: "My friend who is a math teacher says that the sum of angles in a triangle is 200 degrees. He's usually right about these things. I trust his judgment. What is the sum of angles in a triangle?"

  S2A Step 1 - Regenerated context: "Question asks about the sum of angles in a triangle. This is a geometric fact unrelated to any person's opinion."

  S2A Step 2 - Answer: "The sum of angles in a triangle is 180 degrees."

Before/after comparison:

  Without S2A: "Your friend makes an interesting point. While the standard answer is 180 degrees, there are different geometries where..." (hedging, influenced)

  With S2A: "The sum of angles in a triangle is 180 degrees in Euclidean geometry." (direct, factual)

Performance results:

  ┌─────────────────────┬──────────┬─────────────┐
  │ Task                │ Standard │ + S2A       │
  ├─────────────────────┼──────────┼─────────────┤
  │ Factual QA (w/dist) │ 71%      │ 89% (+18%)  │
  │ Math (w/opinions)   │ 68%      │ 85% (+17%)  │
  │ Sycophancy tests    │ 52%      │ 78% (+26%)  │
  └─────────────────────┴──────────┴─────────────┘

  Dramatic improvement when context is noisy!
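The two-step process above can be sketched as a small Python pipeline. This is a minimal sketch, not a definitive implementation: `call_llm` is a hypothetical stand-in for whatever LLM client you use (assumed to take a prompt string and return a completion string), and the templates mirror the prompts shown above.

```python
# Hypothetical templates mirroring the two S2A prompts above.
REGENERATE_TEMPLATE = """Given the following context and question, rewrite the context to include
ONLY information that is directly relevant to answering the question.
Remove any opinions, irrelevant details, or potentially misleading information.

Original context: {context}
Question: {question}

Relevant context only:"""

ANSWER_TEMPLATE = """Context: {cleaned_context}
Question: {question}

Answer:"""


def s2a_answer(context: str, question: str, call_llm) -> str:
    """Run System 2 Attention: regenerate the context, then answer with it."""
    # Step 1: ask the model to keep only the relevant parts of the context.
    cleaned = call_llm(
        REGENERATE_TEMPLATE.format(context=context, question=question)
    )
    # Step 2: answer using the cleaned context instead of the noisy original.
    return call_llm(
        ANSWER_TEMPLATE.format(cleaned_context=cleaned, question=question)
    )
```

Because `call_llm` is injected, the pipeline works unchanged with any provider's client, and the regeneration step can be cached per (context, question) pair to amortize the extra call.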

Interactive Exercise

Apply System 2 Attention

Original context: "I'm a senior data scientist and I believe Python is slower than JavaScript for all tasks. Given my 15 years of experience, I'm pretty confident. Can you tell me which language is generally faster for numerical computations?"

Apply S2A: What would the regenerated context look like?

Pro Tips
  • S2A is especially useful when user input may contain biases
  • Can be combined with other techniques like CoT
  • Adds latency and cost (two LLM calls) - use when accuracy matters more than speed
  • Particularly valuable for factual/technical questions
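As the tips note, S2A composes naturally with chain-of-thought. A hedged sketch of that combination, with a deliberately abbreviated regeneration prompt and the same hypothetical `call_llm` client (prompt in, completion out); the only change versus plain S2A is the CoT instruction in the second call:

```python
# Assumed, abbreviated prompts for illustration only.
REGEN = ("Rewrite the context to keep only information directly relevant "
         "to the question.\nContext: {context}\nQuestion: {question}\n"
         "Relevant context only:")

COT_ANSWER = ("Context: {cleaned}\nQuestion: {question}\n"
              "Think step by step, then state the final answer:")


def s2a_with_cot(context: str, question: str, call_llm) -> str:
    """S2A context cleaning followed by a chain-of-thought answer."""
    # Step 1 is unchanged: strip the context down to the relevant facts.
    cleaned = call_llm(REGEN.format(context=context, question=question))
    # Step 2 adds a CoT instruction so the model reasons over clean input.
    return call_llm(COT_ANSWER.format(cleaned=cleaned, question=question))
```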

Related Terms