
Complexity-Based Prompting

Difficulty: Advanced (4/5)
Tags: complexity-based consistency, reasoning length selection

Definition

Complexity-based prompting selects answers based on the complexity of reasoning chains rather than by simple majority vote. When generating multiple reasoning paths, it favors responses with longer, more detailed chains of thought, based on the insight that complex problems benefit from more thorough reasoning.

This approach builds on self-consistency but uses reasoning depth as a quality signal.

Key Concepts

  • Reasoning length: Number of steps in the chain of thought
  • Complexity as quality: More detailed reasoning often signals more thorough problem-solving
  • Weighted voting: Weight each answer's votes by the complexity of the chains that produced it (see the sketch after this list)
  • Threshold selection: Only consider answers from chains above a complexity threshold
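
To make the last two ideas concrete, here is a minimal sketch of weighted voting with an optional threshold, assuming each sampled chain has already been reduced to an (answer, complexity) pair; the sample data and the min_complexity parameter are illustrative, not from the paper.

from collections import defaultdict

def weighted_vote(scored, min_complexity=0.0):
    """scored: list of (answer, complexity) pairs, one per sampled chain."""
    votes = defaultdict(float)
    for answer, complexity in scored:
        if complexity >= min_complexity:   # threshold selection
            votes[answer] += complexity    # weighted voting
    return max(votes, key=votes.get)

# Toy data: five chains, complexity = number of reasoning steps
samples = [("10", 1), ("10", 3), ("2", 2), ("10", 1), ("10", 4)]
print(weighted_vote(samples))                    # "10" (weight 9 vs. 2)
print(weighted_vote(samples, min_complexity=2))  # still "10" (shallow chains dropped)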

Examples

Comparison
Self-Consistency vs Complexity-Based
PROBLEM: "A store has 23 apples. 15 are sold in the morning, 8 more arrive, then 6 are sold. How many are left?"

SELF-CONSISTENCY (majority voting):
  Sample 1: "23 - 15 + 8 - 6 = 10" → Answer: 10
  Sample 2: "23 - 15 = 8, 8 + 8 = 16, 16 - 6 = 10" → Answer: 10
  Sample 3: "15 + 6 = 21 sold, 23 - 21 = 2" → Answer: 2 (wrong)
  Sample 4: "23 - 15 + 8 - 6 = 10" → Answer: 10
  Sample 5: "Start 23, sell 15 → 8, add 8 → 16, sell 6 → 10" → Answer: 10
  Majority vote: 10 (4 out of 5) ✓

COMPLEXITY-BASED PROMPTING (same samples, weighted by reasoning steps):
  Sample 1: 1 step  → weight 1
  Sample 2: 3 steps → weight 3
  Sample 3: 2 steps → weight 2 (wrong answer)
  Sample 4: 1 step  → weight 1
  Sample 5: 4 steps → weight 4
  Weighted votes for "10": 1 + 3 + 1 + 4 = 9
  Weighted votes for "2": 2
  Answer: 10 (even more confident) ✓

WHY COMPLEXITY HELPS:
  - Short answers might be lucky guesses
  - Longer reasoning shows its work and catches errors
  - Complex chains indicate thorough thinking
Implementation
Complexity-Based Selection Algorithm
COMPLEXITY-BASED PROMPTING ALGORITHM:

from collections import defaultdict

# Assumes an `llm` client plus `extract_final_answer` and
# `majority_vote` helpers defined elsewhere.
def complexity_based_answer(question, n_samples=10, method="weighted"):
    # Generate multiple reasoning chains
    responses = []
    for _ in range(n_samples):
        response = llm.generate(
            f"Think step by step: {question}",
            temperature=0.7,  # some randomness so chains differ
        )
        responses.append(response)

    # Extract each chain's answer and measure its complexity
    scored = []
    for resp in responses:
        answer = extract_final_answer(resp)
        complexity = measure_complexity(resp)
        scored.append((answer, complexity, resp))

    if method == "top_k":
        # Method 1: keep the top-k most complex chains, then majority-vote
        top_k = sorted(scored, key=lambda x: -x[1])[:5]
        return majority_vote([s[0] for s in top_k])

    # Method 2: weight each chain's vote by its complexity
    votes = defaultdict(float)
    for answer, complexity, _ in scored:
        votes[answer] += complexity
    return max(votes, key=votes.get)

def measure_complexity(response):
    """Heuristic proxies for reasoning complexity."""
    # Simple: count explicit reasoning steps
    steps = response.count("Step") + response.count("Then")
    # Token-based: longer chains tend to be more complex
    tokens = len(response.split())
    # Structure-based: count logical connectives
    connectives = sum(response.count(w)
                      for w in ["therefore", "because", "since", "so", "thus"])
    # Combined score
    return steps * 2 + tokens * 0.1 + connectives * 3

RESULTS (from paper):

┌─────────────────────┬──────────┬─────────────┐
│ Method              │ GSM8K    │ MultiArith  │
├─────────────────────┼──────────┼─────────────┤
│ Standard CoT        │ 56.5%    │ 91.7%       │
│ Self-Consistency    │ 74.4%    │ 94.2%       │
│ Complexity-Based    │ 78.8%    │ 95.3%       │
└─────────────────────┴──────────┴─────────────┘

+4.4 percentage points over self-consistency on GSM8K!
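
Assuming the `llm` client and helper functions above exist in your environment (they are placeholders, not a specific library's API), a call would look like:

answer = complexity_based_answer(
    "A store has 23 apples. 15 are sold, 8 arrive, then 6 are sold. How many are left?",
    n_samples=10,
    method="top_k",  # or "weighted"
)

The top-k variant discards shallow chains entirely, while weighted voting lets every chain contribute in proportion to its measured complexity.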

Interactive Exercise

Score by Complexity

Given these 3 reasoning chains for "What is 15% of 80?", rank them by complexity and pick the best answer:

A: "15% of 80 = 12" (1 step)
B: "15% = 0.15, 0.15 × 80 = 12" (2 steps)
C: "15% means 15/100. 80 × 15 = 1200. 1200 ÷ 100 = 12" (3 steps)
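
To check your ranking programmatically, here is a tiny sketch that uses the annotated step counts as complexity weights (the tuples below just transcribe the exercise):

from collections import defaultdict

# (chain label, final answer, annotated step count)
chains = [("A", "12", 1), ("B", "12", 2), ("C", "12", 3)]

# Rank by complexity: C > B > A
print([label for label, _, _ in sorted(chains, key=lambda c: -c[2])])

# Weighted vote: all three chains agree, so "12" wins with total weight 6
votes = defaultdict(int)
for _, answer, steps in chains:
    votes[answer] += steps
print(max(votes, key=votes.get))  # "12"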

Pro Tips
  • Complexity-based prompting works best for math and logical reasoning
  • Combine it with self-consistency: filter chains by complexity, then vote
  • Watch for verbose but wrong reasoning; complexity does not guarantee correctness (one guard is sketched below)
  • Set a minimum complexity threshold to filter out shallow answers
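
As a concrete guard against verbose-but-wrong chains, you can cap each chain's weight so a single long outlier cannot outvote several shorter agreeing chains; the cap value and sample data here are illustrative assumptions.

from collections import defaultdict

def capped_weighted_vote(scored, cap=3.0):
    """scored: (answer, complexity) pairs; clip weights to limit outliers."""
    votes = defaultdict(float)
    for answer, complexity in scored:
        votes[answer] += min(complexity, cap)  # cap the influence of any one chain
    return max(votes, key=votes.get)

# One rambling wrong chain (weight 10) vs. three short agreeing chains
samples = [("7", 10.0), ("12", 2.0), ("12", 2.0), ("12", 2.5)]
print(capped_weighted_vote(samples))  # "12": 6.5 beats capped "7": 3.0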
