
Complexity-Based Prompting

Difficulty: Advanced (4/5)
Tags: complexity-based consistency, reasoning length selection

Definition

Complexity-based prompting selects answers based on the complexity of reasoning chains rather than by simple majority vote. When generating multiple reasoning paths, it favors responses with longer, more detailed chains of thought, based on the insight that complex problems benefit from more thorough reasoning.

This approach builds on self-consistency but uses reasoning depth as a quality signal.

Key Concepts

  • Reasoning length: Number of steps in the chain of thought
  • Complexity as quality: More detailed reasoning often signals more thorough problem-solving
  • Weighted voting: Weight each answer's votes by the complexity of the chains that produced it (see the sketch after this list)
  • Threshold selection: Only consider answers from chains above a complexity threshold
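
To make the last two ideas concrete, here is a minimal sketch of weighted voting with an optional threshold, assuming each sampled chain has already been reduced to an (answer, complexity) pair; the sample data and the min_complexity parameter are illustrative, not from the paper.

from collections import defaultdict

def weighted_vote(scored, min_complexity=0.0):
    """scored: list of (answer, complexity) pairs, one per sampled chain."""
    votes = defaultdict(float)
    for answer, complexity in scored:
        if complexity >= min_complexity:   # threshold selection
            votes[answer] += complexity    # weighted voting
    return max(votes, key=votes.get)

# Toy data: five chains, complexity = number of reasoning steps
samples = [("10", 1), ("10", 3), ("2", 2), ("10", 1), ("10", 4)]
print(weighted_vote(samples))                    # "10" (weight 9 vs. 2)
print(weighted_vote(samples, min_complexity=2))  # still "10" (shallow chains dropped)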

Examples

Comparison
Self-Consistency vs Complexity-Based
PROBLEM: "A store has 23 apples. 15 are sold in the morning, 8 more arrive, then 6 are sold. How many are left?"

SELF-CONSISTENCY (majority voting):
  Sample 1: "23 - 15 + 8 - 6 = 10" → Answer: 10
  Sample 2: "23 - 15 = 8, 8 + 8 = 16, 16 - 6 = 10" → Answer: 10
  Sample 3: "15 + 6 = 21 sold, 23 - 21 = 2" → Answer: 2 (wrong)
  Sample 4: "23 - 15 + 8 - 6 = 10" → Answer: 10
  Sample 5: "Start 23, sell 15 → 8, add 8 → 16, sell 6 → 10" → Answer: 10
  Majority vote: 10 (4 out of 5) ✓

COMPLEXITY-BASED PROMPTING (same samples, weighted by reasoning steps):
  Sample 1: 1 step  → weight 1
  Sample 2: 3 steps → weight 3
  Sample 3: 2 steps → weight 2 (wrong answer)
  Sample 4: 1 step  → weight 1
  Sample 5: 4 steps → weight 4
  Weighted votes for "10": 1 + 3 + 1 + 4 = 9
  Weighted votes for "2": 2
  Answer: 10 (even more confident) ✓

WHY COMPLEXITY HELPS:
  - Short answers might be lucky guesses
  - Longer reasoning shows its work and catches errors
  - Complex chains indicate thorough thinking
Implementation
Complexity-Based Selection Algorithm
COMPLEXITY-BASED PROMPTING ALGORITHM:

from collections import defaultdict

# Assumes an `llm` client plus `extract_final_answer` and
# `majority_vote` helpers defined elsewhere.
def complexity_based_answer(question, n_samples=10, method="weighted"):
    # Generate multiple reasoning chains
    responses = []
    for _ in range(n_samples):
        response = llm.generate(
            f"Think step by step: {question}",
            temperature=0.7,  # some randomness so chains differ
        )
        responses.append(response)

    # Extract each chain's answer and measure its complexity
    scored = []
    for resp in responses:
        answer = extract_final_answer(resp)
        complexity = measure_complexity(resp)
        scored.append((answer, complexity, resp))

    if method == "top_k":
        # Method 1: keep the top-k most complex chains, then majority-vote
        top_k = sorted(scored, key=lambda x: -x[1])[:5]
        return majority_vote([s[0] for s in top_k])

    # Method 2: weight each chain's vote by its complexity
    votes = defaultdict(float)
    for answer, complexity, _ in scored:
        votes[answer] += complexity
    return max(votes, key=votes.get)

def measure_complexity(response):
    """Heuristic proxies for reasoning complexity."""
    # Simple: count explicit reasoning steps
    steps = response.count("Step") + response.count("Then")
    # Token-based: longer chains tend to be more complex
    tokens = len(response.split())
    # Structure-based: count logical connectives
    connectives = sum(response.count(w)
                      for w in ["therefore", "because", "since", "so", "thus"])
    # Combined score
    return steps * 2 + tokens * 0.1 + connectives * 3

RESULTS (from paper):

┌─────────────────────┬──────────┬─────────────┐
│ Method              │ GSM8K    │ MultiArith  │
├─────────────────────┼──────────┼─────────────┤
│ Standard CoT        │ 56.5%    │ 91.7%       │
│ Self-Consistency    │ 74.4%    │ 94.2%       │
│ Complexity-Based    │ 78.8%    │ 95.3%       │
└─────────────────────┴──────────┴─────────────┘

+4.4 percentage points over self-consistency on GSM8K!
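
Assuming the `llm` client and helper functions above exist in your environment (they are placeholders, not a specific library's API), a call would look like:

answer = complexity_based_answer(
    "A store has 23 apples. 15 are sold, 8 arrive, then 6 are sold. How many are left?",
    n_samples=10,
    method="top_k",  # or "weighted"
)

The top-k variant discards shallow chains entirely, while weighted voting lets every chain contribute in proportion to its measured complexity.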

Interactive Exercise

Score by Complexity

Given these 3 reasoning chains for "What is 15% of 80?", rank them by complexity and pick the best answer:

A: "15% of 80 = 12" (1 step)
B: "15% = 0.15, 0.15 × 80 = 12" (2 steps)
C: "15% means 15/100. 80 × 15 = 1200. 1200 ÷ 100 = 12" (3 steps)
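
To check your ranking programmatically, here is a tiny sketch that uses the annotated step counts as complexity weights (the tuples below just transcribe the exercise):

from collections import defaultdict

# (chain label, final answer, annotated step count)
chains = [("A", "12", 1), ("B", "12", 2), ("C", "12", 3)]

# Rank by complexity: C > B > A
print([label for label, _, _ in sorted(chains, key=lambda c: -c[2])])

# Weighted vote: all three chains agree, so "12" wins with total weight 6
votes = defaultdict(int)
for _, answer, steps in chains:
    votes[answer] += steps
print(max(votes, key=votes.get))  # "12"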

Pro Tips
  • Complexity-based prompting works best for math and logical reasoning
  • Combine it with self-consistency: filter chains by complexity, then vote
  • Watch for verbose but wrong reasoning; complexity does not guarantee correctness (one guard is sketched below)
  • Set a minimum complexity threshold to filter out shallow answers
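
As a concrete guard against verbose-but-wrong chains, you can cap each chain's weight so a single long outlier cannot outvote several shorter agreeing chains; the cap value and sample data here are illustrative assumptions.

from collections import defaultdict

def capped_weighted_vote(scored, cap=3.0):
    """scored: (answer, complexity) pairs; clip weights to limit outliers."""
    votes = defaultdict(float)
    for answer, complexity in scored:
        votes[answer] += min(complexity, cap)  # cap the influence of any one chain
    return max(votes, key=votes.get)

# One rambling wrong chain (weight 10) vs. three short agreeing chains
samples = [("7", 10.0), ("12", 2.0), ("12", 2.0), ("12", 2.5)]
print(capped_weighted_vote(samples))  # "12": 6.5 beats capped "7": 3.0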
