
Skeleton-of-Thought

Level: Advanced (4/5)
Tags: SoT, Parallel decoding, Outline-first generation

Definition

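Skeleton-of-Thought (SoT) is a prompting technique in which the model first generates a skeleton, a short outline of the answer, and then expands each skeleton point in parallel, either through concurrent API calls or through batched decoding. Because the expansions no longer wait on one another, end-to-end latency drops for answers that split into independent sections.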

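Ordinary decoding, including Chain-of-Thought prompting, produces the whole answer one token after another. SoT can instead achieve speed-ups of roughly 1.4x to 2.4x while maintaining answer quality on suitable tasks.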

Key Concepts

  • Skeleton stage: Generate high-level outline first
  • Point expansion: Expand each skeleton point independently
  • Parallel execution: Expand multiple points simultaneously
  • Latency reduction: Trade extra compute (a skeleton pass plus one prompt per point) for lower wall-clock time via parallelism
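The latency trade-off can be made concrete with a toy cost model. This is a minimal sketch under simplifying assumptions: a fixed decoding cost per output token (`ms_per_token`, a hypothetical parameter), no prompt-processing time, and perfectly concurrent expansion calls. Under those assumptions the expansion stage costs only as much as its longest point, while the total number of generated tokens goes up because the skeleton is extra output.

```python
def sequential_latency_ms(answer_tokens: int, ms_per_token: int) -> int:
    # Ordinary decoding: every token of the answer is produced one after another.
    return answer_tokens * ms_per_token


def sot_latency_ms(skeleton_tokens: int, point_tokens: list[int],
                   ms_per_token: int, merge_ms: int = 0) -> int:
    # Stage 1 must finish before Stage 2 starts, but the Stage 2 expansions run
    # concurrently, so they take only as long as the longest point.
    return (skeleton_tokens + max(point_tokens)) * ms_per_token + merge_ms


def sot_generated_tokens(skeleton_tokens: int, point_tokens: list[int]) -> int:
    # Compute side of the trade: the skeleton is output on top of the points.
    return skeleton_tokens + sum(point_tokens)


points = [100, 120, 90, 110]                     # hypothetical tokens per point
seq = sequential_latency_ms(sum(points), 20)     # 8400
sot = sot_latency_ms(40, points, 20)             # 3200
print(f"sequential {seq} ms, SoT {sot} ms, speedup {seq / sot:.2f}x")
print("tokens generated:", sum(points), "vs", sot_generated_tokens(40, points))
```

The `max` term is why balanced points matter: one unusually long point caps the speed-up no matter how many short points run beside it. Real deployments also pay prompt-processing cost once per expansion call, which this model ignores.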

Examples

Comparison
Sequential vs Skeleton-of-Thought
QUESTION: "Explain the benefits of exercise"

TRADITIONAL (Sequential):
├─ Token 1: "Exercise"
├─ Token 2: "provides"
├─ Token 3: "many"
├─ ... (generates everything sequentially)
└─ Token N: "conclusion."
Time: ████████████████████ 100%
All tokens generated one after another.

SKELETON-OF-THOUGHT (Parallel):

STAGE 1 - Generate Skeleton:
"1. Physical health benefits
 2. Mental health benefits
 3. Social benefits
 4. Long-term advantages"
Time: ████ 20%

STAGE 2 - Expand in Parallel:
┌────────────────┬────────────────┬────────────────┬────────────────┐
│ Point 1        │ Point 2        │ Point 3        │ Point 4        │
│ "Physical      │ "Mental        │ "Social        │ "Long-term     │
│  health..."    │  health..."    │  benefits..."  │  advantages..."│
│ (parallel)     │ (parallel)     │ (parallel)     │ (parallel)     │
└────────────────┴────────────────┴────────────────┴────────────────┘
Time: ████████ 40%

STAGE 3 - Combine:
Merge all expanded points into the final answer.
Time: ██ 10%

TOTAL: ██████████████ 70% of sequential time (about 1.4x faster in this illustration)
SPEEDUP: ~1.4-2.4x faster for suitable questions (timings above are illustrative)
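Why Stage 2 finishes in roughly the time of one expansion rather than four can be checked without any model. The sketch below replaces LLM calls with a hypothetical `fake_generate` coroutine that just sleeps, so it only models network-bound calls to a server that can handle concurrent requests; it says nothing about answer quality.

```python
import asyncio
import time


async def fake_generate(prompt: str, seconds: float) -> str:
    # Hypothetical stand-in for a network-bound LLM call (no real model involved).
    await asyncio.sleep(seconds)
    return f"expansion of {prompt!r}"


async def expand_sequentially(points: list[str], seconds: float) -> list[str]:
    # Each call waits for the previous one to finish.
    return [await fake_generate(p, seconds) for p in points]


async def expand_in_parallel(points: list[str], seconds: float) -> list[str]:
    # All calls are in flight at once; gather returns results in input order.
    return await asyncio.gather(*(fake_generate(p, seconds) for p in points))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


points = ["Physical health benefits", "Mental health benefits",
          "Social benefits", "Long-term advantages"]
seq_result, seq_time = timed(expand_sequentially(points, 0.1))
par_result, par_time = timed(expand_in_parallel(points, 0.1))
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

With a single local model that decodes one request at a time, the same code would gain nothing; the parallelism has to exist on the serving side (concurrent API requests or batched decoding).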
Implementation
SoT Prompting Pattern
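SKELETON-OF-THOUGHT IMPLEMENTATION:

import asyncio
import re

# STEP 1: Skeleton generation prompt
skeleton_prompt = """Question: {question}

Instead of answering directly, first provide a skeleton outline with 3-5 main points.
Each point should be a brief phrase that will be expanded later.

Format:
1. [Point 1]
2. [Point 2]
...
"""

# STEP 2: Point expansion prompt (one call per point, run in parallel)
expansion_prompt = """Question: {question}

Skeleton point to expand: {point}

Provide a detailed 2-3 sentence expansion of just this point.
Be specific and informative.
"""


def parse_skeleton(skeleton: str) -> list[str]:
    # Keep numbered lines such as "1. Physical health" and drop the numbering.
    return re.findall(r"^\s*\d+[.)]\s*(.+)$", skeleton, re.MULTILINE)


def combine_expansions(points: list[str], expansions: list[str]) -> str:
    # Simplest merge: each point as a heading followed by its expansion.
    return "\n\n".join(
        f"{i}. {point}\n{text.strip()}"
        for i, (point, text) in enumerate(zip(points, expansions), start=1)
    )


# STEP 3: Orchestration
# `llm` is assumed to be an async client with `await llm.generate(prompt) -> str`;
# adapt this call to your provider's SDK.
async def skeleton_of_thought(llm, question: str) -> str:
    # Stage 1: generate the skeleton (a single, short sequential call)
    skeleton = await llm.generate(skeleton_prompt.format(question=question))
    points = parse_skeleton(skeleton)

    # Stage 2: expand all points concurrently
    expansions = await asyncio.gather(*[
        llm.generate(expansion_prompt.format(question=question, point=point))
        for point in points
    ])

    # Stage 3: combine into the final answer
    return combine_expansions(points, expansions)

WHEN TO USE SoT:
✓ List-based questions ("What are the benefits of...")
✓ Multi-aspect explanations
✓ Comparative analyses
✓ Tutorial/how-to content

WHEN NOT TO USE:
✗ Math problems (steps depend on each other)
✗ Logical reasoning (sequential dependency)
✗ Short factual questions (nothing to parallelize)
✗ Creative writing (needs flow)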
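To exercise the skeleton, expand, combine flow without an API key, it can help to run it against a scripted client. The sketch below is self-contained and uses a hypothetical `FakeLLM` class whose `generate` method returns a canned skeleton or a canned expansion depending on the prompt; the prompts are shortened versions of the templates above, and none of this reflects a real provider's API.

```python
import asyncio
import re


class FakeLLM:
    """Hypothetical offline stand-in for an async LLM client (no real model)."""

    def __init__(self) -> None:
        self.calls = 0

    async def generate(self, prompt: str) -> str:
        self.calls += 1
        await asyncio.sleep(0.01)  # simulate network latency
        if "skeleton outline" in prompt:
            return "1. Physical health benefits\n2. Mental health benefits\n3. Social benefits"
        point = re.search(r"Skeleton point to expand: (.+)", prompt).group(1)
        return f"Details about {point.lower()}."


async def skeleton_of_thought(llm, question: str) -> str:
    # Stage 1: one call for the outline.
    skeleton = await llm.generate(
        f"Question: {question}\nFirst provide a skeleton outline with 3-5 main points.")
    points = re.findall(r"^\s*\d+[.)]\s*(.+)$", skeleton, re.MULTILINE)
    # Stage 2: one concurrent call per point.
    expansions = await asyncio.gather(*(
        llm.generate(f"Question: {question}\nSkeleton point to expand: {p}")
        for p in points))
    # Stage 3: plain concatenation, no extra model call.
    return "\n".join(f"{i}. {p}: {e}"
                     for i, (p, e) in enumerate(zip(points, expansions), start=1))


llm = FakeLLM()
answer = asyncio.run(skeleton_of_thought(llm, "Explain the benefits of exercise"))
print(answer)
print("model calls:", llm.calls)  # 1 skeleton call + 3 expansion calls
```

Swapping `FakeLLM` for a real client only requires an object with the same `generate` coroutine, which keeps the orchestration logic testable in isolation.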

Interactive Exercise

Create a Skeleton

Question: "What should I consider when choosing a programming language for a new project?"

Create a skeleton outline with 4-5 points that could be expanded in parallel.
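One possible answer, shown only as an illustration, is checked below by a small hypothetical validator, `check_skeleton`. It enforces the mechanical rules of the exercise (4-5 numbered points, each a brief phrase, no duplicates); whether the points are truly independent still has to be judged by reading them.

```python
import re

# One possible skeleton for the exercise question (illustrative, not the only answer).
SAMPLE_SKELETON = """\
1. Project requirements and target platform
2. Performance and resource constraints
3. Libraries, frameworks, and tooling
4. Team experience and hiring
5. Long-term maintenance and community support
"""


def check_skeleton(text: str, min_points: int = 4, max_points: int = 5,
                   max_words: int = 8) -> list[str]:
    """Return the skeleton points, or raise ValueError if the outline breaks the rules."""
    points = re.findall(r"^\s*\d+[.)]\s*(.+)$", text, re.MULTILINE)
    if not min_points <= len(points) <= max_points:
        raise ValueError(f"expected {min_points}-{max_points} points, got {len(points)}")
    for point in points:
        # Skeleton points should be brief phrases, not full answers.
        if len(point.split()) > max_words:
            raise ValueError(f"point too long to be a skeleton phrase: {point!r}")
    # Repeated points usually signal overlap, which hurts parallel expansion.
    if len({p.lower() for p in points}) != len(points):
        raise ValueError("duplicate points")
    return points


print(check_skeleton(SAMPLE_SKELETON))
```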

Pro Tips
  • Speed-ups tend to be largest when the skeleton has several points (roughly 4-8) of similar length; one long point caps the gain
  • Ensure skeleton points are truly independent
  • Add a final "synthesis" step for coherence if needed
  • Not all questions benefit; use a router to send only suitable queries to SoT
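A router can be as simple as a keyword heuristic that mirrors the "when to use / when not to use" lists. The sketch below is exactly that and nothing more: the pattern lists and the `route_to_sot` function are hypothetical, and a production router would more likely be an LLM prompt or a trained classifier.

```python
import re

# Hypothetical keyword lists; tune them (or replace the whole function) for your traffic.
SOT_FRIENDLY = [r"\bbenefits? of\b", r"\bwhat are\b", r"\bexplain\b", r"\bcompare\b",
                r"\bdifferences?\b", r"\bhow to\b", r"\btips\b", r"\bconsider\b",
                r"\bways to\b"]
SOT_UNFRIENDLY = [r"\bsolve\b", r"\bprove\b", r"\bcalculate\b",
                  r"\d+\s*[-+*/^=]\s*\d+", r"\bpoem\b", r"\bstory\b"]


def route_to_sot(question: str, min_words: int = 5) -> bool:
    """Return True if the question looks like a multi-point answer SoT can parallelize."""
    q = question.lower()
    if len(q.split()) < min_words:  # short factual questions: nothing to parallelize
        return False
    if any(re.search(p, q) for p in SOT_UNFRIENDLY):  # math, proofs, creative writing
        return False
    return any(re.search(p, q) for p in SOT_FRIENDLY)  # list-like or multi-aspect


for q in ["Explain the benefits of exercise",
          "Solve 3x + 5 = 20 for x",
          "What is the capital of France?"]:
    print(route_to_sot(q), q)
```

Queries that return False would simply go through ordinary sequential generation.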

Related Terms