Skeleton-of-Thought | HyperKit.ai

Definition

Skeleton-of-Thought (SoT) is a prompting technique that first asks the model for a short skeleton outline of the answer, then expands each outline point in parallel. Because independent sections are generated concurrently instead of one after another, end-to-end latency drops.

Unlike Chain-of-Thought, which works through its steps sequentially, SoT targets speed: on suitable tasks it can be roughly 1.4-2.4x faster while maintaining answer quality.

Key Concepts

Skeleton stage: Generate a high-level outline first
Point expansion: Expand each skeleton point independently of the others
Parallel execution: Expand multiple points simultaneously
Latency reduction: Spend extra compute (one request per point, each repeating the question) to cut wall-clock time via parallelism
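
The parallel-execution idea can be illustrated with a small timing sketch. It assumes each expansion is an independent, I/O-bound call; `fake_expand` is a hypothetical stand-in that only sleeps, not a real model client, and the 0.05 s delay is arbitrary.

```python
import asyncio
import time

async def fake_expand(point, delay=0.05):
    """Hypothetical stand-in for an LLM call: waits, then returns text."""
    await asyncio.sleep(delay)
    return f"{point}: expanded"

async def expand_sequentially(points):
    # Each call starts only after the previous one has finished
    return [await fake_expand(p) for p in points]

async def expand_in_parallel(points):
    # All calls are in flight at once; gather() keeps the input order
    return await asyncio.gather(*(fake_expand(p) for p in points))

def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

points = ["Physical", "Mental", "Social", "Long-term"]
seq_result, seq_time = timed(expand_sequentially(points))
par_result, par_time = timed(expand_in_parallel(points))
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Both variants return the same expansions; only the wall-clock time differs, because the parallel version waits for the slowest call rather than the sum of all calls.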

Examples

Comparison

Sequential vs Skeleton-of-Thought

QUESTION: "Explain the benefits of exercise"

TRADITIONAL (Sequential):
├─ Token 1: "Exercise"
├─ Token 2: "provides"
├─ Token 3: "many"
├─ ... (generates everything sequentially)
└─ Token N: "conclusion."

Time: ████████████████████ 100%
All tokens generated one after another.

SKELETON-OF-THOUGHT (Parallel):

STAGE 1 - Generate Skeleton:
"1. Physical health benefits
 2. Mental health benefits
 3. Social benefits
 4. Long-term advantages"

Time: ████ 20%

STAGE 2 - Expand in Parallel:
┌────────────────┬────────────────┬────────────────┬────────────────┐
│ Point 1        │ Point 2        │ Point 3        │ Point 4        │
│ "Physical      │ "Mental        │ "Social        │ "Long-term     │
│ health..."     │ health..."     │ benefits..."   │ advantages..." │
│ (parallel)     │ (parallel)     │ (parallel)     │ (parallel)     │
└────────────────┴────────────────┴────────────────┴────────────────┘

Time: ████████ 40%

STAGE 3 - Combine:
Merge all expanded points into final answer.

Time: ██ 10%

TOTAL: ██████████████ 70% of sequential time!

SPEEDUP: ~1.4-2.4x faster for suitable questions
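
The arithmetic behind the diagram can be sketched as follows. The stage fractions 20%, 40%, and 10% come from the illustration above; the per-point times other than the slowest one are made-up values, and the function names are hypothetical.

```python
def sot_total_time(skeleton, point_times, combine):
    """Total SoT time as a fraction of the sequential baseline (1.0).

    The parallel stage lasts as long as its slowest point,
    not the sum of all points.
    """
    return skeleton + max(point_times) + combine

def sot_speedup(skeleton, point_times, combine):
    """Speedup relative to a sequential baseline normalised to 1.0."""
    return 1.0 / sot_total_time(skeleton, point_times, combine)

# Stage fractions from the diagram; the slowest point takes 40%.
# The other three point times are illustrative assumptions.
total = sot_total_time(0.20, [0.40, 0.35, 0.30, 0.25], 0.10)
speedup = sot_speedup(0.20, [0.40, 0.35, 0.30, 0.25], 0.10)
print(f"total: {total:.0%} of sequential, speedup: {speedup:.2f}x")
```

With these numbers the total is 70% of sequential time, a speedup of about 1.43x, which sits at the lower end of the range quoted above. Shortening the skeleton and combine stages, which remain sequential, is what pushes the figure higher.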

Implementation

SoT Prompting Pattern

SKELETON-OF-THOUGHT IMPLEMENTATION:

# NOTE: `llm`, `parse_skeleton`, and `combine_expansions` are placeholders
# that are not defined here; supply your own async client and helpers.
import asyncio

# STEP 1: Skeleton Generation Prompt
skeleton_prompt = """
Question: {question}

Instead of answering directly, first provide a
skeleton outline with 3-5 main points. Each point
should be a brief phrase that will be expanded later.

Format:
1. [Point 1]
2. [Point 2]
...
"""

# STEP 2: Point Expansion Prompt (run in parallel)
expansion_prompt = """
Question: {question}

Skeleton point to expand: {point}

Provide a detailed 2-3 sentence expansion of just
this point. Be specific and informative.
"""

# STEP 3: Implementation
async def skeleton_of_thought(question):
    # Stage 1: a single sequential call produces the outline
    skeleton = await llm.generate(
        skeleton_prompt.format(question=question)
    )
    # Split the numbered outline into a list of point strings
    points = parse_skeleton(skeleton)

    # Stage 2: send one request per point and await them concurrently;
    # gather() returns results in the same order as `points`
    expansions = await asyncio.gather(*[
        llm.generate(expansion_prompt.format(
            question=question,
            point=point
        ))
        for point in points
    ])

    # Stage 3: merge the expansions into the final answer
    return combine_expansions(points, expansions)
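
The pattern above calls `parse_skeleton` and `combine_expansions` without defining them. One possible sketch of those two helpers follows; it assumes the model returns the outline as numbered lines in the requested format, and the output layout of `combine_expansions` is just one reasonable choice, not a prescribed one.

```python
import re
from typing import List

def parse_skeleton(skeleton: str) -> List[str]:
    """Return the text of each numbered line, e.g. '1. Foo' -> 'Foo'.

    Lines that do not start with a number followed by '.' or ')'
    are ignored.
    """
    points = []
    for line in skeleton.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            points.append(match.group(1).strip())
    return points

def combine_expansions(points: List[str], expansions: List[str]) -> str:
    """Join each point with its expansion as a numbered section."""
    sections = [
        f"{i}. {point}\n{text.strip()}"
        for i, (point, text) in enumerate(zip(points, expansions), start=1)
    ]
    return "\n\n".join(sections)

outline = "1. Physical health benefits\n 2. Mental health benefits"
parsed = parse_skeleton(outline)
answer = combine_expansions(parsed, ["Builds strength. ", "Lowers stress."])
print(answer)
```

A real model may deviate from the requested format (bullets instead of numbers, extra commentary), so a production parser would likely need to be more forgiving than this regex.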

WHEN TO USE SoT:
✓ List-based questions ("What are the benefits of...")
✓ Multi-aspect explanations
✓ Comparative analyses
✓ Tutorial/how-to content

WHEN NOT TO USE:
✗ Math problems (steps depend on each other)
✗ Logical reasoning (sequential dependency)
✗ Short factual questions
✗ Creative writing (needs flow)
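
The two lists above can be turned into a rough routing check. This is only a keyword heuristic under stated assumptions: the cue words, the `min_words` threshold, and the function name `should_use_sot` are all hypothetical, and a real router might instead be a trained classifier or an extra LLM call.

```python
import re

# Hypothetical cue lists derived from the "when to use" guidance
SEQUENTIAL_CUES = re.compile(
    r"\b(solve|calculate|prove|derive|step by step|story|poem)\b", re.I
)
PARALLEL_CUES = re.compile(
    r"\b(benefits|advantages|factors|compare|differences|ways|tips"
    r"|how to|what should)\b",
    re.I,
)

def should_use_sot(question: str, min_words: int = 6) -> bool:
    """Return True when the question looks like a multi-aspect query."""
    # Short factual questions gain nothing from an outline
    if len(question.split()) < min_words:
        return False
    # Math, proofs, and creative writing have sequential dependencies
    if SEQUENTIAL_CUES.search(question):
        return False
    # Otherwise require an explicit list-like or multi-aspect cue
    return bool(PARALLEL_CUES.search(question))

print(should_use_sot("What are the benefits of regular exercise for adults?"))
print(should_use_sot("Solve 3x + 7 = 22 and show each step of the work"))
```

Questions that match neither cue list fall back to ordinary sequential generation, which is the safer default when the structure of the answer is unknown.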

Interactive Exercise

Create a Skeleton

Question: "What should I consider when choosing a programming language for a new project?"

Create a skeleton outline with 4-5 points that could be expanded in parallel.

Pro Tips

Best speedup comes from 4-8 parallel points
Ensure skeleton points are truly independent
Add a final "synthesis" step for coherence if needed
Not all questions benefit - use a router to detect suitable queries
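
The optional synthesis step mentioned above could look like the sketch below. The prompt wording, the function names, and the `generate` parameter (any async callable that maps a prompt string to a completion string) are assumptions for illustration; `echo_generate` is a stub used only so the example runs without a model.

```python
import asyncio

# Hypothetical prompt for the final coherence pass
synthesis_prompt = """
Question: {question}

Draft sections:
{draft}

Rewrite the draft into one coherent answer. Keep every point,
remove repetition, and add brief transitions between sections.
"""

def build_draft(points, expansions):
    """Format the independently expanded points as numbered sections."""
    return "\n\n".join(
        f"{i}. {point}\n{text.strip()}"
        for i, (point, text) in enumerate(zip(points, expansions), start=1)
    )

async def synthesize(question, points, expansions, generate):
    """Run one extra sequential call that smooths the combined draft."""
    draft = build_draft(points, expansions)
    return await generate(
        synthesis_prompt.format(question=question, draft=draft)
    )

async def echo_generate(prompt):
    # Stub model: returns the prompt unchanged so the flow can be inspected
    return prompt

final = asyncio.run(synthesize(
    "Explain the benefits of exercise",
    ["Physical health", "Mental health"],
    ["Builds strength.", "Lowers stress."],
    echo_generate,
))
```

The synthesis call is sequential and sees the whole draft, so it gives back part of the latency saved by parallel expansion. That is why it is best kept optional.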
