LLM APIs & Integration / API Features

Batch Processing

Intermediate [3/5]
Batch API · Bulk processing · Async batch

Definition

Batch processing sends multiple LLM requests together to be processed asynchronously, typically at reduced cost and without strict latency requirements. Results are retrieved later when processing completes.

This is ideal for large-scale data processing tasks where immediate response isn't needed.

Key Concepts

  • Asynchronous: Submit batch, poll for results later
  • Cost savings: Often 50% cheaper than real-time API
  • Higher throughput: Process thousands of requests efficiently
  • SLA tradeoff: Results in hours, not seconds

Examples

Comparison
Real-time vs Batch Processing
SCENARIO: Classify 10,000 customer reviews

REAL-TIME API:
- Send requests one at a time (or limited parallel)
- Each request: ~1-2 seconds
- Total time: ~3-5 hours (with rate limits)
- Cost: $100 (example)
- Use case: Need results immediately

BATCH API:
- Upload all 10,000 requests in one file
- Processing happens in background
- Results ready in: ~24 hours
- Cost: $50 (50% discount)
- Use case: Can wait for results

COMPARISON:
┌────────────────┬────────────┬─────────────┐
│ Aspect         │ Real-time  │ Batch       │
├────────────────┼────────────┼─────────────┤
│ Latency        │ Seconds    │ Hours       │
│ Cost           │ Full price │ 50% off     │
│ Rate limits    │ Apply      │ Higher      │
│ Complexity     │ Simple     │ More setup  │
│ Use case       │ Interactive│ Background  │
└────────────────┴────────────┴─────────────┘
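The back-of-envelope numbers in the scenario above can be reproduced in a few lines. The latency, cost, and discount figures here are the illustrative values from the comparison, not actual provider pricing:

```python
# Illustrative figures from the scenario above (not real pricing)
n_requests = 10_000
seconds_per_request = 1.5   # assumed average real-time latency

# Serial real-time processing, ignoring rate-limit pauses
realtime_hours = n_requests * seconds_per_request / 3600
print(f"Real-time (serial): ~{realtime_hours:.1f} hours")  # ~4.2 hours

realtime_cost = 100.00      # example full-price total
batch_discount = 0.50       # typical batch discount
batch_cost = realtime_cost * (1 - batch_discount)
print(f"Batch cost: ${batch_cost:.2f}")  # $50.00
```

Parallel real-time requests shrink the wall-clock time but not the cost, which is why the batch discount dominates for non-urgent workloads.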
Implementation
OpenAI Batch API
# Step 1: Prepare batch file (JSONL format)
# batch_requests.jsonl
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4", "messages": [{"role": "user", "content": "Classify: Great product!"}]}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4", "messages": [{"role": "user", "content": "Classify: Terrible service"}]}}

# Step 2: Upload batch file
from openai import OpenAI

client = OpenAI()
batch_file = client.files.create(
    file=open("batch_requests.jsonl", "rb"),
    purpose="batch"
)

# Step 3: Create batch job
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

# Step 4: Poll until the job reaches a terminal state
# (checking only for "completed" would loop forever on a failed job)
import time

while batch_job.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch_job = client.batches.retrieve(batch_job.id)

# Step 5: Download results
if batch_job.status == "completed":
    results = client.files.content(batch_job.output_file_id)
    # Parse JSONL results...
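Step 5 above leaves result parsing as an exercise. A minimal sketch, assuming each output line is a JSON object carrying the request's custom_id plus a response with a status_code and the usual chat-completion body (the helper name and sample line are illustrative):

```python
import json

def parse_batch_results(jsonl_text):
    """Map each custom_id to its model output, or None if that request failed."""
    outputs = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response")
        if response and response.get("status_code") == 200:
            body = response["body"]
            outputs[record["custom_id"]] = body["choices"][0]["message"]["content"]
        else:
            # Failed request: keep the id so it can be retried later
            outputs[record["custom_id"]] = None
    return outputs

# Example with one fabricated result line
sample = ('{"custom_id": "req-1", "response": {"status_code": 200, '
          '"body": {"choices": [{"message": {"content": "positive"}}]}}}')
print(parse_batch_results(sample))  # {'req-1': 'positive'}
```

Keying the output by custom_id is what lets you rejoin results with the original inputs, since batch results are not guaranteed to come back in submission order.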

Interactive Exercise

Choose Processing Method

For each scenario, would you use real-time API or batch processing?

1. Analyzing 50,000 historical documents for insights
2. Chatbot responding to user messages
3. Nightly job to summarize daily news articles
4. Auto-complete suggestions as user types

Pro Tips
  • Use batch for any non-urgent processing over 100 requests
  • Include custom_id to match results to requests
  • Set up webhooks for completion notifications if available
  • Validate batch input format before submitting large jobs
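The last tip is easy to automate. A minimal pre-submission check, assuming the JSONL request shape shown in the Implementation example (the function name and required-key set are this sketch's own):

```python
import json

REQUIRED_KEYS = {"custom_id", "method", "url", "body"}

def validate_batch_file(path):
    """Return a list of (line_number, problem) pairs; empty means the file looks valid."""
    problems = []
    seen_ids = set()
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                req = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append((i, f"invalid JSON: {e}"))
                continue
            missing = REQUIRED_KEYS - req.keys()
            if missing:
                problems.append((i, f"missing keys: {sorted(missing)}"))
            cid = req.get("custom_id")
            if cid in seen_ids:
                problems.append((i, f"duplicate custom_id: {cid}"))
            seen_ids.add(cid)
    return problems
```

Running this before upload catches malformed lines and duplicate IDs locally, instead of discovering them hours later in a failed batch.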

Related Terms