LLM APIs & Integration / API Features

Batch Processing

Intermediate [3/5]
Batch API · Bulk processing · Async batch

Definition

Batch processing sends multiple LLM requests together to be processed asynchronously, typically at reduced cost and without strict latency requirements. Results are retrieved later when processing completes.

This is ideal for large-scale data processing tasks where immediate response isn't needed.

Key Concepts

  • Asynchronous: Submit batch, poll for results later
  • Cost savings: Often 50% cheaper than real-time API
  • Higher throughput: Process thousands of requests efficiently
  • SLA tradeoff: Results in hours, not seconds

Examples

Comparison
Real-time vs Batch Processing
SCENARIO: Classify 10,000 customer reviews

REAL-TIME API:
- Send requests one at a time (or limited parallel)
- Each request: ~1-2 seconds
- Total time: ~3-5 hours (with rate limits)
- Cost: $100 (example)
- Use case: Need results immediately

BATCH API:
- Upload all 10,000 requests in one file
- Processing happens in background
- Results ready in: ~24 hours
- Cost: $50 (50% discount)
- Use case: Can wait for results

COMPARISON:
┌────────────────┬────────────┬─────────────┐
│ Aspect         │ Real-time  │ Batch       │
├────────────────┼────────────┼─────────────┤
│ Latency        │ Seconds    │ Hours       │
│ Cost           │ Full price │ 50% off     │
│ Rate limits    │ Apply      │ Higher      │
│ Complexity     │ Simple     │ More setup  │
│ Use case       │ Interactive│ Background  │
└────────────────┴────────────┴─────────────┘
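The back-of-envelope numbers in the scenario above can be reproduced in a few lines. The latency, cost, and discount figures here are the illustrative values from the comparison, not actual provider pricing:

```python
# Illustrative figures from the scenario above (not real pricing)
n_requests = 10_000
seconds_per_request = 1.5   # assumed average real-time latency

# Serial real-time processing, ignoring rate-limit pauses
realtime_hours = n_requests * seconds_per_request / 3600
print(f"Real-time (serial): ~{realtime_hours:.1f} hours")  # ~4.2 hours

realtime_cost = 100.00      # example full-price total
batch_discount = 0.50       # typical batch discount
batch_cost = realtime_cost * (1 - batch_discount)
print(f"Batch cost: ${batch_cost:.2f}")  # $50.00
```

Parallel real-time requests shrink the wall-clock time but not the cost, which is why the batch discount dominates for non-urgent workloads.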
Implementation
OpenAI Batch API
# Step 1: Prepare batch file (JSONL format)
# batch_requests.jsonl
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4", "messages": [{"role": "user", "content": "Classify: Great product!"}]}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4", "messages": [{"role": "user", "content": "Classify: Terrible service"}]}}

# Step 2: Upload batch file
from openai import OpenAI

client = OpenAI()
batch_file = client.files.create(
    file=open("batch_requests.jsonl", "rb"),
    purpose="batch"
)

# Step 3: Create batch job
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

# Step 4: Poll until the job reaches a terminal state
# (checking only for "completed" would loop forever on a failed job)
import time

while batch_job.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch_job = client.batches.retrieve(batch_job.id)

# Step 5: Download results
if batch_job.status == "completed":
    results = client.files.content(batch_job.output_file_id)
    # Parse JSONL results...
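Step 5 above leaves result parsing as an exercise. A minimal sketch, assuming each output line is a JSON object carrying the request's custom_id plus a response with a status_code and the usual chat-completion body (the helper name and sample line are illustrative):

```python
import json

def parse_batch_results(jsonl_text):
    """Map each custom_id to its model output, or None if that request failed."""
    outputs = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response")
        if response and response.get("status_code") == 200:
            body = response["body"]
            outputs[record["custom_id"]] = body["choices"][0]["message"]["content"]
        else:
            # Failed request: keep the id so it can be retried later
            outputs[record["custom_id"]] = None
    return outputs

# Example with one fabricated result line
sample = ('{"custom_id": "req-1", "response": {"status_code": 200, '
          '"body": {"choices": [{"message": {"content": "positive"}}]}}}')
print(parse_batch_results(sample))  # {'req-1': 'positive'}
```

Keying the output by custom_id is what lets you rejoin results with the original inputs, since batch results are not guaranteed to come back in submission order.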

Interactive Exercise

Choose Processing Method

For each scenario, would you use real-time API or batch processing?

1. Analyzing 50,000 historical documents for insights
2. Chatbot responding to user messages
3. Nightly job to summarize daily news articles
4. Auto-complete suggestions as user types

Pro Tips
  • Use batch for any non-urgent processing over 100 requests
  • Include custom_id to match results to requests
  • Set up webhooks for completion notifications if available
  • Validate batch input format before submitting large jobs
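The last tip is easy to automate. A minimal pre-submission check, assuming the JSONL request shape shown in the Implementation example (the function name and required-key set are this sketch's own):

```python
import json

REQUIRED_KEYS = {"custom_id", "method", "url", "body"}

def validate_batch_file(path):
    """Return a list of (line_number, problem) pairs; empty means the file looks valid."""
    problems = []
    seen_ids = set()
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                req = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append((i, f"invalid JSON: {e}"))
                continue
            missing = REQUIRED_KEYS - req.keys()
            if missing:
                problems.append((i, f"missing keys: {sorted(missing)}"))
            cid = req.get("custom_id")
            if cid in seen_ids:
                problems.append((i, f"duplicate custom_id: {cid}"))
            seen_ids.add(cid)
    return problems
```

Running this before upload catches malformed lines and duplicate IDs locally, instead of discovering them hours later in a failed batch.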

Related Terms