Overview

The Batch API enables asynchronous processing of large volumes of requests at reduced costs. It’s ideal for:
  • Bulk content generation
  • Large-scale data analysis
  • Embedding generation for datasets
  • Any task that doesn’t require immediate responses

Endpoints

File Management

POST   /v1/files
GET    /v1/files/{file_id}
GET    /v1/files/{file_id}/content
DELETE /v1/files/{file_id}

Batch Operations

POST   /v1/batches
GET    /v1/batches/{batch_id}
GET    /v1/batches
POST   /v1/batches/{batch_id}/cancel

Authentication

All requests must include an API key in the Authorization header:

Authorization: Bearer <API_KEY>

File Upload

Upload JSONL files containing batch requests.

Request Format

Endpoint: POST /v1/files

Headers:
  • Authorization: Bearer YOUR_API_KEY (required)
  • Content-Type: multipart/form-data (required)

Form Data:

Field     Type     Required  Description
file      file     Yes       JSONL file containing requests (max 100MB)
purpose   string   Yes       Must be "batch"

Response Format

{
  "id": "file-abc123",
  "object": "file",
  "bytes": 2048,
  "created_at": 1699123456,
  "filename": "batch_requests.jsonl",
  "purpose": "batch"
}

JSONL Request Format

Each line in the JSONL file must be a valid JSON object:
{
  "custom_id": "request-1",
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": {
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7
  }
}

Request Fields

Field      Type    Required  Description
custom_id  string  Yes       Unique identifier for tracking
method     string  Yes       Must be "POST"
url        string  Yes       Target endpoint (e.g., /v1/chat/completions)
body       object  Yes       Request body for the endpoint
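
The Best Practices section below recommends validating the JSONL file before uploading. Here is a minimal validation sketch against the fields above (the helper name and error messages are illustrative, not part of the API):

import json

REQUIRED_FIELDS = {"custom_id": str, "method": str, "url": str, "body": dict}

def validate_jsonl(path):
    """Report lines that are not valid JSON or are missing required fields."""
    errors = []
    seen_ids = set()
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                obj = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append(f"line {lineno}: invalid JSON ({exc})")
                continue
            for field, expected_type in REQUIRED_FIELDS.items():
                if not isinstance(obj.get(field), expected_type):
                    errors.append(f"line {lineno}: missing or invalid '{field}'")
            if obj.get("method") != "POST":
                errors.append(f"line {lineno}: method must be 'POST'")
            if obj.get("custom_id") in seen_ids:
                errors.append(f"line {lineno}: duplicate custom_id")
            seen_ids.add(obj.get("custom_id"))
    return errors

# Usage: print any problems before uploading
for problem in validate_jsonl("batch_requests.jsonl"):
    print(problem)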

Create Batch

Submit a file for batch processing.

Request Format

Endpoint: POST /v1/batches

Request Body:
{
  "input_file_id": "file-abc123",
  "endpoint": "/v1/chat/completions",
  "completion_window": "24h",
  "metadata": {
    "customer_id": "12345",
    "batch_name": "product_descriptions"
  }
}

Parameters

Field              Type    Required  Description
input_file_id      string  Yes       ID of uploaded JSONL file
endpoint           string  Yes       Target API endpoint
completion_window  string  No        Processing window (default: "24h")
metadata           object  No        Key-value pairs (max 16)

Response Format

{
  "id": "batch_abc123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file-abc123",
  "completion_window": "24h",
  "status": "validating",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1699123456,
  "in_progress_at": null,
  "expires_at": 1699209856,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": {
    "total": 0,
    "completed": 0,
    "failed": 0
  },
  "metadata": {
    "customer_id": "12345",
    "batch_name": "product_descriptions"
  }
}

Batch Status

Status Values

Status       Description
validating   Initial validation of batch request
failed       Batch failed to process
in_progress  Currently processing requests
finalizing   Preparing result files
completed    Successfully processed
expired      Exceeded 24-hour window
cancelling   Cancellation in progress
cancelled    Successfully cancelled

Retrieve Batch

Get the current status of a batch.

Endpoint: GET /v1/batches/{batch_id}

Returns the same format as the batch creation response, with updated status and request counts.
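
For long-running batches, a small polling helper keeps this tidy. A minimal sketch (the helper name and backoff policy are illustrative; the terminal statuses come from the table above):

import time
import requests

TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(base_url, api_key, batch_id, interval=30, max_interval=300):
    """Poll the batch until it reaches a terminal status, backing off gradually."""
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        batch = requests.get(f"{base_url}/v1/batches/{batch_id}",
                             headers=headers).json()
        if batch["status"] in TERMINAL_STATUSES:
            return batch
        time.sleep(interval)
        interval = min(interval * 2, max_interval)  # gentle exponential backoff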

List Batches

List all batches with pagination support.

Endpoint: GET /v1/batches

Query Parameters

Parameter  Type     Required  Description
after      string   No        Cursor for pagination
limit      integer  No        Results per page (default: 20, max: 100)

Response Format

{
  "object": "list",
  "data": [
    {
      "id": "batch_abc123",
      "object": "batch",
      "status": "completed",
      ...
    }
  ],
  "first_id": "batch_abc123",
  "last_id": "batch_xyz789",
  "has_more": false
}
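
To retrieve more than one page, pass the current page's last_id as the after cursor until has_more is false. A minimal pagination sketch (the helper name is illustrative; requests is used as in the Python example below):

import requests

def iter_batches(base_url, api_key, limit=100):
    """Yield every batch, following the cursor until has_more is false."""
    headers = {"Authorization": f"Bearer {api_key}"}
    params = {"limit": limit}
    while True:
        page = requests.get(f"{base_url}/v1/batches",
                            headers=headers, params=params).json()
        yield from page["data"]
        if not page.get("has_more"):
            break
        params["after"] = page["last_id"]  # cursor into the next page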

Cancel Batch

Cancel an in-progress batch.

Endpoint: POST /v1/batches/{batch_id}/cancel

Returns the batch object with updated cancellation timestamps.
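
A minimal cancellation sketch (the helper name is illustrative):

import requests

def cancel_batch(base_url, api_key, batch_id):
    """POST to the cancel endpoint; the batch passes through 'cancelling' to 'cancelled'."""
    response = requests.post(f"{base_url}/v1/batches/{batch_id}/cancel",
                             headers={"Authorization": f"Bearer {api_key}"})
    return response.json()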

Output Format

Completed batches produce JSONL output files where each line contains:
{
  "id": "batch_req_123",
  "custom_id": "request-1",
  "response": {
    "status_code": 200,
    "request_id": "req_abc123",
    "body": {
      "id": "chatcmpl-123",
      "object": "chat.completion",
      "created": 1699123456,
      "model": "gpt-3.5-turbo",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "The capital of France is Paris."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 20,
        "completion_tokens": 8,
        "total_tokens": 28
      }
    }
  },
  "error": null
}
For failed requests:
{
  "id": "batch_req_124",
  "custom_id": "request-2",
  "response": null,
  "error": {
    "code": "invalid_request_error",
    "message": "Model 'invalid-model' not found"
  }
}

Usage Examples

Complete Workflow

# 1. Prepare JSONL file
cat > requests.jsonl << 'EOF'
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello!"}]}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "How are you?"}]}}
EOF

# 2. Upload file
FILE_ID=$(curl -X POST http://localhost:3000/v1/files \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F purpose="batch" \
  -F file="@requests.jsonl" \
  | jq -r '.id')

# 3. Create batch
BATCH_ID=$(curl -X POST http://localhost:3000/v1/batches \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "'$FILE_ID'",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
  }' \
  | jq -r '.id')

# 4. Check status
curl http://localhost:3000/v1/batches/$BATCH_ID \
  -H "Authorization: Bearer YOUR_API_KEY" \
  | jq '.status'

# 5. Download results when completed
OUTPUT_FILE_ID=$(curl http://localhost:3000/v1/batches/$BATCH_ID \
  -H "Authorization: Bearer YOUR_API_KEY" \
  | jq -r '.output_file_id')

curl http://localhost:3000/v1/files/$OUTPUT_FILE_ID/content \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -o results.jsonl

Python Example

import requests
import json
import time

API_KEY = "YOUR_API_KEY"
BASE_URL = "http://localhost:3000"
headers = {"Authorization": f"Bearer {API_KEY}"}

# 1. Prepare requests
requests_data = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-3.5-turbo",
            "messages": [
                {"role": "user", "content": f"Tell me fact #{i} about Python"}
            ]
        }
    }
    for i in range(1, 101)
]

# 2. Write JSONL file
with open("batch_requests.jsonl", "w") as f:
    for req in requests_data:
        f.write(json.dumps(req) + "\n")

# 3. Upload file
with open("batch_requests.jsonl", "rb") as f:
    files = {"file": f}
    data = {"purpose": "batch"}
    response = requests.post(
        f"{BASE_URL}/v1/files",
        headers=headers,
        files=files,
        data=data
    )
    file_id = response.json()["id"]

# 4. Create batch
batch_data = {
    "input_file_id": file_id,
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h",
    "metadata": {
        "project": "python_facts",
        "version": "1.0"
    }
}

response = requests.post(
    f"{BASE_URL}/v1/batches",
    headers={**headers, "Content-Type": "application/json"},
    json=batch_data
)
batch_id = response.json()["id"]

# 5. Poll for completion
while True:
    response = requests.get(f"{BASE_URL}/v1/batches/{batch_id}", headers=headers)
    batch = response.json()

    print(f"Status: {batch['status']}")
    print(f"Progress: {batch['request_counts']['completed']}/{batch['request_counts']['total']}")

    if batch["status"] in ["completed", "failed", "expired", "cancelled"]:
        break

    time.sleep(30)  # Poll every 30 seconds

# 6. Download results
if batch["status"] == "completed":
    output_file_id = batch["output_file_id"]
    response = requests.get(
        f"{BASE_URL}/v1/files/{output_file_id}/content",
        headers=headers
    )

    # Process results
    for line in response.text.strip().split("\n"):
        result = json.loads(line)
        custom_id = result["custom_id"]
        if result["response"]:
            content = result["response"]["body"]["choices"][0]["message"]["content"]
            print(f"{custom_id}: {content[:100]}...")
        else:
            print(f"{custom_id}: Error - {result['error']['message']}")

# 7. Clean up
requests.delete(f"{BASE_URL}/v1/files/{file_id}", headers=headers)
if batch.get("output_file_id"):
    requests.delete(f"{BASE_URL}/v1/files/{batch['output_file_id']}", headers=headers)

JavaScript Example

const fs = require('fs').promises;
const fetch = require('node-fetch');
const FormData = require('form-data');

const API_KEY = 'YOUR_API_KEY';
const BASE_URL = 'http://localhost:3000';

async function processBatch() {
  // 1. Prepare requests
  const requests = Array.from({ length: 50 }, (_, i) => ({
    custom_id: `req-${i + 1}`,
    method: 'POST',
    url: '/v1/chat/completions',
    body: {
      model: 'gpt-3.5-turbo',
      messages: [
        { role: 'user', content: `Generate a product description for item #${i + 1}` }
      ]
    }
  }));

  // 2. Write JSONL file
  const jsonlContent = requests.map(req => JSON.stringify(req)).join('\n');
  await fs.writeFile('batch_requests.jsonl', jsonlContent);

  // 3. Upload file
  const formData = new FormData();
  formData.append('purpose', 'batch');
  formData.append('file', await fs.readFile('batch_requests.jsonl'), 'batch_requests.jsonl');

  const uploadResponse = await fetch(`${BASE_URL}/v1/files`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      ...formData.getHeaders()
    },
    body: formData
  });

  const { id: fileId } = await uploadResponse.json();

  // 4. Create batch
  const batchResponse = await fetch(`${BASE_URL}/v1/batches`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      input_file_id: fileId,
      endpoint: '/v1/chat/completions',
      completion_window: '24h'
    })
  });

  const { id: batchId } = await batchResponse.json();

  // 5. Poll for completion
  let batch;
  do {
    await new Promise(resolve => setTimeout(resolve, 30000)); // Wait 30 seconds

    const statusResponse = await fetch(`${BASE_URL}/v1/batches/${batchId}`, {
      headers: { 'Authorization': `Bearer ${API_KEY}` }
    });

    batch = await statusResponse.json();
    console.log(`Status: ${batch.status}, Progress: ${batch.request_counts.completed}/${batch.request_counts.total}`);
  } while (!['completed', 'failed', 'expired', 'cancelled'].includes(batch.status));

  // 6. Process results
  if (batch.status === 'completed') {
    const resultsResponse = await fetch(
      `${BASE_URL}/v1/files/${batch.output_file_id}/content`,
      { headers: { 'Authorization': `Bearer ${API_KEY}` } }
    );

    const results = await resultsResponse.text();
    const lines = results.trim().split('\n');

    for (const line of lines) {
      const result = JSON.parse(line);
      console.log(`${result.custom_id}: ${result.response ? 'Success' : 'Failed'}`);
    }
  }
}

processBatch().catch(console.error);

Best Practices

  • Batch Size: Keep batches under 50,000 requests for optimal processing
  • File Size: Ensure JSONL files are under 100MB
  • Custom IDs: Use meaningful identifiers for easy tracking
  • Validation: Validate JSONL format before uploading
  • Polling: Check status every 30-60 seconds to avoid rate limits
  • Error Handling: Process partial failures gracefully
  • Cleanup: Delete processed files to manage storage
  • Metadata: Use metadata fields for additional context and filtering

Limitations

  • Maximum requests per batch: 50,000 (to split larger jobs, see the chunking sketch after this list)
  • Maximum file size: 100MB
  • Completion window: 24 hours
  • Metadata entries: Maximum 16 key-value pairs
  • Supported endpoints: /v1/chat/completions, /v1/embeddings, and other compatible endpoints
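
Jobs larger than these limits can be split into several batches. A chunking sketch that respects the stated 50,000-request and 100MB limits (the helper name is illustrative):

import json

MAX_REQUESTS = 50_000
MAX_BYTES = 100 * 1024 * 1024  # 100MB file limit

def chunk_requests(requests_data):
    """Split requests into groups that each fit in one batch file."""
    chunk, size = [], 0
    for req in requests_data:
        line_bytes = len((json.dumps(req) + "\n").encode("utf-8"))
        if chunk and (len(chunk) >= MAX_REQUESTS or size + line_bytes > MAX_BYTES):
            yield chunk
            chunk, size = [], 0
        chunk.append(req)
        size += line_bytes
    if chunk:
        yield chunk

# Each chunk can then be written to its own JSONL file and submitted as a separate batch.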

Error Handling

The batch API automatically retries transient failures. Failed requests are included in the output file with error details, allowing you to:
  • Identify specific failures
  • Retry failed requests (a retry-file sketch follows this list)
  • Adjust parameters based on error messages
  • Track success rates
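
Because error lines keep the original custom_id, a retry file can be rebuilt by joining failures back to the input. A minimal sketch (the file names and helper are illustrative):

import json

def build_retry_file(input_path, results_path, retry_path):
    """Write a JSONL file containing only the requests that failed."""
    failed_ids = set()
    with open(results_path) as results:
        for line in results:
            if line.strip():
                result = json.loads(line)
                if result.get("error"):
                    failed_ids.add(result["custom_id"])
    with open(input_path) as src, open(retry_path, "w") as dst:
        for line in src:
            if line.strip() and json.loads(line)["custom_id"] in failed_ids:
                dst.write(line)
    return len(failed_ids)

# e.g. build_retry_file("batch_requests.jsonl", "results.jsonl", "retry.jsonl")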

Supported Providers

OpenAI

Native batch API support with cost savings for large-scale processing.

Anthropic

Batch processing for Claude models with automatic retry handling.

Together.AI

Efficient batch inference for open-source models at scale.