Overview

The Responses API provides a next-generation interface for complex AI interactions, supporting:
  • Prompt-based execution: Execute versioned prompt templates with variable substitution
  • MCP tool integration: Access Model Context Protocol tools for extended functionality
  • Structured outputs: JSON schema-validated responses for reliable data extraction
  • Array-based outputs: Multiple output types (messages, tool calls, reasoning, MCP tool lists)
  • Multi-turn conversations with context preservation
  • Parallel tool/function calling
  • Multimodal inputs (text, image, audio)
  • Reasoning model capabilities
  • Streaming responses

Endpoints

POST   /v1/responses
GET    /v1/responses/{response_id}
DELETE /v1/responses/{response_id}
POST   /v1/responses/{response_id}/cancel
GET    /v1/responses/{response_id}/input_items

Authentication

Authorization: Bearer <API_KEY>

Create Response

Generate AI responses with advanced conversational features.

Request Format

Endpoint: POST /v1/responses

Headers:
  • Authorization: Bearer YOUR_API_KEY (required)
  • Content-Type: application/json (required)

Request Body:
{
  "model": "gpt-4o",
  "input": "Explain quantum computing",
  "previous_response_id": "resp_abc123",
  "prompt": {
    "id": "prompt_quantum_explanation",
    "variables": {
      "topic": "quantum computing",
      "difficulty": "beginner"
    },
    "version": "1"
  },
  "instructions": "You are a helpful physics tutor",
  "modalities": ["text"],
  "reasoning": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "calculate_quantum_state",
        "description": "Calculate quantum state probabilities",
        "parameters": {
          "type": "object",
          "properties": {
            "qubits": {"type": "integer"},
            "state": {"type": "string"}
          },
          "required": ["qubits", "state"]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "temperature": 0.7,
  "max_tokens": 1500,
  "stream": false,
  "metadata": {
    "user_id": "user123",
    "session": "quantum_tutorial"
  }
}

Parameters

Field                  Type           Required  Description
model                  string         No        Model identifier
prompt                 object         No        Prompt template parameters
input                  string/array   No        Text or multimodal content
previous_response_id   string         No        ID for conversation continuity
instructions           string         No        System instructions
modalities             array          No        Output types: ["text"], ["text", "audio"]
reasoning              boolean        No        Enable reasoning/thinking mode
tools                  array          No        Available functions/tools
tool_choice            string/object  No        Tool selection: auto, none, required
temperature            float          No        Sampling temperature (0.0 to 2.0)
max_tokens             integer        No        Maximum output tokens
stream                 boolean        No        Enable streaming response
metadata               object         No        Custom metadata

Prompt Input Format

{
  "prompt": {
    "id": "prompt_name",
    "version": "1",
    "variables": {
      "variable_1": "Value 1",
      "variable_2": "Value 2"
    }
  },
  "input": "Unstructured input text related to the prompt."
}

Multimodal Input Format

{
  "input": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "data:image/jpeg;base64,..."
      }
    }
  ]
}

Response Format

The response contains an array-based output field with multiple item types:
{
  "id": "resp_abc123",
  "object": "response",
  "created": 1699123456,
  "model": "gpt-4o",
  "status": "completed",
  "output": [
    {
      "id": "msg_xyz",
      "type": "message",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Quantum computing uses quantum mechanical phenomena...",
          "annotations": []
        }
      ]
    }
  ],
  "instructions": [
    {
      "type": "message",
      "role": "system",
      "status": "completed",
      "content": [
        {
          "type": "input_text",
          "text": "You are a helpful physics tutor"
        }
      ]
    },
    {
      "type": "message",
      "role": "user",
      "status": "completed",
      "content": [
        {
          "type": "input_text",
          "text": "Explain quantum computing"
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 25,
    "output_tokens": 150,
    "total_tokens": 175,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  "parallel_tool_calls": true,
  "tool_choice": "auto",
  "tools": [],
  "temperature": 0.7,
  "top_p": 0.9,
  "max_output_tokens": 1500,
  "background": false,
  "reasoning": {},
  "text": {
    "format": {
      "type": "text"
    }
  }
}
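
For example, the assistant text can be pulled out of the array-based output by filtering for message items and their output_text parts. A minimal sketch using requests, assuming a local deployment at http://localhost:3000 as in the usage examples later in this document:

import requests

# Placeholder base URL and API key; substitute your own deployment values
resp = requests.post(
    "http://localhost:3000/v1/responses",
    headers={"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"},
    json={"model": "gpt-4o", "input": "Explain quantum computing"},
)
response = resp.json()

# Concatenate output_text parts from message items in the output array
text = "".join(
    part["text"]
    for item in response["output"]
    if item["type"] == "message"
    for part in item["content"]
    if part["type"] == "output_text"
)
print(text)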

Output Item Types

The output array can contain multiple types of items:

Text Messages

{
  "id": "msg_abc",
  "type": "message",
  "status": "completed",
  "role": "assistant",
  "content": [
    {
      "type": "output_text",
      "text": "Response content...",
      "annotations": [],
      "logprobs": []
    }
  ]
}

MCP Tool Lists

{
  "id": "mcpl_def",
  "type": "mcp_list_tools",
  "server_label": "filesystem",
  "tools": [
    {
      "name": "read_file",
      "description": "Read file contents",
      "input_schema": {
        "type": "object",
        "properties": {
          "path": {"type": "string"}
        }
      }
    }
  ],
  "error": null
}

MCP Tool Calls

{
  "id": "call_123",
  "type": "mcp_call",
  "status": "completed",
  "name": "read_file",
  "server_label": "filesystem",
  "arguments": "{\"path\":\"/data/file.txt\"}",
  "output": "File contents here...",
  "error": null
}

Function Tool Calls

{
  "type": "function_call",
  "call_id": "call_456",
  "name": "get_weather",
  "arguments": "{\"location\":\"Paris\"}",
  "id": "fc_789"
}

Reasoning Items

{
  "id": "rs_abc",
  "type": "reasoning",
  "status": "completed",
  "summary": [
    {
      "type": "summary_text",
      "text": "Let me think through this step by step..."
    }
  ]
}
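
Because the item type determines which fields are present, consumers typically dispatch on type when walking the output array. A small sketch of such a dispatcher, assuming the item shapes documented above:

import json

def handle_output_items(output):
    """Dispatch on the documented output item types."""
    for item in output:
        kind = item["type"]
        if kind == "message":
            for part in item["content"]:
                if part["type"] == "output_text":
                    print("assistant:", part["text"])
        elif kind == "function_call":
            print("function call:", item["name"], json.loads(item["arguments"]))
        elif kind == "mcp_call":
            print("mcp call:", item["name"], "->", item.get("output"), item.get("error"))
        elif kind == "mcp_list_tools":
            print("mcp tools:", [tool["name"] for tool in item["tools"]])
        elif kind == "reasoning":
            for part in item["summary"]:
                print("reasoning:", part["text"])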

Streaming Response Format

When streaming is enabled, responses are returned as Server-Sent Events (SSE) with the following format:
event: {event_type}
data: {json_payload}

Event Lifecycle

1. Initial Events
event: response.created
data: {"type":"response.created","sequence_number":0,"response":{"id":"resp_abc","status":"in_progress","created_at":1699123456}}

event: response.in_progress
data: {"type":"response.in_progress","sequence_number":1,"response":{"id":"resp_abc","status":"in_progress"}}

2. MCP Tool List Events (if MCP tools are configured)
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":2,"output_index":0,"item":{"id":"mcpl_xyz","type":"mcp_list_tools","server_label":"filesystem","tools":[]}}

event: response.mcp_list_tools.in_progress
data: {"type":"response.mcp_list_tools.in_progress","sequence_number":3,"output_index":0,"item_id":"mcpl_xyz"}

event: response.mcp_list_tools.completed
data: {"type":"response.mcp_list_tools.completed","sequence_number":4,"output_index":0,"item_id":"mcpl_xyz"}

event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":5,"output_index":0,"item":{"id":"mcpl_xyz","type":"mcp_list_tools","server_label":"filesystem","tools":[{"name":"read_file","description":"Read file"}]}}

3. Text Output Events
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":6,"output_index":1,"item":{"id":"msg_abc","type":"message","status":"in_progress","role":"assistant","content":[]}}

event: response.content_part.added
data: {"type":"response.content_part.added","sequence_number":7,"item_id":"msg_abc","output_index":1,"content_index":0,"part":{"type":"output_text","text":"","annotations":[]}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":8,"item_id":"msg_abc","output_index":1,"content_index":0,"delta":"Quantum","logprobs":[]}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":9,"item_id":"msg_abc","output_index":1,"content_index":0,"delta":" computing","logprobs":[]}

event: response.output_text.done
data: {"type":"response.output_text.done","sequence_number":10,"item_id":"msg_abc","output_index":1,"content_index":0,"text":"Quantum computing uses...","logprobs":[]}

event: response.content_part.done
data: {"type":"response.content_part.done","sequence_number":11,"item_id":"msg_abc","output_index":1,"content_index":0,"part":{"type":"output_text","text":"Quantum computing uses...","annotations":[]}}

event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":12,"output_index":1,"item":{"id":"msg_abc","type":"message","status":"completed","role":"assistant","content":[{"type":"output_text","text":"Quantum computing uses..."}]}}

4. Reasoning Events (for thinking/reasoning models)
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":13,"output_index":0,"item":{"id":"rs_xyz","type":"reasoning","status":"in_progress","summary":[]}}

event: response.reasoning_summary_part.added
data: {"type":"response.reasoning_summary_part.added","sequence_number":14,"item_id":"rs_xyz","output_index":0,"summary_index":0,"part":{"type":"summary_text","text":""}}

event: response.reasoning_summary_text.delta
data: {"type":"response.reasoning_summary_text.delta","sequence_number":15,"item_id":"rs_xyz","output_index":0,"summary_index":0,"delta":"Let me think..."}

event: response.reasoning_summary_text.done
data: {"type":"response.reasoning_summary_text.done","sequence_number":16,"item_id":"rs_xyz","output_index":0,"summary_index":0,"text":"Let me think through this..."}

event: response.reasoning_summary_part.done
data: {"type":"response.reasoning_summary_part.done","sequence_number":17,"item_id":"rs_xyz","output_index":0,"summary_index":0,"part":{"type":"summary_text","text":"Let me think through this..."}}

event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":18,"output_index":0,"item":{"id":"rs_xyz","type":"reasoning","status":"completed","summary":[{"type":"summary_text","text":"Let me think through this..."}]}}

5. MCP Tool Call Events
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":19,"output_index":2,"item":{"id":"call_123","type":"mcp_call","status":"in_progress","name":"read_file","server_label":"filesystem","arguments":""}}

event: response.mcp_call.in_progress
data: {"type":"response.mcp_call.in_progress","sequence_number":20,"output_index":2,"item_id":"call_123"}

event: response.mcp_call_arguments.delta
data: {"type":"response.mcp_call_arguments.delta","sequence_number":21,"output_index":2,"item_id":"call_123","delta":"{\"path\"}"}

event: response.mcp_call_arguments.done
data: {"type":"response.mcp_call_arguments.done","sequence_number":22,"output_index":2,"item_id":"call_123","arguments":"{\"path\":\"/file.txt\"}"}

event: response.mcp_call.completed
data: {"type":"response.mcp_call.completed","sequence_number":23,"output_index":2,"item_id":"call_123"}

event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":24,"output_index":2,"item":{"id":"call_123","type":"mcp_call","status":"completed","name":"read_file","server_label":"filesystem","arguments":"{\"path\":\"/file.txt\"}","output":"file contents..."}}

6. Function Tool Call Events
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":25,"output_index":3,"item":{"type":"function_call","call_id":"call_456","name":"get_weather","arguments":"","id":"fc_789"}}

event: response.function_call_arguments.done
data: {"type":"response.function_call_arguments.done","sequence_number":26,"output_index":3,"item_id":"call_456","name":"get_weather","arguments":"{\"location\":\"Paris\"}"}

event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":27,"output_index":3,"item":{"type":"function_call","call_id":"call_456","name":"get_weather","arguments":"{\"location\":\"Paris\"}","id":"fc_789"}}

7. Completion Event
event: response.completed
data: {"type":"response.completed","sequence_number":28,"response":{"id":"resp_abc","object":"response","created_at":1699123456,"model":"gpt-4","status":"completed","output":[...],"instructions":[...],"usage":{"input_tokens":25,"output_tokens":150,"total_tokens":175}}}

8. Error Event (on failure)
event: response.failed
data: {"type":"response.failed","sequence_number":5,"response":{"id":"resp_abc","status":"failed","error":{"message":"Error description","type":"server_error","code":"execution_failed"}}}

Key Event Fields

  • sequence_number: Monotonically increasing counter for event ordering
  • output_index: Position in the output array (0-indexed)
  • item_id: Unique identifier for the specific item being streamed
  • content_index: Position within the content array (for messages)
  • summary_index: Position within the summary array (for reasoning)
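
These fields are enough to reassemble streamed text incrementally. A minimal sketch that folds output_text deltas back into per-item buffers, assuming an iterable of already-parsed event payloads like those shown above:

from collections import defaultdict

def collect_streamed_text(events):
    """Fold response.output_text.delta payloads into per-item text buffers.

    `events` is any iterable of parsed SSE payloads (dicts) such as the
    examples above; sequence_number guarantees they arrive in order.
    """
    parts = defaultdict(str)
    for event in events:
        if event["type"] == "response.output_text.delta":
            parts[(event["output_index"], event["content_index"])] += event["delta"]
        elif event["type"] == "response.completed":
            break
    return dict(parts)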

Prompt-Based Execution

Execute pre-configured prompt templates using the prompt parameter.

Request Example:
{
  "prompt": {
    "id": "prompt_template_id",
    "variables": {"topic": "quantum computing"},
    "version": "1"
  },
  "input": "Unstructured user input"
}

Fields:
  • prompt.id (required) - Template identifier
  • prompt.variables (optional) - Variable substitutions
  • prompt.version (optional) - Template version (defaults to default version)
  • input (optional) - Unstructured user input

Prompt Configuration (via UI or API): users can pre-configure prompts with:
  • Model deployment and settings (temperature, max_tokens, top_p, etc.)
  • System prompt with Jinja2 template support
  • Conversation messages and context with Jinja2 template support
  • MCP tools (filesystem, web access, custom tools)
  • Input/output schemas for structured data
  • Validation rules and retry limits
  • Streaming configuration
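
The Jinja2 template support mentioned above means prompt variables are substituted into the configured system prompt and messages at execution time. A small illustration of the substitution semantics; the template text here is hypothetical, not a built-in template:

from jinja2 import Template

# Hypothetical system prompt template for a prompt such as prompt_quantum_explanation
system_prompt = Template(
    "You are a helpful physics tutor. Explain {{ topic }} at a {{ difficulty }} level."
)

# The request's prompt.variables would be rendered roughly like this server-side
print(system_prompt.render(topic="quantum computing", difficulty="beginner"))
# -> You are a helpful physics tutor. Explain quantum computing at a beginner level.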

Retrieve Response

Get details of a specific response.

Endpoint: GET /v1/responses/{response_id}

Response Format

Returns the same format as the create response endpoint.

Delete Response

Remove a response from the system.

Endpoint: DELETE /v1/responses/{response_id}

Response Format

{
  "id": "resp_abc123",
  "object": "response",
  "deleted": true
}

Cancel Response

Cancel an in-progress response generation.

Endpoint: POST /v1/responses/{response_id}/cancel

Response Format

{
  "id": "resp_abc123",
  "object": "response",
  "status": "cancelled",
  "cancelled_at": 1699123456
}

List Input Items

Retrieve the input conversation history for a response.

Endpoint: GET /v1/responses/{response_id}/input_items

Response Format

{
  "object": "list",
  "data": [
    {
      "type": "message",
      "role": "system",
      "content": "You are a helpful physics tutor"
    },
    {
      "type": "message",
      "role": "user",
      "content": "Explain quantum computing"
    },
    {
      "type": "message",
      "role": "assistant",
      "content": "I'd be happy to explain quantum computing..."
    }
  ]
}
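
The management endpoints compose naturally with the create endpoint. A minimal sketch using requests against a local deployment (base URL, API key, and response ID are placeholders):

import requests

BASE_URL = "http://localhost:3000"   # placeholder deployment
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

response_id = "resp_abc123"  # id returned by POST /v1/responses

# Retrieve the full response
details = requests.get(f"{BASE_URL}/v1/responses/{response_id}", headers=HEADERS).json()

# List the input conversation history
items = requests.get(f"{BASE_URL}/v1/responses/{response_id}/input_items", headers=HEADERS).json()
for item in items["data"]:
    print(item["role"], ":", item["content"])

# Cancel an in-progress generation, or delete a finished one
requests.post(f"{BASE_URL}/v1/responses/{response_id}/cancel", headers=HEADERS)
requests.delete(f"{BASE_URL}/v1/responses/{response_id}", headers=HEADERS)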

Usage Examples

Basic Response

curl -X POST http://localhost:3000/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What is machine learning?"
  }'

Prompt Execution

curl -X POST http://localhost:3000/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": {
      "id": "prompt_neural_networks",
      "variables": {
        "question": "Explain neural networks"
      }
    }
  }'

Multi-turn Conversation

# First response
RESPONSE_ID=$(curl -X POST http://localhost:3000/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Explain neural networks"
  }' | jq -r '.id')

# Follow-up response
curl -X POST http://localhost:3000/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "How do they differ from traditional algorithms?",
    "previous_response_id": "'$RESPONSE_ID'"
  }'

With Tool Calling

curl -X POST http://localhost:3000/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Calculate the fibonacci sequence up to 10",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "calculate_fibonacci",
          "description": "Calculate fibonacci numbers",
          "parameters": {
            "type": "object",
            "properties": {
              "n": {"type": "integer", "description": "Number of terms"}
            },
            "required": ["n"]
          }
        }
      }
    ]
  }'

Python Example

import requests
import json

class ResponsesAPI:
    def __init__(self, api_key, base_url="http://localhost:3000"):
        self.api_key = api_key
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def create_response(self, model, input_text, **kwargs):
        data = {
            "model": model,
            "input": input_text,
            **kwargs
        }

        response = requests.post(
            f"{self.base_url}/v1/responses",
            headers=self.headers,
            json=data
        )
        return response.json()

    def create_conversation(self, model, messages):
        """Create a multi-turn conversation by chaining previous_response_id"""
        response_id = None
        responses = []

        for message in messages:
            # Only include previous_response_id after the first turn
            kwargs = {"previous_response_id": response_id} if response_id else {}

            response = self.create_response(model, message, **kwargs)
            responses.append(response)
            response_id = response["id"]

        return responses

    def stream_response(self, model, input_text, **kwargs):
        """Stream response with SSE"""
        data = {
            "model": model,
            "input": input_text,
            "stream": True,
            **kwargs
        }

        response = requests.post(
            f"{self.base_url}/v1/responses",
            headers=self.headers,
            json=data,
            stream=True
        )

        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data = line[6:]
                    if data == '[DONE]':
                        break
                    yield json.loads(data)

# Usage
api = ResponsesAPI("YOUR_API_KEY")

# Simple response
response = api.create_response(
    "gpt-4o",
    "Explain the theory of relativity",
    temperature=0.7
)
print(response["output"][0]["content"][0]["text"])

# Multi-turn conversation
conversation = api.create_conversation(
    "gpt-4o",
    [
        "What is artificial intelligence?",
        "How does it relate to machine learning?",
        "What are some practical applications?"
    ]
)

# Streaming response
for chunk in api.stream_response("gpt-4o", "Write a short story"):
    if "delta" in chunk and "content" in chunk["delta"]:
        print(chunk["delta"]["content"], end="", flush=True)

# Multimodal input
with open("image.jpg", "rb") as f:
    import base64
    image_data = base64.b64encode(f.read()).decode()

    response = api.create_response(
        "gpt-4o-vision",
        [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}}
        ]
    )

JavaScript Example

class ResponsesAPI {
  constructor(apiKey, baseUrl = 'http://localhost:3000') {
    this.apiKey = apiKey;
    this.baseUrl = baseUrl;
  }

  async createResponse(model, input, options = {}) {
    const response = await fetch(`${this.baseUrl}/v1/responses`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model,
        input,
        ...options
      })
    });

    return await response.json();
  }

  async *streamResponse(model, input, options = {}) {
    const response = await fetch(`${this.baseUrl}/v1/responses`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model,
        input,
        stream: true,
        ...options
      })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop();

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') return;
          yield JSON.parse(data);
        }
      }
    }
  }

  async createConversation(model, messages) {
    let responseId = null;
    const responses = [];

    for (const message of messages) {
      const response = await this.createResponse(
        model,
        message,
        responseId ? { previous_response_id: responseId } : {}
      );

      responses.push(response);
      responseId = response.id;
    }

    return responses;
  }
}

// Usage
const api = new ResponsesAPI('YOUR_API_KEY');

// Simple response
const response = await api.createResponse(
  'gpt-4o',
  'What is the meaning of life?',
  { temperature: 0.9 }
);
console.log(response.output[0].content[0].text);

// Streaming response
for await (const chunk of api.streamResponse('gpt-4o', 'Tell me a joke')) {
  if (chunk.type === 'response.output_text.delta') {
    process.stdout.write(chunk.delta);
  }
}

// Tool calling
const toolResponse = await api.createResponse(
  'gpt-4o',
  'What is the weather in Paris?',
  {
    tools: [{
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather information',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' }
          },
          required: ['location']
        }
      }
    }]
  }
);

// Handle tool calls (function_call items in the output array)
const toolCalls = toolResponse.output.filter((item) => item.type === 'function_call');
for (const toolCall of toolCalls) {
  console.log(`Calling ${toolCall.name} with:`, JSON.parse(toolCall.arguments));
}

Advanced Features

Reasoning Models

Enable step-by-step reasoning:
{
  "model": "o1-preview",
  "input": "Solve this complex problem...",
  "reasoning": true
}
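
With reasoning enabled, the output array can contain a reasoning item ahead of the final message. A minimal sketch that separates the two, reusing the ResponsesAPI helper defined in the Python example above:

# Uses the ResponsesAPI helper from the Python example above
api = ResponsesAPI("YOUR_API_KEY")

response = api.create_response(
    "o1-preview",
    "Solve this complex problem...",
    reasoning=True,
)

for item in response["output"]:
    if item["type"] == "reasoning":
        for part in item["summary"]:
            print("[reasoning]", part["text"])
    elif item["type"] == "message":
        for part in item["content"]:
            if part["type"] == "output_text":
                print(part["text"])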

Parallel Tool Calling

The API supports calling multiple tools in parallel. Independent calls appear as separate function_call items in the output array:
{
  "output": [
    {
      "type": "function_call",
      "call_id": "call_1",
      "name": "get_weather",
      "arguments": "{\"location\": \"Paris\"}",
      "id": "fc_1"
    },
    {
      "type": "function_call",
      "call_id": "call_2",
      "name": "get_weather",
      "arguments": "{\"location\": \"London\"}",
      "id": "fc_2"
    }
  ]
}
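
Because such calls are independent, a client can execute them concurrently before acting on the results. A minimal sketch with concurrent.futures; get_weather here is a hypothetical local tool implementation, and the function_call item shape follows the output format above:

import json
from concurrent.futures import ThreadPoolExecutor

def get_weather(location):
    """Hypothetical local tool implementation."""
    return {"location": location, "forecast": "sunny"}

LOCAL_TOOLS = {"get_weather": get_weather}

def execute_parallel_calls(output):
    """Run every function_call item in the output array concurrently."""
    calls = [item for item in output if item["type"] == "function_call"]

    def run(call):
        args = json.loads(call["arguments"])
        return call["call_id"], LOCAL_TOOLS[call["name"]](**args)

    with ThreadPoolExecutor() as pool:
        # Returns {call_id: tool_result} for the client to act on
        return dict(pool.map(run, calls))

Calling execute_parallel_calls(response["output"]) on the example above would return results keyed by call_1 and call_2.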

Conversation Context

Maintain context across multiple interactions:
{
  "model": "gpt-4o",
  "input": "Continue our discussion",
  "previous_response_id": "resp_previous",
  "instructions": "You are a helpful tutor who remembers previous conversations"
}

Error Responses

400 Bad Request

{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "code": "invalid_model"
  }
}

401 Unauthorized

{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}

429 Rate Limit

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retry_after": 60
  }
}
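
The retry_after hint makes 429 responses straightforward to handle, and robust retry logic is one of the best practices listed below. A minimal sketch with exponential backoff; the retry budget and backoff factors are arbitrary choices:

import time
import requests

def create_with_retry(payload, api_key, base_url="http://localhost:3000", max_attempts=5):
    """POST /v1/responses, retrying on 429 and transient 5xx errors."""
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    for attempt in range(max_attempts):
        resp = requests.post(f"{base_url}/v1/responses", headers=headers, json=payload)
        if resp.status_code == 429:
            # Prefer the server-provided hint, fall back to exponential backoff
            delay = resp.json().get("error", {}).get("retry_after", 2 ** attempt)
            time.sleep(delay)
        elif resp.status_code >= 500:
            time.sleep(2 ** attempt)
        else:
            resp.raise_for_status()
            return resp.json()
    raise RuntimeError("Exhausted retries for /v1/responses")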

Best Practices

  • Conversation Management: Use previous_response_id for coherent multi-turn conversations
  • Tool Design: Create focused, single-purpose tools for better reliability
  • Streaming: Use streaming for long responses to improve user experience
  • Error Handling: Implement robust retry logic for transient failures
  • Metadata: Use metadata to track conversations and user sessions
  • Context Window: Be mindful of token limits when building long conversations
  • Parallel Tools: Leverage parallel tool calling for independent operations
  • Prompt Templates: Design reusable prompt templates with clear variable names for maintainability
  • Variable Management: Use descriptive variable names and provide defaults where appropriate
  • Version Control: Use prompt versioning to iterate on prompts without breaking existing integrations

Limitations

  • Some advanced retrieval features may not be fully implemented
  • Response management endpoints have limited functionality
  • Conversation history is maintained only through previous_response_id chaining
  • Prompt templates must be pre-configured before use
  • Maximum context window depends on the model used, including when configured in a prompt template

Supported providers

OpenAI

Next-generation Responses API with full support for advanced conversational features, multi-turn interactions, and parallel tool calling.