Overview
The Responses API provides a next-generation interface for complex AI interactions, supporting:
- Prompt-based execution: Execute versioned prompt templates with variable substitution
- MCP tool integration: Access Model Context Protocol tools for extended functionality
- Structured outputs: JSON schema-validated responses for reliable data extraction
- Array-based outputs: Multiple output types (messages, tool calls, reasoning, MCP tool lists)
- Multi-turn conversations with context preservation
- Parallel tool/function calling
- Multimodal inputs (text, image, audio)
- Reasoning model capabilities
- Streaming responses
Endpoints
Authentication
Create Response
Generate AI responses with advanced conversational features.
Request Format
Endpoint: POST /v1/responses
Headers:
Authorization: Bearer YOUR_API_KEY (required)
Content-Type: application/json (required)
Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | No | Model identifier |
| prompt | object | No | Prompt template parameters |
| input | string/array | No | Text or multimodal content |
| previous_response_id | string | No | ID for conversation continuity |
| instructions | string | No | System instructions |
| modalities | array | No | Output types: ["text"], ["text", "audio"] |
| reasoning | boolean | No | Enable reasoning/thinking mode |
| tools | array | No | Available functions/tools |
| tool_choice | string/object | No | Tool selection: auto, none, required |
| temperature | float | No | Sampling temperature (0.0 to 2.0) |
| max_tokens | integer | No | Maximum output tokens |
| stream | boolean | No | Enable streaming response |
| metadata | object | No | Custom metadata |
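As a sketch, a minimal create-response request body combining several of the parameters above might look like this (the model name and metadata values are placeholders):

```json
{
  "model": "gpt-4o",
  "input": "Summarize the latest sales report.",
  "instructions": "You are a concise business analyst.",
  "temperature": 0.7,
  "max_tokens": 512,
  "metadata": {"session_id": "abc-123"}
}
```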
Prompt Input Format
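A hedged sketch of a prompt-based request, where the template ID and variable names are hypothetical:

```json
{
  "prompt": {
    "id": "prompt_abc123",
    "version": "2",
    "variables": {"customer_name": "Ada", "tone": "formal"}
  },
  "input": "Why was my order delayed?"
}
```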
Multimodal Input Format
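Multimodal requests pass input as an array of messages whose content mixes typed parts. The part type names below (input_text, input_image) follow common Responses API conventions and should be treated as illustrative:

```json
{
  "model": "gpt-4o",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "What is shown in this image?"},
        {"type": "input_image", "image_url": "https://example.com/chart.png"}
      ]
    }
  ]
}
```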
Response Format
The response contains an array-based output field with multiple item types:
Output Item Types
The output array can contain multiple types of items:
Text Messages
MCP Tool Lists
MCP Tool Calls
Function Tool Calls
Reasoning Items
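The item types above can appear together in a single response. A combined sketch, with IDs and field shapes that are illustrative rather than exact:

```json
{
  "id": "resp_123",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "reasoning",
      "id": "rs_1",
      "summary": [{"type": "summary_text", "text": "Compared both options..."}]
    },
    {
      "type": "function_call",
      "id": "fc_1",
      "name": "get_weather",
      "arguments": "{\"city\": \"Paris\"}"
    },
    {
      "type": "message",
      "id": "msg_1",
      "role": "assistant",
      "content": [{"type": "output_text", "text": "Here is the forecast..."}]
    }
  ]
}
```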
Streaming Response Format
When streaming is enabled, responses are returned as Server-Sent Events (SSE) with the following format:
Event Lifecycle
1. Initial Events
Key Event Fields
sequence_number: Monotonically increasing counter for event ordering
output_index: Position in the output array (0-indexed)
item_id: Unique identifier for the specific item being streamed
content_index: Position within the content array (for messages)
summary_index: Position within the summary array (for reasoning)
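An illustrative slice of such a stream; the event names are representative rather than exhaustive:

```text
event: response.created
data: {"sequence_number": 0, "response": {"id": "resp_123", "status": "in_progress"}}

event: response.output_text.delta
data: {"sequence_number": 1, "output_index": 0, "item_id": "msg_1", "content_index": 0, "delta": "Hel"}

event: response.output_text.delta
data: {"sequence_number": 2, "output_index": 0, "item_id": "msg_1", "content_index": 0, "delta": "lo"}

event: response.completed
data: {"sequence_number": 3, "response": {"id": "resp_123", "status": "completed"}}
```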
Prompt-Based Execution
Execute pre-configured prompt templates using the prompt parameter:
Request Example:
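A hedged sketch (the template ID and variable names are hypothetical):

```json
{
  "prompt": {
    "id": "prompt_support_triage",
    "version": "3",
    "variables": {"product": "Analytics Dashboard"}
  },
  "input": "The export button does nothing when I click it."
}
```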
prompt.id (required) - Template identifier
prompt.variables (optional) - Variable substitutions
prompt.version (optional) - Template version (defaults to the default version)
input (optional) - Unstructured user input
Prompt templates are pre-configured with:
- Model deployment and settings (temperature, max_tokens, top_p, etc.)
- System prompt with Jinja2 template support
- Conversation messages and context with Jinja2 template support
- MCP tools (filesystem, web access, custom tools)
- Input/output schemas for structured data
- Validation rules and retry limits
- Streaming configuration
Retrieve Response
Get details of a specific response.
Endpoint: GET /v1/responses/{response_id}
Response Format
Returns the same format as the create response endpoint.
Delete Response
Remove a response from the system.
Endpoint: DELETE /v1/responses/{response_id}
Response Format
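The exact shape of the deletion confirmation is not specified here; a sketch following common API conventions:

```json
{
  "id": "resp_123",
  "object": "response",
  "deleted": true
}
```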
Cancel Response
Cancel an in-progress response generation.
Endpoint: POST /v1/responses/{response_id}/cancel
Response Format
List Input Items
Retrieve the input conversation history for a response.
Endpoint: GET /v1/responses/{response_id}/input_items
Response Format
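A sketch of a paginated input-item list; the field names follow common list conventions and are illustrative:

```json
{
  "object": "list",
  "data": [
    {
      "type": "message",
      "id": "msg_0",
      "role": "user",
      "content": [{"type": "input_text", "text": "Hello!"}]
    }
  ],
  "has_more": false
}
```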
Usage Examples
Basic Response
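A minimal request via curl; the base URL is a placeholder and $API_KEY is assumed to hold your key:

```shell
curl https://api.example.com/v1/responses \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "input": "Write a haiku about the sea."}'
```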
Prompt Execution
Multi-turn Conversation
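A follow-up turn chains to the prior response via previous_response_id; the response ID is a placeholder:

```json
{
  "model": "gpt-4o",
  "previous_response_id": "resp_123",
  "input": "Now translate that answer into French."
}
```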
With Tool Calling
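A sketch of a request exposing one function tool; the tool name and schema are hypothetical:

```json
{
  "model": "gpt-4o",
  "input": "What is the weather in Paris and in Tokyo?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Look up the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  ],
  "tool_choice": "auto"
}
```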
Python Example
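A minimal Python sketch using only the standard library; the base URL is a placeholder, and build_request/create_response are illustrative helpers rather than part of any SDK:

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder base URL (assumption)

def build_request(model, user_input, previous_response_id=None, **options):
    """Assemble a create-response payload for POST /v1/responses."""
    payload = {"model": model, "input": user_input}
    if previous_response_id is not None:
        payload["previous_response_id"] = previous_response_id
    payload.update(options)  # e.g. temperature, max_tokens, metadata
    return payload

def create_response(api_key, payload):
    """Send the payload to the Responses API and return the parsed JSON."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    payload = build_request("gpt-4o", "Write a haiku about the sea.", temperature=0.7)
    print(json.dumps(payload, indent=2))
```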
JavaScript Example
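An equivalent JavaScript sketch using the global fetch available in Node 18+ and browsers; the base URL and helper names are illustrative:

```javascript
const API_BASE = "https://api.example.com"; // placeholder base URL (assumption)

// Assemble a create-response payload for POST /v1/responses.
function buildRequest(model, input, options = {}) {
  return { model, input, ...options };
}

// Send the payload and return the parsed JSON body.
async function createResponse(apiKey, payload) {
  const res = await fetch(`${API_BASE}/v1/responses`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
}

const payload = buildRequest("gpt-4o", "Write a haiku about the sea.", {
  temperature: 0.7,
});
console.log(JSON.stringify(payload));
```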
Advanced Features
Reasoning Models
Enable step-by-step reasoning:
Parallel Tool Calling
The API supports calling multiple tools in parallel:
Conversation Context
Maintain context across multiple interactions:
Error Responses
400 Bad Request
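A sketch of a 400 error body; the field names follow common error-envelope conventions and the message is illustrative:

```json
{
  "error": {
    "type": "invalid_request_error",
    "message": "Unknown parameter: 'modality'",
    "param": "modality",
    "code": null
  }
}
```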
401 Unauthorized
429 Rate Limit
Best Practices
- Conversation Management: Use previous_response_id for coherent multi-turn conversations
- Tool Design: Create focused, single-purpose tools for better reliability
- Streaming: Use streaming for long responses to improve user experience
- Error Handling: Implement robust retry logic for transient failures
- Metadata: Use metadata to track conversations and user sessions
- Context Window: Be mindful of token limits when building long conversations
- Parallel Tools: Leverage parallel tool calling for independent operations
- Prompt Templates: Design reusable prompt templates with clear variable names for maintainability
- Variable Management: Use descriptive variable names and provide defaults where appropriate
- Version Control: Use prompt versioning to iterate on prompts without breaking existing integrations
Limitations
- Some advanced retrieval features may not be fully implemented
- Response management endpoints have limited functionality
- Conversation history is maintained only through previous_response_id chaining
- Prompt templates must be pre-configured before use
- Maximum context window depends on the model used, including when configured in a prompt template
Supported providers
OpenAI
Next-generation Responses API with full support for advanced conversational features, multi-turn interactions, and parallel tool calling.