POST /v1/chat/completions

Example request:
curl https://gateway.bud.studio/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b",
    "messages": [
      {
        "role": "user",
        "content": "What is AI?"
      }
    ]
  }'
Example response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "llama-3.2-1b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}
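The optional generation parameters documented under Body below can be included in the same payload. A minimal sketch; the values here are illustrative, not recommended defaults:

curl https://gateway.bud.studio/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "What is AI?"}
    ],
    "temperature": 0.7,
    "max_tokens": 128,
    "top_p": 0.9
  }'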

Headers

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Bearer authentication header of the form Bearer <token>, where <token> is your API key |

Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier (deployment name, adapter name, or routing) |
| messages | array | Yes | Array of message objects forming the conversation |
| temperature | float | No | Sampling temperature (0.0 to 2.0). Default: model default |
| max_tokens | integer | No | Maximum tokens to generate. Default: model default |
| max_completion_tokens | integer | No | Maximum tokens to generate (alternative to max_tokens) |
| top_p | float | No | Nucleus sampling parameter (0.0 to 1.0). Default: 1.0 |
| frequency_penalty | float | No | Penalize tokens based on frequency (-2.0 to 2.0). Default: 0.0 |
| presence_penalty | float | No | Penalize tokens based on presence (-2.0 to 2.0). Default: 0.0 |
| repetition_penalty | float | No | Penalize token repetition (> 0.0). Default: 1.0 |
| stream | boolean | No | Enable streaming responses (see the streaming example below). Default: false |
| stream_options | object | No | Streaming options (e.g., {"include_usage": true}) |
| n | integer | No | Number of chat completion choices to generate. Default: 1 |
| stop | string or array | No | Up to 4 sequences where the API will stop generating |
| response_format | object | No | Response format specification (e.g., {"type": "json_object"}) |
| seed | integer | No | Random seed for deterministic sampling |
| logprobs | boolean | No | Include log probabilities in the response. Default: false |
| top_logprobs | integer | No | Number of most likely tokens to return per position (0 to 20) |
| logit_bias | object | No | Modify the likelihood of specified tokens appearing |
| tools | array | No | Available tool/function definitions (see the tool-calling example under Message Object) |
| tool_choice | string or object | No | Tool selection strategy: auto, none, required, or a specific tool |
| parallel_tool_calls | boolean | No | Allow parallel tool calls. Default: true |
| user | string | No | Unique identifier representing your end user |
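Setting stream to true returns the completion incrementally as server-sent events. A sketch, assuming the standard OpenAI-compatible chunk format (each event carries a chat.completion.chunk delta, and the stream ends with data: [DONE]):

curl https://gateway.bud.studio/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b",
    "messages": [{"role": "user", "content": "What is AI?"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Abridged sample events:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"llama-3.2-1b","choices":[{"index":0,"delta":{"content":"AI"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"llama-3.2-1b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

With stream_options.include_usage set, a final chunk before [DONE] typically reports token usage.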

Message Object

{
  "role": "system" | "user" | "assistant" | "tool",
  "content": "string or array of content blocks",
  "name": "optional participant name",
  "tool_calls": "array of tool calls (for assistant messages)",
  "tool_call_id": "ID of the tool call (for tool messages)"
}
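When tools are supplied, the assistant may reply with tool_calls instead of content; the tool result is then sent back as a tool message referencing the matching tool_call_id. A sketch of the full round trip, assuming OpenAI-style function tools (the get_weather tool and the call_123 ID are hypothetical):

curl https://gateway.bud.studio/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b",
    "messages": [
      {"role": "user", "content": "What is the weather in Paris?"},
      {
        "role": "assistant",
        "tool_calls": [{
          "id": "call_123",
          "type": "function",
          "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}
        }]
      },
      {"role": "tool", "tool_call_id": "call_123", "content": "18°C, clear"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'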

Supported Providers

OpenAI

GPT-4, GPT-3.5 Turbo, and o1 models

Anthropic

Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku

Azure OpenAI

Enterprise GPT-4 and GPT-3.5 deployments

Google

Gemini Pro and Ultra models

AWS Bedrock

Claude, Llama, and Mistral on AWS

Together AI

Llama 3, Mixtral, and open-source models

Fireworks

Fast inference for Llama and Mistral models

xAI

Grok models with extended context