Endpoint

POST /v1/embeddings

Authentication

This endpoint requires API key authentication. Required Header:
Authorization: Bearer <YOUR_API_KEY>

Request Format

Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer token for API authentication |
| Content-Type | Yes | Must be `application/json` |

Request Body

{
  "model": "string",
  "input": "string" | ["array", "of", "strings"],
  "encoding_format": "float",
  "tensorzero::cache_options": {
    "enabled": "on" | "off",
    "max_age_s": 3600
  }
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | The model identifier to use for embeddings. Can be a simple model name (e.g., `text-embedding-3-small`) or prefixed with `tensorzero::` (e.g., `tensorzero::my-embedding-model::`) |
| input | string \| string[] | Yes | The text(s) to generate embeddings for. Can be a single string or an array of strings for batch processing |
| encoding_format | string | No | The format of the embeddings. Currently only `"float"` is supported (default) |
| tensorzero::cache_options | object | No | Caching configuration for the request |
| tensorzero::cache_options.enabled | string | No | Enable (`"on"`) or disable (`"off"`) caching for this request |
| tensorzero::cache_options.max_age_s | integer | No | Maximum age in seconds for cached embeddings |

Response Format

Success Response (200 OK)

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0023064255, -0.009327292, ...],
      "index": 0
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| object | string | Always `"list"` |
| data | array | Array of embedding objects |
| data[].object | string | Always `"embedding"` |
| data[].embedding | float[] | The embedding vector as an array of floats |
| data[].index | integer | The index of this embedding in the batch (0-based) |
| model | string | The model used to generate the embeddings |
| usage | object | Token usage information |
| usage.prompt_tokens | integer | Number of tokens in the input |
| usage.total_tokens | integer | Total tokens used (same as `prompt_tokens` for embeddings) |

Error Responses

400 Bad Request

Invalid request format or parameters.
{
  "error": {
    "message": "Invalid request: missing required field 'model'",
    "type": "invalid_request_error",
    "code": "invalid_request"
  }
}

401 Unauthorized

Missing or invalid API key.
{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}

404 Not Found

Model not found or doesn’t support embeddings.
{
  "error": {
    "message": "Model 'unknown-model' not found",
    "type": "not_found_error",
    "code": "model_not_found"
  }
}

503 Service Unavailable

All model providers exhausted (no available providers could handle the request).
{
  "error": {
    "message": "All model providers exhausted",
    "type": "service_unavailable",
    "code": "providers_exhausted"
  }
}
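Handling Error Responses

Each error status above returns the same envelope: an "error" object with message, type, and code fields. As a minimal sketch (the helper name extract_embeddings is illustrative, not part of the API), a client can branch on that envelope before reading the data array:

```python
def extract_embeddings(payload: dict) -> list[list[float]]:
    """Pull embedding vectors out of a /v1/embeddings response payload.

    Raises RuntimeError with the API's error details when the payload
    is an error envelope instead of a success response.
    """
    if "error" in payload:
        err = payload["error"]
        raise RuntimeError(
            f"{err.get('type')}/{err.get('code')}: {err.get('message')}"
        )
    # Sort by index so vectors line up with the order of the inputs
    return [
        item["embedding"]
        for item in sorted(payload["data"], key=lambda d: d["index"])
    ]
```

The same check works for all four error statuses, since they share one envelope shape.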

Usage Examples

Single Text Embedding

curl -X POST https://api.tensorzero.com/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Batch Embeddings

curl -X POST https://api.tensorzero.com/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": [
      "First text to embed",
      "Second text to embed",
      "Third text to embed"
    ]
  }'

With Caching Options

curl -X POST https://api.tensorzero.com/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Text to embed with caching",
    "tensorzero::cache_options": {
      "enabled": "on",
      "max_age_s": 3600
    }
  }'

Python Example

import requests

url = "https://api.tensorzero.com/v1/embeddings"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Single embedding
data = {
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # surface HTTP errors (401, 404, 503, ...)
result = response.json()

# Extract the embedding vector
embedding = result["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")

# Batch embeddings
batch_data = {
    "model": "text-embedding-3-small",
    "input": [
        "First text",
        "Second text",
        "Third text"
    ]
}

batch_response = requests.post(url, headers=headers, json=batch_data)
batch_response.raise_for_status()
batch_result = batch_response.json()

for item in batch_result["data"]:
    print(f"Index {item['index']}: {len(item['embedding'])} dimensions")

JavaScript/TypeScript Example

const url = 'https://api.tensorzero.com/v1/embeddings';
const headers = {
  'Authorization': 'Bearer YOUR_API_KEY',
  'Content-Type': 'application/json'
};

// Single embedding
const data = {
  model: 'text-embedding-3-small',
  input: 'The quick brown fox jumps over the lazy dog'
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
  .then(response => response.json())
  .then(result => {
    const embedding = result.data[0].embedding;
    console.log(`Embedding dimension: ${embedding.length}`);
  });

// Batch embeddings with async/await
async function getBatchEmbeddings() {
  const batchData = {
    model: 'text-embedding-3-small',
    input: [
      'First text',
      'Second text',
      'Third text'
    ]
  };

  const response = await fetch(url, {
    method: 'POST',
    headers: headers,
    body: JSON.stringify(batchData)
  });

  const result = await response.json();

  result.data.forEach(item => {
    console.log(`Index ${item.index}: ${item.embedding.length} dimensions`);
  });
}

Notes

  • The endpoint supports batch processing for efficiency when embedding multiple texts
  • Embeddings are returned as arrays of floating-point numbers
  • The model must be configured in TensorZero with embedding capabilities
  • Caching can significantly improve performance for repeated queries
  • Token usage is calculated based on the input text(s)
  • The endpoint is compatible with OpenAI’s embedding API format, making it easy to switch between providers
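Since the returned vectors are plain float arrays, comparing them for semantic search or similarity tasks typically uses cosine similarity. A minimal stdlib-only helper (not part of this API; shown as one common way to consume the embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Values close to 1.0 indicate similar texts; values near 0.0 indicate unrelated ones.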

Supported providers

OpenAI

Offers advanced embedding models including text-embedding-3-small and text-embedding-3-large for semantic search and similarity tasks.

Azure

Microsoft Azure OpenAI Service provides access to OpenAI’s embedding models with enterprise-grade security and compliance.

Together.AI

Provides various open-source embedding models optimized for performance and cost-effectiveness.

Fireworks AI

High-performance embedding models with fast inference times for real-time applications.

Mistral AI

Offers Mistral-embed model for high-quality text embeddings with multilingual support.