Endpoint
Authentication
This endpoint requires API key authentication via the Authorization header.

Request Format

Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer token for API authentication |
| Content-Type | Yes | Must be application/json |
Request Body
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model identifier to use for embeddings. Can be a simple model name (e.g., text-embedding-3-small) or prefixed with tensorzero:: (e.g., tensorzero::embedding_model_name::my-embedding-model) |
| input | string or string[] | Yes | The text(s) to generate embeddings for. Can be a single string or an array of strings for batch processing |
| encoding_format | string | No | The format of the embeddings. Currently only "float" is supported (default) |
| tensorzero::cache_options | object | No | Caching configuration for the request |
| tensorzero::cache_options.enabled | string | No | Enable ("on") or disable ("off") caching for this request |
| tensorzero::cache_options.max_age_s | integer | No | Maximum age in seconds for cached embeddings |
Response Format
Success Response (200 OK)
Response Fields
| Field | Type | Description |
|---|---|---|
| object | string | Always "list" |
| data | array | Array of embedding objects |
| data[].object | string | Always "embedding" |
| data[].embedding | float[] | The embedding vector as an array of floats |
| data[].index | integer | The index of this embedding in the batch (0-based) |
| model | string | The model used to generate the embeddings |
| usage | object | Token usage information |
| usage.prompt_tokens | integer | Number of tokens in the input |
| usage.total_tokens | integer | Total tokens used (same as prompt_tokens for embeddings) |
Error Responses
400 Bad Request
Invalid request format or parameters.

401 Unauthorized
Missing or invalid API key.

404 Not Found
Model not found or doesn’t support embeddings.

503 Service Unavailable
All model providers exhausted (no available providers could handle the request).

Usage Examples
Single Text Embedding
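A minimal sketch in Python using only the standard library. The gateway URL, API key, and model name below are placeholder assumptions, not values from this document:

```python
import json
import urllib.request

# Placeholder values -- substitute your gateway URL, API key, and model.
EMBEDDINGS_URL = "http://localhost:3000/openai/v1/embeddings"
API_KEY = "YOUR_API_KEY"

# A single string in "input" produces exactly one embedding object in "data".
payload = {
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog",
}

request = urllib.request.Request(
    EMBEDDINGS_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

if __name__ == "__main__":
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The vector for the single input lives at data[0].
    print(len(body["data"][0]["embedding"]))
```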
Batch Embeddings
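A sketch of batch usage under the same placeholder URL and key. Passing an array as input returns one embedding per element; the helper below re-sorts results by their 0-based index field so they match the input order:

```python
import json
import urllib.request

EMBEDDINGS_URL = "http://localhost:3000/openai/v1/embeddings"  # placeholder

# An array input embeds every element in a single request.
payload = {
    "model": "text-embedding-3-small",
    "input": [
        "First document to embed",
        "Second document to embed",
        "Third document to embed",
    ],
}

def vectors_in_input_order(body: dict) -> list:
    """Return the embedding vectors sorted by their 0-based batch index."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

if __name__ == "__main__":
    request = urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer YOUR_API_KEY",  # placeholder
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        for vector in vectors_in_input_order(json.load(response)):
            print(len(vector))
```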
With Caching Options
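A sketch of the same request with the tensorzero::cache_options parameter from the table above; the URL and key remain placeholders:

```python
import json
import urllib.request

EMBEDDINGS_URL = "http://localhost:3000/openai/v1/embeddings"  # placeholder

payload = {
    "model": "text-embedding-3-small",
    "input": "What is the capital of France?",
    # Cached results newer than max_age_s can be returned without
    # re-calling the upstream provider.
    "tensorzero::cache_options": {
        "enabled": "on",    # "on" or "off"
        "max_age_s": 3600,  # accept cached embeddings up to one hour old
    },
}

if __name__ == "__main__":
    request = urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer YOUR_API_KEY",  # placeholder
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        print(json.load(response)["usage"])
```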
Python Example
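A sketch of a small reusable client assembled from the headers and parameters documented above. The default base URL, endpoint path, and API key are placeholder assumptions, and build_payload and embed are hypothetical helper names:

```python
import json
import urllib.request

class EmbeddingsClient:
    """Minimal client for the embeddings endpoint (placeholder defaults)."""

    def __init__(self, base_url="http://localhost:3000", api_key="YOUR_API_KEY"):
        # The endpoint path below is an assumption; adjust for your gateway.
        self.endpoint = base_url.rstrip("/") + "/openai/v1/embeddings"
        self.api_key = api_key

    def build_payload(self, model, text, cache_options=None):
        """Assemble the request body from the documented parameters."""
        payload = {"model": model, "input": text}
        if cache_options is not None:
            payload["tensorzero::cache_options"] = cache_options
        return payload

    def embed(self, model, text, cache_options=None):
        """POST to the endpoint and return vectors sorted by batch index."""
        body = json.dumps(self.build_payload(model, text, cache_options))
        request = urllib.request.Request(
            self.endpoint,
            data=body.encode("utf-8"),
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            result = json.load(response)
        items = sorted(result["data"], key=lambda item: item["index"])
        return [item["embedding"] for item in items]

if __name__ == "__main__":
    client = EmbeddingsClient()
    vectors = client.embed(
        "text-embedding-3-small",
        ["hello world", "goodbye world"],
        cache_options={"enabled": "on", "max_age_s": 600},
    )
    print([len(v) for v in vectors])
```

Because the endpoint follows OpenAI’s embedding API format (see Notes), official OpenAI SDKs can also target it by overriding their base URL.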
JavaScript/TypeScript Example
Notes
- The endpoint supports batch processing for efficiency when embedding multiple texts
- Embeddings are returned as arrays of floating-point numbers
- The model must be configured in TensorZero with embedding capabilities
- Caching can significantly improve performance for repeated queries
- Token usage is calculated based on the input text(s)
- The endpoint is compatible with OpenAI’s embedding API format, making it easy to switch between providers
Supported providers
OpenAI
Offers advanced embedding models including text-embedding-3-small and text-embedding-3-large for semantic search and similarity tasks.
Azure
Microsoft Azure OpenAI Service provides access to OpenAI’s embedding models with enterprise-grade security and compliance.
Together.AI
Provides various open-source embedding models optimized for performance and cost-effectiveness.
Fireworks AI
High-performance embedding models with fast inference times for real-time applications.
Mistral AI
Offers the mistral-embed model for high-quality text embeddings with multilingual support.