Process Document
Run OCR / document extraction and export to multiple structured formats.
POST /v1/documents
The /v1/documents endpoint extracts content from a PDF or image using a vision model and returns it in the formats you request (md, json, html, text, doctags). You can request several formats in a single call; each is returned under its own *_content field.
Only the formats listed in options.to_formats appear in the document object; the others are omitted. For example, a request with to_formats set to ["doctags", "md"] returns only doctags_content and md_content.
Headers
| Parameter | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Yes | Bearer authentication header |
| Content-Type | string | Yes | application/json |
Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | A document‑capable model (must be backed by the document provider — see Models). |
| document | object | Yes | The document to process. See Document input. |
| task | string | No | Processing task. The only value currently supported is extract, which is also the default. |
| prompt | string | No | Optional override for the prompt sent to the vision model. |
| options | object | No | Export and processing options. See Options. |
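A minimal sketch of assembling a request from the headers and body fields above, using only the Python standard library. It builds the request but does not send it. BASE_URL, API_KEY, and the model id "granite-docling" are hypothetical placeholders (not values from this page); substitute your gateway's host, key, and a registered document-capable model.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with your gateway's host and key.
BASE_URL = "https://gateway.example.com"
API_KEY = "YOUR_API_KEY"


def build_document_request(model, document, to_formats=None, prompt=None,
                           base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) a POST /v1/documents request."""
    body = {"model": model, "document": document, "task": "extract"}
    if prompt is not None:
        body["prompt"] = prompt  # optional override of the vision-model prompt
    if to_formats is not None:
        body["options"] = {"to_formats": list(to_formats)}
    return urllib.request.Request(
        url=f"{base_url}/v1/documents",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_document_request(
    model="granite-docling",  # hypothetical model id
    document={"type": "document_url",
              "document_url": "https://example.com/report.pdf"},
    to_formats=["doctags", "md"],
)
```

Sending it is then a matter of `urllib.request.urlopen(req)` and decoding the JSON body; any HTTP client works equally well.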
Document input
The document object selects the source by type:
| Field | Type | Description |
|---|---|---|
| type | string | document_url (PDF/document) or image_url (image). |
| document_url | string | URL or base64 data URI. Required when type is document_url. |
| image_url | string | URL or base64 data URI. Required when type is image_url. |
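For local files, the URL fields can carry a base64 data URI instead of a remote URL. The sketch below wraps raw bytes in the standard `data:<mime>;base64,<payload>` form and picks `image_url` for image MIME types, `document_url` otherwise. The helper names and the fallback to `application/pdf` when the MIME type cannot be guessed are illustrative assumptions, not part of the API.

```python
import base64
import mimetypes


def document_from_bytes(data: bytes, mime_type: str) -> dict:
    """Wrap raw bytes in a document object using a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    data_uri = f"data:{mime_type};base64,{encoded}"
    if mime_type.startswith("image/"):
        return {"type": "image_url", "image_url": data_uri}
    return {"type": "document_url", "document_url": data_uri}


def document_from_file(path: str) -> dict:
    """Read a local file; assume PDF when the extension is unrecognized."""
    mime_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        return document_from_bytes(f.read(), mime_type or "application/pdf")
```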
Options
| Field | Type | Default | Description |
|---|---|---|---|
| to_formats | string[] | ["md"] | Output formats to export. Allowed: md, json, html, text, doctags. markdown is accepted as an alias for md. Values are case‑insensitive. |
| vlm_response_format | string | doctags | Advanced: the intermediate format the vision model is asked for (doctags for structured assembly, or markdown for raw passthrough). |
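The to_formats rules above (allowed values, the markdown alias, case-insensitivity, default ["md"]) can be mirrored client-side to catch typos before a request is sent. This is a sketch under the documented rules only; the service's own validation may differ, and dropping duplicates is a client-side choice, not documented behavior.

```python
# Documented values; "markdown" is a documented alias for "md".
ALLOWED_FORMATS = ("md", "json", "html", "text", "doctags")
ALIASES = {"markdown": "md"}


def normalize_formats(to_formats=None):
    """Return canonical format names, or raise ValueError on unknown ones."""
    if to_formats is None:
        return ["md"]  # documented default
    canonical = []
    for fmt in to_formats:
        lowered = fmt.lower()  # values are case-insensitive
        name = ALIASES.get(lowered, lowered)
        if name not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported to_formats value: {fmt!r}")
        if name not in canonical:  # client-side choice: drop duplicates
            canonical.append(name)
    return canonical
```

For instance, `normalize_formats(["Markdown", "JSON", "md"])` yields `["md", "json"]`.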
Output formats
Each requested to_formats value maps to a field in the response document object:
| to_formats value | Response field | Contents |
|---|---|---|
| md (or markdown) | md_content | Markdown export of the assembled document. |
| json | json_content | JSON‑encoded string of the structured document (texts, tables, pictures, layout). |
| html | html_content | HTML export. |
| text | text_content | Plain‑text export. |
| doctags | doctags_content | DocTags structured markup. |
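Because only requested formats are present, and json_content is a JSON-encoded string rather than a nested object, a client typically looks up fields by format and decodes json_content explicitly. A minimal sketch following the mapping table; the helper names are hypothetical.

```python
import json

# Mapping taken from the table above.
FORMAT_TO_FIELD = {
    "md": "md_content",
    "markdown": "md_content",
    "json": "json_content",
    "html": "html_content",
    "text": "text_content",
    "doctags": "doctags_content",
}


def pick_exports(document: dict, to_formats) -> dict:
    """Return {response_field: content} for each requested format present."""
    exports = {}
    for fmt in to_formats:
        field = FORMAT_TO_FIELD[fmt.lower()]
        if field in document:
            exports[field] = document[field]
    return exports


def load_json_export(document: dict):
    """Decode json_content (a JSON string) into Python objects, if present."""
    raw = document.get("json_content")
    return json.loads(raw) if raw is not None else None
```

With a general (non-DocTags) vision model the decoded value may wrap raw text instead of a full structure, as described below.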
Structured output (real DocTags, structured HTML/JSON) requires a DocTags‑capable vision model (e.g. granite‑docling / SmolDocling). With a general vision model that returns prose, the service falls back to populating each requested format from the raw extracted text. The call still succeeds, but json_content, html_content, and doctags_content wrap the raw text rather than carrying fully structured output.
Response
| Field | Type | Description |
|---|---|---|
| id | string | Response id, prefixed doc_. |
| object | string | Always document. |
| created | integer | Unix timestamp (seconds). |
| model | string | The model used. |
| document_id | string | Unique id for the processed document. |
| pages | object[] | Per‑page results: { page_number, markdown } (raw per‑page model output). |
| usage_info | object | { pages_processed, size_bytes, filename }. |
| document | object | Per‑format exports: md_content, json_content, html_content, text_content, doctags_content. Only requested formats are present. |
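A sketch of reading the response fields listed above: it checks the object type, joins the raw per-page output in page order, and lists which exports came back. The choice of a blank line between pages and the fallback to len(pages) when usage_info lacks pages_processed are illustrative assumptions.

```python
def summarize_response(resp: dict) -> dict:
    """Summarize a /v1/documents response using the documented fields."""
    if resp.get("object") != "document":
        raise ValueError(f"unexpected object type: {resp.get('object')!r}")
    pages = sorted(resp.get("pages", []), key=lambda p: p["page_number"])
    usage = resp.get("usage_info", {})
    return {
        "id": resp["id"],  # prefixed doc_
        "document_id": resp["document_id"],
        "page_count": usage.get("pages_processed", len(pages)),
        # Raw per-page model output, joined in page order.
        "raw_markdown": "\n\n".join(p["markdown"] for p in pages),
        # Only requested formats are present in the document object.
        "formats": sorted(k for k in resp.get("document", {})
                          if k.endswith("_content")),
    }
```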
Errors
The endpoint returns the real status code of the underlying failure:
| Status | Meaning |
|---|---|
| 400 | Bad request, e.g. the document could not be fetched or was invalid, or the model produced no content. The body carries the underlying message. |
| 401 | Missing or invalid API key. |
| 404 | The referenced document URL could not be found. |
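Since the real status code is passed through, a client can branch on it directly. A minimal sketch with urllib: the hint strings paraphrase the table above, and the response body is treated as opaque text because its exact shape is not specified here.

```python
import json
import urllib.error
import urllib.request

# Paraphrased from the status table above.
ERROR_HINTS = {
    400: "bad request (document unreachable or invalid, or no model output)",
    401: "missing or invalid API key",
    404: "document URL not found",
}


def describe_error(status: int, body: bytes) -> str:
    """Combine the status hint with the underlying message from the body."""
    hint = ERROR_HINTS.get(status, "unexpected error")
    detail = body.decode("utf-8", errors="replace").strip()
    return f"{status} {hint}: {detail}" if detail else f"{status} {hint}"


def send(req: urllib.request.Request) -> dict:
    """Send a prepared request; raise RuntimeError with a readable message."""
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        raise RuntimeError(describe_error(err.code, err.read())) from err
```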
Models
/v1/documents is served by the document provider (powered by docling
and a vision model), not by chat providers. The model you pass must be registered as
document‑capable and routed to the document backend. A vision/multimodal model (image input +
text output) is required; a DocTags model (granite‑docling / SmolDocling) is recommended for
genuinely structured doctags/html/json output.