POST /v1/documents
curl https://gateway.bud.studio/v1/documents \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "document-processor",
    "document": {
      "type": "document_url",
      "document_url": "https://arxiv.org/pdf/2408.09869"
    },
    "task": "extract",
    "options": {
      "to_formats": ["doctags", "md"]
    }
  }'
{
  "id": "doc_019f12e9-b26c-7f23-b352-fe572ae08f49",
  "object": "document",
  "created": 1699000000,
  "model": "document-processor",
  "document_id": "fb1b390e-9459-4c87-9e45-eb757fd1fa99",
  "pages": [
    { "page_number": 1, "markdown": "# Document title\n\nExtracted content..." }
  ],
  "usage_info": {
    "pages_processed": 9,
    "size_bytes": 5566575,
    "filename": "2408.09869"
  },
  "document": {
    "doctags_content": "<doctag>...</doctag>",
    "md_content": "# Document title\n\nExtracted content..."
  }
}
The /v1/documents endpoint extracts content from a PDF or image using a vision model and returns it in the formats you request (md, json, html, text, doctags). You can request several formats in a single call; each is returned under its own *_content field.
Only the formats you request in options.to_formats appear in the document object; the others are omitted. The example above requested ["doctags", "md"], so only doctags_content and md_content are returned.
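
The same request can be sketched in Python with only the standard library. This is a minimal illustration rather than an official client: the helper names build_request, send, and returned_formats are hypothetical, while the URL, headers, and body fields are copied from the curl example.

```python
import json
import urllib.request

API_URL = "https://gateway.bud.studio/v1/documents"


def build_request(api_key, document_url, to_formats):
    """Build (but do not send) a POST request for /v1/documents."""
    payload = {
        "model": "document-processor",
        "document": {"type": "document_url", "document_url": document_url},
        "task": "extract",
        "options": {"to_formats": list(to_formats)},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def returned_formats(response):
    """List the *_content fields actually present in the response."""
    return sorted(response.get("document", {}))
```

Calling send(build_request("YOUR_API_KEY", "https://arxiv.org/pdf/2408.09869", ["doctags", "md"])) and passing the result to returned_formats should list only doctags_content and md_content, matching the rule above.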

Headers

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Bearer authentication header |
| Content-Type | string | Yes | application/json |

Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | A document-capable model (must be backed by the document provider; see Models). |
| document | object | Yes | The document to process. See Document input. |
| task | string | No | Processing task. Currently extract. Default: extract. |
| prompt | string | No | Optional override for the prompt sent to the vision model. |
| options | object | No | Export and processing options. See Options. |

Document input

The document object selects the source by type:
| Field | Type | Description |
| --- | --- | --- |
| type | string | document_url (PDF/document) or image_url (image). |
| document_url | string | URL or base64 data URI. Required when type is document_url. |
| image_url | string | URL or base64 data URI. Required when type is image_url. |
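
As a rough sketch, the document object can be built from either a URL or local bytes. The helper names are hypothetical, and the data URI layout (data:<mime>;base64,<payload>) is the standard one; which MIME types the service accepts is not stated here, so application/pdf and image/png below are assumptions.

```python
import base64


def document_from_url(url, is_image=False):
    """Reference a remote PDF/document (default) or image by URL."""
    kind = "image_url" if is_image else "document_url"
    return {"type": kind, kind: url}


def document_from_bytes(data, mime_type="application/pdf"):
    """Inline raw bytes as a base64 data URI (MIME type is an assumption)."""
    encoded = base64.b64encode(data).decode("ascii")
    uri = f"data:{mime_type};base64,{encoded}"
    # Image MIME types go under image_url; everything else under document_url.
    kind = "image_url" if mime_type.startswith("image/") else "document_url"
    return {"type": kind, kind: uri}
```

Either return value can be used directly as the document field of the request body.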

Options

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| to_formats | string[] | ["md"] | Output formats to export. Allowed: md, json, html, text, doctags. markdown is accepted as an alias for md. Values are case-insensitive. |
| vlm_response_format | string | doctags | Advanced: the intermediate format the vision model is asked for (doctags for structured assembly, or markdown for raw passthrough). |
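
If you want to validate to_formats before sending a request, the documented rules (allowed values, the markdown alias, case-insensitivity, and the ["md"] default) can be mirrored client-side. The function below is a hypothetical helper, not part of the API; dropping duplicates is a convenience added here, and how the service itself treats duplicates is not documented.

```python
ALLOWED_FORMATS = ("md", "json", "html", "text", "doctags")


def normalize_formats(to_formats=None):
    """Return canonical lowercase format names, applying the documented default."""
    if to_formats is None:
        return ["md"]  # documented default
    result = []
    for value in to_formats:
        name = value.lower()  # values are case-insensitive
        if name == "markdown":
            name = "md"  # documented alias
        if name not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {value!r}")
        if name not in result:  # client-side de-duplication (assumption)
            result.append(name)
    return result
```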

Output formats

Each requested to_formats value maps to a field in the response document object:
| to_formats value | Response field | Contents |
| --- | --- | --- |
| md (or markdown) | md_content | Markdown export of the assembled document. |
| json | json_content | JSON-encoded string of the structured document (texts, tables, pictures, layout). |
| html | html_content | HTML export. |
| text | text_content | Plain-text export. |
| doctags | doctags_content | DocTags structured markup. |
Structured output (real DocTags, structured HTML/JSON) requires a DocTags-capable vision model (e.g. granite-docling / SmolDocling). With a general vision model that returns prose, the service falls back to populating each requested format from the raw extracted text. The call still succeeds, but json_content, html_content, and doctags_content wrap the raw text rather than carrying fully structured output.
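
Because json_content is a JSON-encoded string, it needs a second decoding step after the response itself is parsed. The sketch below uses hypothetical helper names and assumes the response has already been decoded into a dict. Whether json_content is still parseable JSON in the fallback case is not stated, so the helper tolerates both outcomes.

```python
import json


def read_export(response, fmt):
    """Return the export for a canonical format name, or None if absent."""
    return response.get("document", {}).get(f"{fmt}_content")


def read_json_export(response):
    """Decode json_content, keeping the raw string if it is not valid JSON."""
    raw = read_export(response, "json")
    if raw is None:
        return None  # json was not requested
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw  # possible fallback case: plain text rather than JSON
```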

Response

| Field | Type | Description |
| --- | --- | --- |
| id | string | Response id, prefixed doc_. |
| object | string | Always document. |
| created | integer | Unix timestamp (seconds). |
| model | string | The model used. |
| document_id | string | Unique id for the processed document. |
| pages | object[] | Per-page results: { page_number, markdown } (raw per-page model output). |
| usage_info | object | { pages_processed, size_bytes, filename }. |
| document | object | Per-format exports: md_content, json_content, html_content, text_content, doctags_content. Only requested formats are present. |
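
To work with the per-page results rather than the assembled exports, something like the following could be used. The function names are hypothetical, and sorting by page_number is a defensive choice, since the ordering of the pages array is not specified here. Note that pages carries raw model output, so the joined text is not guaranteed to match md_content.

```python
def pages_markdown(response):
    """Join the raw per-page markdown in page order."""
    pages = sorted(response.get("pages", []), key=lambda p: p["page_number"])
    return "\n\n".join(p["markdown"] for p in pages)


def summarize(response):
    """One-line summary built from id and usage_info."""
    usage = response.get("usage_info", {})
    return (
        f'{response["id"]}: {usage.get("pages_processed")} pages, '
        f'{usage.get("size_bytes")} bytes'
    )
```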

Errors

The endpoint returns the underlying failure’s real status code:
| Status | Meaning |
| --- | --- |
| 400 | Bad request, e.g. the document could not be fetched, was invalid, or the model produced no content. The body carries the underlying message. |
| 401 | Missing or invalid API key. |
| 404 | The referenced document URL could not be found. |
Example 400 response body:
{
  "error": {
    "message": "BudDoc returned error status 400 Bad Request: {\"detail\":\"Failed to fetch document from URL: ...\"}"
  }
}
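
A client can surface these errors by reading error.message from the body and pairing it with the status code. The helper below is an illustrative sketch with a hypothetical name; it assumes the body shape shown above and falls back to the raw body text when that shape is absent.

```python
import json

# Short labels for the status codes listed in the table above.
STATUS_HINTS = {
    400: "bad request",
    401: "missing or invalid API key",
    404: "document URL not found",
}


def describe_error(status, body_text):
    """Combine the status code with the message carried in the error body."""
    try:
        message = json.loads(body_text)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        message = body_text  # body was not in the expected shape
    hint = STATUS_HINTS.get(status, "unexpected status")
    return f"{status} ({hint}): {message}"
```

With urllib.request, a failed call raises urllib.error.HTTPError, whose code attribute and read() method supply the two arguments.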

Models

/v1/documents is served by the document provider (powered by docling and a vision model), not by chat providers. The model you pass must be registered as document‑capable and routed to the document backend. A vision/multimodal model (image input + text output) is required; a DocTags model (granite‑docling / SmolDocling) is recommended for genuinely structured doctags/html/json output.