Process Document - Bud Stack Documentation

POST /v1/documents

The /v1/documents endpoint extracts content from a PDF or image with a vision model and returns it in the formats you request: md, json, html, text, or doctags. You can request several formats in one call; each is returned in its own *_content field of the response document object.

curl https://gateway.bud.studio/v1/documents \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "document-processor",
    "document": {
      "type": "document_url",
      "document_url": "https://arxiv.org/pdf/2408.09869"
    },
    "task": "extract",
    "options": {
      "to_formats": ["doctags", "md"]
    }
  }'

{
  "id": "doc_019f12e9-b26c-7f23-b352-fe572ae08f49",
  "object": "document",
  "created": 1699000000,
  "model": "document-processor",
  "document_id": "fb1b390e-9459-4c87-9e45-eb757fd1fa99",
  "pages": [
    { "page_number": 1, "markdown": "# Document title\n\nExtracted content..." }
  ],
  "usage_info": {
    "pages_processed": 9,
    "size_bytes": 5566575,
    "filename": "2408.09869"
  },
  "document": {
    "doctags_content": "<doctag>...</doctag>",
    "md_content": "# Document title\n\nExtracted content..."
  }
}

Only the formats you request in options.to_formats appear in the document object; the others are omitted. The example above requested ["doctags", "md"], so only doctags_content and md_content are returned.
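The curl request above can be reproduced from Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_documents_request` and `process_document` are hypothetical, and the 300-second timeout is an arbitrary assumption for large PDFs, not a documented limit.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
DOCUMENTS_URL = "https://gateway.bud.studio/v1/documents"


def build_documents_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/documents request.

    Hypothetical helper: mirrors the headers and JSON body of the curl example.
    """
    return urllib.request.Request(
        DOCUMENTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def process_document(api_key: str, payload: dict, timeout: float = 300.0) -> dict:
    """Send the request and return the decoded JSON response.

    The 300 s default timeout is an assumption, not a documented limit.
    """
    req = build_documents_request(api_key, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same body as the curl example.
example_payload = {
    "model": "document-processor",
    "document": {
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2408.09869",
    },
    "task": "extract",
    "options": {"to_formats": ["doctags", "md"]},
}
```

Calling `process_document("YOUR_API_KEY", example_payload)` should return a dict shaped like the response above, so the Markdown export would be read as `result["document"]["md_content"]`.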

Headers

Parameter	Type	Required	Description
Authorization	string	Yes	Bearer token in the form `Bearer YOUR_API_KEY`.
Content-Type	string	Yes	`application/json`

Body

Parameter	Type	Required	Description
model	string	Yes	A document‑capable model (must be backed by the document provider — see Models).
document	object	Yes	The document to process. See Document input.
task	string	No	Processing task. Only `extract` is currently supported, and it is the default.
prompt	string	No	Optional override for the prompt sent to the vision model.
options	object	No	Export and processing options. See Options.

Document input

The document object selects the source by type:

Field	Type	Description
type	string	`document_url` (PDF/document) or `image_url` (image).
document_url	string	URL or base64 data URI. Required when `type` is `document_url`.
image_url	string	URL or base64 data URI. Required when `type` is `image_url`.
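For local files, `document_url` or `image_url` can carry a base64 data URI instead of a public URL. The sketch below assumes the standard `data:<mime>;base64,<payload>` form from RFC 2397 and guesses the MIME type from the file extension. This page does not say which MIME types the service accepts, so `application/pdf`, `image/png`, and the `application/octet-stream` fallback are assumptions; `build_document_input` and `file_to_document_input` are hypothetical helpers.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as an RFC 2397 base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_document_input(data: bytes, filename: str) -> dict:
    """Build the `document` object for in-memory bytes (hypothetical helper).

    Images use type `image_url`; everything else is sent as `document_url`.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"  # assumption: generic fallback
    uri = to_data_uri(data, mime_type)
    if mime_type.startswith("image/"):
        return {"type": "image_url", "image_url": uri}
    return {"type": "document_url", "document_url": uri}


def file_to_document_input(path: str) -> dict:
    """Read a local file and wrap it as a `document` object."""
    return build_document_input(Path(path).read_bytes(), path)
```

The result drops straight into the request body, e.g. `{"model": "document-processor", "document": file_to_document_input("paper.pdf")}`.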

Options

Field	Type	Default	Description
to_formats	string[]	`["md"]`	Output formats to export. Allowed: `md`, `json`, `html`, `text`, `doctags`. `markdown` is accepted as an alias for `md`. Values are case‑insensitive.
vlm_response_format	string	`doctags`	Advanced: the intermediate format the vision model is asked for (`doctags` for structured assembly, or `markdown` for raw passthrough).
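The `to_formats` rules above (case-insensitive values, `markdown` as an alias for `md`, a fixed set of allowed formats, `["md"]` when omitted) can be mirrored client-side to catch typos before a request is sent. This is a sketch; the server remains the authority. Whether the server de-duplicates repeated formats is not documented, so the de-duplication here is a client-side choice, and `normalize_to_formats` is a hypothetical helper.

```python
from typing import List, Optional

ALLOWED_FORMATS = {"md", "json", "html", "text", "doctags"}
ALIASES = {"markdown": "md"}


def normalize_to_formats(formats: Optional[List[str]]) -> List[str]:
    """Normalize a to_formats list following the Options table.

    - None falls back to the documented default ["md"]
    - values are lowercased and `markdown` is mapped to `md`
    - unknown values raise ValueError
    - duplicates are dropped, keeping first-seen order (client-side choice)
    """
    if formats is None:
        return ["md"]
    result: List[str] = []
    for raw in formats:
        value = raw.strip().lower()
        value = ALIASES.get(value, value)
        if value not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported to_formats value: {raw!r}")
        if value not in result:
            result.append(value)
    return result
```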

Output formats

Each requested to_formats value maps to a field in the response document object:

`to_formats` value	Response field	Contents
`md` (or `markdown`)	`md_content`	Markdown export of the assembled document.
`json`	`json_content`	JSON‑encoded string of the structured document (texts, tables, pictures, layout).
`html`	`html_content`	HTML export.
`text`	`text_content`	Plain‑text export.
`doctags`	`doctags_content`	DocTags structured markup.

Structured output (real DocTags, structured HTML/JSON) requires a DocTags-capable vision model, e.g. granite-docling or SmolDocling. With a general vision model that returns prose, the service falls back to populating each requested format from the raw extracted text. The call still succeeds, but json_content, html_content, and doctags_content then wrap the raw text instead of carrying fully structured output.
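Because each requested format lands in its own `*_content` field, a client can map `to_formats` values to response fields using the table above. The sketch below (hypothetical helper `collect_exports`) does that and decodes `json_content` with `json.loads`, since it is a JSON-encoded string. It assumes the format list is already normalized (lowercase, `markdown` mapped to `md`). With a non-DocTags model the JSON export may only wrap raw text, so a decode failure keeps the raw string instead of raising.

```python
import json
from typing import Any, Dict, List

# to_formats value -> response field, from the Output formats table.
FORMAT_FIELDS = {
    "md": "md_content",
    "json": "json_content",
    "html": "html_content",
    "text": "text_content",
    "doctags": "doctags_content",
}


def collect_exports(response: Dict[str, Any], to_formats: List[str]) -> Dict[str, Any]:
    """Return {format: content} for each requested format.

    Raises KeyError if a requested format is missing from response["document"].
    """
    exports = response.get("document", {})
    out: Dict[str, Any] = {}
    for fmt in to_formats:
        field_name = FORMAT_FIELDS[fmt]
        if field_name not in exports:
            raise KeyError(f"requested format {fmt!r} missing ({field_name})")
        content = exports[field_name]
        if fmt == "json":
            try:
                content = json.loads(content)  # json_content is a JSON string
            except json.JSONDecodeError:
                pass  # fallback output may not be structured JSON; keep raw text
        out[fmt] = content
    return out
```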

Response

Field	Type	Description
id	string	Response id, prefixed `doc_`.
object	string	Always `document`.
created	integer	Unix timestamp (seconds).
model	string	The model used.
document_id	string	Unique id for the processed document.
pages	object[]	Per‑page results: `{ page_number, markdown }` (raw per‑page model output).
usage_info	object	`{ pages_processed, size_bytes, filename }`.
document	object	Per‑format exports: `md_content`, `json_content`, `html_content`, `text_content`, `doctags_content`. Only requested formats are present.
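The response fields can be captured in a small typed structure. This is a sketch using Python dataclasses; the class names `Page` and `DocumentResponse` are hypothetical, only the fields listed in the table are modeled, and the `id`/`object` check is a client-side sanity test based on the documented `doc_` prefix and `document` object type.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Page:
    page_number: int
    markdown: str  # raw per-page model output


@dataclass
class DocumentResponse:
    id: str
    created: int  # Unix timestamp (seconds)
    model: str
    document_id: str
    pages: List[Page]
    usage_info: Dict[str, Any]
    exports: Dict[str, str] = field(default_factory=dict)  # the `document` object

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> "DocumentResponse":
        # Reject payloads that are clearly not /v1/documents responses.
        if data.get("object") != "document" or not data.get("id", "").startswith("doc_"):
            raise ValueError("not a /v1/documents response")
        return cls(
            id=data["id"],
            created=data["created"],
            model=data["model"],
            document_id=data["document_id"],
            pages=[Page(p["page_number"], p["markdown"]) for p in data.get("pages", [])],
            usage_info=data.get("usage_info", {}),
            exports=data.get("document", {}),
        )
```

Keeping `pages` and `exports` separate reflects the table: `pages` holds raw per-page model output, while the assembled per-format exports live in the `document` object.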

Errors

The endpoint passes through the HTTP status code of the underlying failure:

Status	Meaning
400	Bad request: for example, the document could not be fetched, was invalid, or the model produced no content. The body carries the underlying message.
401	Missing or invalid API key.
404	The referenced document URL could not be found.

Example 400 response body:

{
  "error": {
    "message": "BudDoc returned error status 400 Bad Request: {\"detail\":\"Failed to fetch document from URL: ...\"}"
  }
}
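Since the endpoint passes the underlying status code through and puts the details in `error.message`, a client can surface both. The sketch below assumes the error body shape shown above; `describe_error` is a hypothetical helper, and the fallback to the raw body for non-JSON or differently shaped responses is a defensive assumption.

```python
import json


def describe_error(status: int, body: bytes) -> str:
    """Turn an error response into a readable one-line message."""
    try:
        message = json.loads(body)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        # Non-JSON body or unexpected shape: fall back to the raw text.
        message = body.decode("utf-8", errors="replace")
    hints = {
        400: "bad request (fetch failed, invalid document, or no content produced)",
        401: "missing or invalid API key",
        404: "document URL not found",
    }
    return f"{status} {hints.get(status, 'error')}: {message}"
```

With `urllib.request`, the status and body of a failed call are available on the raised `urllib.error.HTTPError` as `exc.code` and `exc.read()`.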

Models

/v1/documents is served by the document provider (powered by docling and a vision model), not by chat providers. The model you pass must be registered as document‑capable and routed to the document backend. A vision/multimodal model (image input + text output) is required; a DocTags model (granite‑docling / SmolDocling) is recommended for genuinely structured doctags/html/json output.
