iliad_document_parsing
Active tool of AXIS Toolbox — Agentic Commerce Codebase Intelligence
AXIS-owned document → Markdown extractor. Accepts exactly one of `document_url` (https fetch, 50 MiB cap, 60 s timeout) or `document_base64` (inline bytes, 50 MiB decoded cap). An optional `mime_type` hint is supported (application/pdf, application/vnd.openxmlformats-officedocument.wordprocessingml.document, text/html, text/markdown, text/plain); when it is omitted, the format is sniffed from magic bytes and the URL extension.
Format dispatch: PDF → pdfjs-dist text extraction (one block per page with `--- page N ---` separators); DOCX → mammoth → Markdown (tables preserved); HTML → tag-strip with heading, list, and entity handling (NOT a full HTML→MD converter; bring turndown if you need fancier conversion); plain text and Markdown → passthrough.
Returns `{markdown, format_detected, byte_size, page_count, table_count, truncated}`. Markdown output is capped at 1 MiB, with a truncation marker when the cap is hit.
Engineer mode (X-Agent-Mode: engineer; Document Intelligence, $0.10) adds an `engineer` block with retrieval chunks (heading-aware, overlapping), extract-to-caller-schema (pass `json_schema` to get a grammar-constrained, validated typed object), and image OCR (image/* via `document_base64`): typed data, not just Markdown. Requires Authorization: Bearer <api_key>.
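For PDF input, the description says the Markdown contains one block per page with `--- page N ---` separators. A minimal client-side sketch for splitting that output back into pages follows. It assumes each separator sits on its own line and precedes the page it labels; the function name `split_pages` is hypothetical and not part of the tool.

```python
import re

# Separator format taken from the tool description: "--- page N ---".
# Assumption: the separator occupies its own line and precedes the page text.
PAGE_SEPARATOR = re.compile(r"^--- page (\d+) ---[ \t]*$", re.MULTILINE)


def split_pages(markdown: str) -> dict[int, str]:
    """Map page number to page text for PDF-derived Markdown."""
    pages: dict[int, str] = {}
    matches = list(PAGE_SEPARATOR.finditer(markdown))
    for i, match in enumerate(matches):
        start = match.end()
        # A page runs until the next separator, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        pages[int(match.group(1))] = markdown[start:end].strip()
    return pages


if __name__ == "__main__":
    sample = "--- page 1 ---\nHello\n\n--- page 2 ---\nWorld\n"
    print(split_pages(sample))
```

If the response has `truncated` set, the last page returned by this helper may be incomplete, since the Markdown is cut at the 1 MiB cap.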
Parameters schema
{
  "type": "object",
  "properties": {
    "mime_type": {
      "type": "string",
      "description": "Optional MIME-type hint. When omitted we sniff from magic bytes + URL extension. Engineer mode: an image/* mime triggers OCR."
    },
    "json_schema": {
      "type": "object",
      "description": "Engineer mode: a JSON Schema. The document is extracted into a validated object matching it (returned in engineer.extracted)."
    },
    "document_url": {
      "type": "string",
      "description": "https URL to a document. Use this OR document_base64, not both."
    },
    "document_base64": {
      "type": "string",
      "description": "Base64-encoded document bytes. Use this OR document_url, not both."
    }
  }
}
No endpoints wrapped at confidence ≥ 0.50.
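The schema does not itself encode the "exactly one of `document_url` or `document_base64`" rule, so a caller may want to check it before sending. The sketch below builds the arguments object and the headers named in the description. The helper names `build_arguments` and `build_headers` are hypothetical, and the transport (how the arguments actually reach the tool) is deliberately left out because the listing does not specify it.

```python
import base64
from typing import Any, Optional

MAX_DOCUMENT_BYTES = 50 * 1024 * 1024  # 50 MiB cap stated in the description


def build_arguments(
    document_url: Optional[str] = None,
    document_bytes: Optional[bytes] = None,
    mime_type: Optional[str] = None,
    json_schema: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build the tool arguments, enforcing the documented constraints client-side."""
    # Exactly one source must be given: a URL or inline bytes, not both, not neither.
    if (document_url is None) == (document_bytes is None):
        raise ValueError("provide exactly one of document_url or document_bytes")

    args: dict[str, Any] = {}
    if document_url is not None:
        if not document_url.startswith("https://"):
            raise ValueError("document_url must be an https URL")
        args["document_url"] = document_url
    else:
        if len(document_bytes) > MAX_DOCUMENT_BYTES:
            raise ValueError("document exceeds the 50 MiB decoded cap")
        args["document_base64"] = base64.b64encode(document_bytes).decode("ascii")

    if mime_type is not None:
        args["mime_type"] = mime_type
    if json_schema is not None:
        # Only meaningful in engineer mode, per the parameter description.
        args["json_schema"] = json_schema
    return args


def build_headers(api_key: str, engineer: bool = False) -> dict[str, str]:
    """Headers named in the description; engineer mode is opt-in."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if engineer:
        headers["X-Agent-Mode"] = "engineer"
    return headers


if __name__ == "__main__":
    print(build_arguments(document_bytes=b"hello", mime_type="text/plain"))
    print(build_headers("example-key", engineer=True))
```

Checking the size and the one-source rule locally avoids spending a request on input the service would presumably reject anyway.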
Parent server
AXIS Toolbox — Agentic Commerce Codebase Intelligence
https://github.com/lastmanupinc-hub/Toolbox
1/7 registries