document-qa-prep
Active
Tool of The Stall
Prepares a document for question-answering and RAG pipelines. Chunks the input text at paragraph and sentence boundaries, assigns deterministic chunk IDs, estimates token counts (at roughly 4 characters per token), and extracts document metadata (word count, type, headings). Returns ready-to-embed chunks, with optional overlap between consecutive chunks. Makes no LLM or external API calls; it is pure text processing. Use it mid-task when you have fetched a document and need it split before querying a vector store.
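The steps described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function names (`estimate_tokens`, `chunk_id`, `prepare`), the `doc_key` argument, and the SHA-256-based ID scheme are all assumptions made for the example. Only the 4-characters-per-token estimate and the default sizes come from the listing. For brevity the sketch packs whole paragraphs and does not split an oversized paragraph at sentence boundaries, and it cuts the overlap by character count, so an overlap may begin mid-word.

```python
import hashlib
import re

CHARS_PER_TOKEN = 4  # the estimate stated in the parameter schema


def estimate_tokens(text: str) -> int:
    """Estimate the token count at 4 characters per token, rounded up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def chunk_id(doc_key: str, index: int, text: str) -> str:
    """Hypothetical ID scheme: the same input always yields the same ID."""
    payload = f"{doc_key}:{index}:{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def prepare(text, chunk_size_tokens=512, overlap_tokens=50, doc_key="doc"):
    """Pack paragraphs into chunks, carrying a character-based overlap tail."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and estimate_tokens(candidate) > chunk_size_tokens:
            chunks.append(current)
            # Carry the end of the finished chunk into the next one.
            overlap_chars = overlap_tokens * CHARS_PER_TOKEN
            tail = current[-overlap_chars:] if overlap_chars > 0 else ""
            current = f"{tail}\n\n{para}" if tail else para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [
        {"id": chunk_id(doc_key, i, c), "text": c, "tokens": estimate_tokens(c)}
        for i, c in enumerate(chunks)
    ]
```

Calling `prepare(document_text, chunk_size_tokens=512, overlap_tokens=50)` returns a list of dictionaries with `id`, `text`, and `tokens` keys. Because the IDs in this sketch are derived only from the document key, the chunk position, and the chunk text, re-running it on unchanged input reproduces the same IDs.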
Parameters schema
{
  "type": "object",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "text": {
      "type": "string",
      "description": "Document text to prepare (plain text, Markdown, or lightly-structured prose). Max 500,000 chars."
    },
    "metadata": {
      "type": "string",
      "description": "Optional key-value metadata to attach to every chunk (e.g. source URL, document ID)."
    },
    "overlap_tokens": {
      "type": "integer",
      "description": "Token overlap between consecutive chunks for context continuity (default 50, max 512)."
    },
    "chunk_size_tokens": {
      "type": "integer",
      "description": "Target chunk size in tokens (default 512, max 4096). Uses 4-char-per-token estimate."
    }
  },
  "additionalProperties": false
}
Parent server
The Stall
1/7 registries