document-qa-prep

Active

declared in 4.82.0

Prepares a document for question-answering and RAG pipelines. Chunks the input text at paragraph and sentence boundaries, assigns deterministic chunk IDs, estimates token counts, and extracts document metadata (word count, type, headings). Returns ready-to-embed chunks with optional overlap. Makes no LLM or external API calls; it is pure text processing. Use it mid-task when you have fetched a document and need it split before querying a vector store.
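The description above can be illustrated with a minimal sketch of boundary-aware chunking. This is not the tool's actual implementation: the function names (`estimate_tokens`, `chunk_text`), the regex-based sentence splitting, the character-based overlap, and the SHA-256 chunk ID scheme are all assumptions made for illustration. Only the 4-characters-per-token estimate and the default values come from the parameters schema.

```python
import hashlib
import re

CHARS_PER_TOKEN = 4  # estimate stated in the schema description


def estimate_tokens(text: str) -> int:
    """Rough token count: character length divided by 4, rounded up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def chunk_text(text: str, chunk_size_tokens: int = 512, overlap_tokens: int = 50) -> list[dict]:
    """Hypothetical chunker: pack whole sentences into chunks up to a token budget."""
    # Split on blank lines (paragraphs), then on sentence-ending punctuation.
    units = []
    for para in re.split(r"\n\s*\n", text):
        units.extend(s for s in re.split(r"(?<=[.!?])\s+", para.strip()) if s)

    chunks, current = [], ""
    for unit in units:
        candidate = f"{current} {unit}".strip()
        if current and estimate_tokens(candidate) > chunk_size_tokens:
            chunks.append(current)
            # Assumption: overlap is taken as the trailing characters of the
            # previous chunk, so it may begin mid-word.
            tail = current[-overlap_tokens * CHARS_PER_TOKEN:] if overlap_tokens else ""
            current = f"{tail} {unit}".strip()
        else:
            current = candidate
    if current:
        chunks.append(current)

    # Assumption: an ID derived from position and content is deterministic,
    # so the same input always yields the same IDs.
    return [
        {
            "chunk_id": hashlib.sha256(f"{i}:{c}".encode("utf-8")).hexdigest()[:16],
            "text": c,
            "token_estimate": estimate_tokens(c),
        }
        for i, c in enumerate(chunks)
    ]
```

A single sentence longer than the budget is kept whole in this sketch rather than split, so `chunk_size_tokens` acts as a target rather than a hard cap, which matches the schema's wording ("Target chunk size").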

Parameters schema

{
  "type": "object",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "text": {
      "type": "string",
      "description": "Document text to prepare (plain text, Markdown, or lightly-structured prose). Max 500,000 chars."
    },
    "metadata": {
      "type": "string",
      "description": "Optional key-value metadata to attach to every chunk (e.g. source URL, document ID)."
    },
    "overlap_tokens": {
      "type": "integer",
      "description": "Token overlap between consecutive chunks for context continuity (default 50, max 512)."
    },
    "chunk_size_tokens": {
      "type": "integer",
      "description": "Target chunk size in tokens (default 512, max 4096). Uses 4-char-per-token estimate."
    }
  },
  "additionalProperties": false
}
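As an illustration of the schema above, the following sketch builds an example parameters object and checks it against the documented limits. The helper `check_params` is hypothetical and not part of the tool. Passing `metadata` as a JSON-encoded string is an assumption based on its declared `string` type; the schema does not say how key-value pairs should be encoded. The schema declares no `required` list, so the sketch only checks properties that are present.

```python
import json

ALLOWED_KEYS = {"text", "metadata", "overlap_tokens", "chunk_size_tokens"}
MAX_TEXT_CHARS = 500_000  # "Max 500,000 chars" from the text description
MAX_INT = {"overlap_tokens": 512, "chunk_size_tokens": 4096}  # documented maxima


def check_params(params: dict) -> list[str]:
    """Hypothetical helper: return a list of problems, empty if none found."""
    problems = []
    # additionalProperties is false, so unknown keys are rejected.
    for key in sorted(set(params) - ALLOWED_KEYS):
        problems.append(f"unexpected property: {key}")
    for key in ("text", "metadata"):
        if key in params and not isinstance(params[key], str):
            problems.append(f"{key} must be a string")
    if isinstance(params.get("text"), str) and len(params["text"]) > MAX_TEXT_CHARS:
        problems.append("text exceeds 500,000 characters")
    for key, limit in MAX_INT.items():
        if key not in params:
            continue
        value = params[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{key} must be an integer")
        elif value > limit:
            problems.append(f"{key} exceeds the documented max of {limit}")
    return problems


example = {
    "text": "First paragraph.\n\nSecond paragraph.",
    # Assumption: key-value metadata serialized as a JSON string.
    "metadata": json.dumps({"source": "https://example.com/doc", "doc_id": "doc-1"}),
    "overlap_tokens": 50,
    "chunk_size_tokens": 512,
}
print(check_params(example))
```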

What this tool wraps · 1 endpoint

min confidence: 0.70 0.50

Data resource: YouTube Data API
Google · media
name mention: 0.60

Parent server

The Stall

1/7 registries
