iliad_speech_to_text
Active tool of AXIS Toolbox — Agentic Commerce Codebase Intelligence
AXIS-owned audio transcription via whisper.cpp + ffmpeg-static.

Input: exactly one of `audio_url` (an https URL that the server fetches; max 100 MiB, 60 s download timeout) or `audio_base64` (inline bytes; max 100 MiB decoded). Any audio format ffmpeg can decode is accepted (for example mp3, wav, m4a, opus, ogg, flac); the audio is resampled to 16 kHz mono WAV internally.

Optional parameters: `language` (an ISO-639-1 code such as "en", "fr", or "ja", or "auto", which is the default); `initial_prompt` (≤512 chars; biases spelling of rare names); `word_timestamps` (boolean).

Returns `{text, segments: [{start, end, text}], language_detected, duration_seconds, model_used}`. If the operator has not installed whisper-cli or has not placed the GGML model file at AXIS_WHISPER_MODEL_PATH (default `models/ggml-base.en.bin`), the tool instead returns `{_not_configured: true, reason, detail, remediation}`.

Engineer mode (header `X-Agent-Mode: engineer`; Diarization, $0.10): the response adds `diarization`, a list of speaker turns grouped from the segments by inter-segment pause gaps (tune with `diarization_gap_seconds` / `max_speakers`). This is pause-based turn segmentation, not acoustic speaker ID.

Requires `Authorization: Bearer <api_key>`.
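The input rules above can be checked on the client before a call is made. The following is a minimal sketch, not the AXIS client: the helper names `build_arguments`, `build_headers`, and `extract_text` are hypothetical, and how the arguments are transported to the server (endpoint URL, envelope) is not documented here and is left out on purpose. Only the parameter names, limits, headers, and response fields come from the description.

```python
import base64
from typing import Any, Optional

MAX_AUDIO_BYTES = 100 * 1024 * 1024  # 100 MiB limit from the tool description
MAX_PROMPT_CHARS = 512               # initial_prompt limit from the tool description


def build_arguments(
    audio_url: Optional[str] = None,
    audio_bytes: Optional[bytes] = None,
    language: str = "auto",
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
) -> dict[str, Any]:
    """Build the arguments object for iliad_speech_to_text (hypothetical helper)."""
    # Exactly one of audio_url / audio_base64 may be supplied.
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of audio_url or audio_bytes")

    args: dict[str, Any] = {"language": language, "word_timestamps": word_timestamps}

    if audio_url is not None:
        if not audio_url.startswith("https://"):
            raise ValueError("audio_url must be an https URL")
        args["audio_url"] = audio_url
    else:
        if len(audio_bytes) > MAX_AUDIO_BYTES:
            raise ValueError("audio exceeds 100 MiB decoded")
        args["audio_base64"] = base64.b64encode(audio_bytes).decode("ascii")

    if initial_prompt is not None:
        if len(initial_prompt) > MAX_PROMPT_CHARS:
            raise ValueError("initial_prompt exceeds 512 characters")
        args["initial_prompt"] = initial_prompt
    return args


def build_headers(api_key: str, engineer_mode: bool = False) -> dict[str, str]:
    """Headers named in the description; engineer mode adds X-Agent-Mode."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if engineer_mode:
        headers["X-Agent-Mode"] = "engineer"
    return headers


def extract_text(result: dict[str, Any]) -> str:
    """Return the transcript, or raise if the server reports it is not configured."""
    if result.get("_not_configured"):
        raise RuntimeError(f"{result.get('reason')}: {result.get('remediation')}")
    return result["text"]
```

Checking `_not_configured` before reading `text` matters because that response shape carries `reason`, `detail`, and `remediation` rather than a transcript.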
Parameters schema
{
  "type": "object",
  "properties": {
    "language": {
      "type": "string",
      "description": "ISO-639-1 language code (en, fr, ja, ...) or 'auto' to autodetect. Defaults 'auto'."
    },
    "audio_url": {
      "type": "string",
      "description": "https URL to an audio file. Use this OR audio_base64, not both."
    },
    "audio_base64": {
      "type": "string",
      "description": "Base64-encoded audio bytes. Use this OR audio_url, not both."
    },
    "max_speakers": {
      "type": "number",
      "description": "Engineer mode: max alternating speaker labels. Defaults 2."
    },
    "initial_prompt": {
      "type": "string",
      "description": "Optional bias prompt (≤512 chars); useful for spelling of rare names."
    },
    "word_timestamps": {
      "type": "boolean",
      "description": "Emit word-level timestamps within segments. Defaults false."
    },
    "diarization_gap_seconds": {
      "type": "number",
      "description": "Engineer mode: pause (seconds) between segments that starts a new speaker turn. Defaults 0.75."
    }
  }
}
No endpoints wrapped at confidence ≥ 0.70.
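To make the two engineer-mode parameters concrete, here is a rough sketch of what pause-based turn segmentation with alternating labels could look like. It is an illustration under assumptions, not the server's implementation: the function name `group_turns`, the output field names, the `speaker_N` label format, and the choice to start a new turn when the pause is greater than or equal to the gap are all hypothetical. Only the defaults (0.75 seconds, 2 speakers) and the `{start, end, text}` segment shape come from the listing.

```python
from typing import Any


def group_turns(
    segments: list[dict[str, Any]],
    gap_seconds: float = 0.75,
    max_speakers: int = 2,
) -> list[dict[str, Any]]:
    """Group {start, end, text} segments into turns split at long pauses."""
    if max_speakers < 1:
        raise ValueError("max_speakers must be at least 1")

    turns: list[dict[str, Any]] = []
    speaker_index = 0
    for seg in segments:
        if turns:
            pause = seg["start"] - turns[-1]["end"]
            if pause < gap_seconds:
                # Short pause: assume the same speaker continues; extend the turn.
                turns[-1]["end"] = seg["end"]
                turns[-1]["text"] += " " + seg["text"].strip()
                continue
            # Long pause: assume a speaker change and rotate to the next label.
            speaker_index = (speaker_index + 1) % max_speakers
        turns.append({
            "speaker": f"speaker_{speaker_index + 1}",  # hypothetical label format
            "start": seg["start"],
            "end": seg["end"],
            "text": seg["text"].strip(),
        })
    return turns
```

Because labels only alternate on pauses, one speaker who pauses mid-thought is split into two turns under different labels, which is why the description stresses that this is not acoustic speaker ID.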
Parent server
AXIS Toolbox — Agentic Commerce Codebase Intelligence
https://github.com/lastmanupinc-hub/Toolbox
1/7 registries