PRSM

iliad_speech_to_text

Active

Tool of AXIS Toolbox — Agentic Commerce Codebase Intelligence

declared in 0.5.3

AXIS-owned audio transcription via whisper.cpp + ffmpeg-static.

Input: supply exactly one of `audio_url` (an https URL that AXIS fetches; max 100 MiB, 60 s download timeout) or `audio_base64` (inline bytes; max 100 MiB decoded). Any audio format ffmpeg can decode is accepted (for example mp3, wav, m4a, opus, ogg, flac); the audio is resampled to 16 kHz mono WAV internally.

Options: `language` (an ISO-639-1 code such as "en", "fr", or "ja", or "auto", which is the default), `initial_prompt` (≤512 chars; biases spelling of rare names), and `word_timestamps` (boolean).

Returns `{text, segments: [{start, end, text}], language_detected, duration_seconds, model_used}`.

Not configured: when the operator has not installed whisper-cli or has not placed the GGML model file at AXIS_WHISPER_MODEL_PATH (default `models/ggml-base.en.bin`), the tool returns `{_not_configured: true, reason, detail, remediation}` instead.

Engineer mode (X-Agent-Mode: engineer; Diarization, $0.10): the response adds `diarization`, a list of speaker turns grouped from the segments by inter-segment pause gaps. Tune it with `diarization_gap_seconds` and `max_speakers`. This is pause-based turn segmentation, not acoustic speaker ID.

Requires Authorization: Bearer <api_key>.
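The pause-based turn segmentation described for Engineer mode can be illustrated with a minimal sketch. This is not the AXIS implementation: the function name `group_speaker_turns`, the `SPEAKER_n` label format, the shape of each turn, and the rule that a gap equal to or greater than the threshold starts a new turn are all assumptions made for illustration. Only the segment fields (`start`, `end`, `text`) and the two defaults (0.75 s gap, 2 speakers) come from the listing.

```python
from typing import Dict, List


def group_speaker_turns(
    segments: List[Dict],
    gap_seconds: float = 0.75,  # default documented for diarization_gap_seconds
    max_speakers: int = 2,      # default documented for max_speakers
) -> List[Dict]:
    """Group transcript segments into turns, switching label at each long pause.

    Hypothetical helper: labels simply alternate through max_speakers values,
    so this says nothing about who is actually speaking.
    """
    turns: List[Dict] = []
    speaker_index = 0
    for seg in segments:
        text = seg["text"].strip()
        if turns and seg["start"] - turns[-1]["end"] < gap_seconds:
            # Short pause: treat the segment as a continuation of the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + text
            continue
        if turns:
            # Long pause (assumed rule): move on to the next label, wrapping around.
            speaker_index = (speaker_index + 1) % max_speakers
        turns.append(
            {
                "speaker": f"SPEAKER_{speaker_index + 1}",  # assumed label format
                "start": seg["start"],
                "end": seg["end"],
                "text": text,
            }
        )
    return turns
```

Passing the `segments` array from a transcription result to this function yields one entry per turn. Because the labels only alternate on pauses, two people who talk over each other without a gap would be merged into one turn, which is the limitation the description points out.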

Parameters schema

{
  "type": "object",
  "properties": {
    "language": {
      "type": "string",
      "description": "ISO-639-1 language code (en, fr, ja, ...) or 'auto' to autodetect. Defaults 'auto'."
    },
    "audio_url": {
      "type": "string",
      "description": "https URL to an audio file. Use this OR audio_base64, not both."
    },
    "audio_base64": {
      "type": "string",
      "description": "Base64-encoded audio bytes. Use this OR audio_url, not both."
    },
    "max_speakers": {
      "type": "number",
      "description": "Engineer mode: max alternating speaker labels. Defaults 2."
    },
    "initial_prompt": {
      "type": "string",
      "description": "Optional bias prompt (≤512 chars) — useful for spelling of rare names."
    },
    "word_timestamps": {
      "type": "boolean",
      "description": "Emit word-level timestamps within segments. Defaults false."
    },
    "diarization_gap_seconds": {
      "type": "number",
      "description": "Engineer mode: pause (seconds) between segments that starts a new speaker turn. Defaults 0.75."
    }
  }
}
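The schema above does not itself encode the "exactly one of `audio_url` or `audio_base64`" rule or the size limits, so a caller may want to check them before sending a request. The following is a minimal client-side sketch under stated assumptions: `build_arguments` is a hypothetical helper, not part of AXIS, and the server may apply its checks differently. The limits (100 MiB decoded, 512 characters, https only) are taken from the tool description.

```python
import base64
from typing import Any, Dict, Optional

MAX_AUDIO_BYTES = 100 * 1024 * 1024  # 100 MiB, per the tool description
MAX_PROMPT_CHARS = 512               # initial_prompt limit, per the description


def build_arguments(
    audio_url: Optional[str] = None,
    audio_base64: Optional[str] = None,
    language: str = "auto",
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
) -> Dict[str, Any]:
    """Validate inputs locally and return an arguments dict for the tool call."""
    # Exactly one audio source must be given: reject both-set and neither-set.
    if (audio_url is None) == (audio_base64 is None):
        raise ValueError("provide exactly one of audio_url or audio_base64")
    if audio_url is not None and not audio_url.startswith("https://"):
        raise ValueError("audio_url must be an https URL")
    if audio_base64 is not None:
        # validate=True rejects non-base64 characters (raises a ValueError subclass).
        decoded = base64.b64decode(audio_base64, validate=True)
        if len(decoded) > MAX_AUDIO_BYTES:
            raise ValueError("decoded audio exceeds 100 MiB")
    if initial_prompt is not None and len(initial_prompt) > MAX_PROMPT_CHARS:
        raise ValueError("initial_prompt exceeds 512 characters")

    args: Dict[str, Any] = {"language": language, "word_timestamps": word_timestamps}
    if audio_url is not None:
        args["audio_url"] = audio_url
    else:
        args["audio_base64"] = audio_base64
    if initial_prompt is not None:
        args["initial_prompt"] = initial_prompt
    return args
```

A caller would pass the returned dict as the tool arguments and should still check the response for `_not_configured: true` before reading `text` or `segments`.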

What this tool wraps · 0 endpoints

No endpoints wrapped at confidence ≥ 0.70.

Parent server

AXIS Toolbox — Agentic Commerce Codebase Intelligence

https://github.com/lastmanupinc-hub/Toolbox

1/7 registries