compare_responses

Active

Tool of IA-QA — 130+ QA & Dev Tools for AI Agents

declared in 1.0.0

Compare two ALREADY-PRODUCED outputs (e.g. model A vs model B on the same task) side by side. Returns deterministic metrics (token cosine, ROUGE-L, Jaccard, length/structure deltas, JSON diff) and a verdict. If a `reference` (ground truth) is given, scores each output against it and picks the closer one. If `model` + `api_key` are given, an LLM judge also picks a qualitative winner for the task. No re-execution — you bring the outputs.
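The deterministic metrics named above can be illustrated with a minimal sketch. This is not the tool's actual implementation, which the listing does not show: the tokenizer (lowercased `\w+` matches), the helper names, and the use of ROUGE-L F1 as the "closer to reference" score are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercased word tokens. The tool's real tokenizer is not documented.
    return re.findall(r"\w+", text.lower())


def token_cosine(a: str, b: str) -> float:
    # Cosine similarity between the token-count vectors of the two texts.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: str, b: str) -> float:
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def rouge_l(candidate: str, reference: str) -> float:
    # ROUGE-L as an F1 score over the longest common subsequence (LCS) of tokens.
    c, r = tokenize(candidate), tokenize(reference)
    if not c or not r:
        return 0.0
    prev = [0] * (len(r) + 1)
    for tok in c:
        cur = [0]
        for j, ref_tok in enumerate(r, start=1):
            if tok == ref_tok:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def closer_to_reference(response_a: str, response_b: str, reference: str) -> str:
    # Hypothetical verdict rule: the output with the higher ROUGE-L score wins.
    score_a = rouge_l(response_a, reference)
    score_b = rouge_l(response_b, reference)
    if score_a == score_b:
        return "tie"
    return "a" if score_a > score_b else "b"
```

Because every function here is a pure function of its inputs, the same pair of outputs always yields the same scores, which is the property the description means by "deterministic".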

Parameters schema

{
  "type": "object",
  "required": [
    "response_a",
    "response_b"
  ],
  "properties": {
    "task": {
      "type": "string",
      "description": "The task/prompt both outputs were answering — used by the LLM judge for context"
    },
    "model": {
      "type": "string",
      "description": "Optional judge model id (BYOK). When set with api_key, an LLM judge picks a qualitative winner."
    },
    "api_key": {
      "type": "string",
      "description": "Optional API key for the judge model (BYOK). Used only for the judge call; never stored."
    },
    "label_a": {
      "type": "string",
      "description": "Label for output A (e.g. \"GPT-4o\", \"v1.0\")"
    },
    "label_b": {
      "type": "string",
      "description": "Label for output B (e.g. \"GPT-5-nano\", \"v1.1\")"
    },
    "reference": {
      "type": "string",
      "description": "Optional ground-truth / expected answer. If set, each output is scored against it and the closer one wins (deterministic)."
    },
    "check_json": {
      "type": "boolean",
      "description": "Try to parse as JSON and compare structurally (keys, types, values)"
    },
    "response_a": {
      "type": "string",
      "description": "First output (e.g. model A's answer)"
    },
    "response_b": {
      "type": "string",
      "description": "Second output (e.g. model B's answer)"
    }
  }
}
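As a sketch of how a client might assemble a call that satisfies this schema, the snippet below checks the required fields and declared types, then wraps the arguments in a JSON-RPC `tools/call` request. It assumes the server follows the standard MCP request shape; the helper names `build_arguments` and `build_tool_call` are hypothetical, and the transport (stdio, HTTP) is left out. Rejecting unknown keys is a choice of this sketch to catch typos; the schema itself does not forbid extra properties.

```python
import json

# Field names and types copied from the schema above.
REQUIRED = ("response_a", "response_b")
STRING_FIELDS = {
    "task", "model", "api_key", "label_a", "label_b",
    "reference", "response_a", "response_b",
}
BOOLEAN_FIELDS = {"check_json"}


def build_arguments(**kwargs) -> dict:
    # Validate the arguments against the schema before sending them.
    missing = [name for name in REQUIRED if name not in kwargs]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for key, value in kwargs.items():
        if key in STRING_FIELDS:
            if not isinstance(value, str):
                raise TypeError(f"{key} must be a string")
        elif key in BOOLEAN_FIELDS:
            if not isinstance(value, bool):
                raise TypeError(f"{key} must be a boolean")
        else:
            # Stricter than the schema: flag likely typos early.
            raise ValueError(f"unknown field: {key}")
    return dict(kwargs)


def build_tool_call(arguments: dict, request_id: int = 1) -> dict:
    # Assumed MCP JSON-RPC envelope for invoking a tool by name.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "compare_responses", "arguments": arguments},
    }


if __name__ == "__main__":
    args = build_arguments(
        response_a='{"status": "ok", "count": 3}',
        response_b='{"status": "ok", "count": 4}',
        label_a="v1.0",
        label_b="v1.1",
        check_json=True,
    )
    print(json.dumps(build_tool_call(args), indent=2))
```

Omitting `model` and `api_key`, as here, keeps the comparison to the deterministic metrics only; per the parameter descriptions, the LLM judge runs only when both are supplied.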

What this tool wraps · 0 endpoints

No endpoints wrapped at confidence ≥ 0.70.

Parent server

IA-QA — 130+ QA & Dev Tools for AI Agents

https://github.com/jcjamet/ia-qa

1/7 registries