run_semantic_tests

Active

Tool of IA-QA — 130+ QA & Dev Tools for AI Agents

declared in 1.0.0

Semantic assertion primitive: compare actual vs expected text pairs using cosine similarity + ROUGE-L. Two modes: tfidf (default, free, no API key) or embeddings (OpenAI text-embedding-3-small, BYOK, true semantic similarity). Returns per-case PASS/FAIL verdicts and an overall verdict. CI-ready: pipe the JSON verdict field to gate a build.

Parameters schema

{
  "type": "object",
  "required": [
    "cases"
  ],
  "properties": {
    "mode": {
      "enum": [
        "tfidf",
        "embeddings"
      ],
      "type": "string",
      "description": "tfidf (default): fast, free, lexical. embeddings: OpenAI text-embedding-3-small, true semantic similarity, requires api_key."
    },
    "cases": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "actual",
          "expected"
        ],
        "properties": {
          "id": {
            "type": "string",
            "description": "Optional identifier for this case."
          },
          "actual": {
            "type": "string",
            "description": "The text produced by your LLM/system."
          },
          "expected": {
            "type": "string",
            "description": "The reference/ground-truth text."
          }
        }
      },
      "maxItems": 50,
      "description": "Array of (actual, expected) pairs to evaluate."
    },
    "api_key": {
      "type": "string",
      "description": "OpenAI API key â€” required only when mode is embeddings."
    },
    "thresholds": {
      "type": "object",
      "properties": {
        "cosine": {
          "type": "number",
          "maximum": 1,
          "minimum": 0,
          "description": "Minimum cosine similarity to pass (default: 0.75)."
        },
        "rouge_l": {
          "type": "number",
          "maximum": 1,
          "minimum": 0,
          "description": "Minimum ROUGE-L F1 to pass (default: 0.5)."
        }
      },
      "description": "Pass/fail thresholds (defaults: cosine 0.75, rouge_l 0.5)."
    },
    "require_all": {
      "type": "boolean",
      "description": "If true (default), all cases must pass for overall PASS. If false, at least one case passing returns PASS."
    }
  }
}

What this tool wraps· 0 endpoints

min confidence0.70 0.50

No endpoints wrapped at confidence ≥ 0.70.

Parent server

IA-QA — 130+ QA & Dev Tools for AI Agents

https://github.com/jcjamet/ia-qa

1/7 registries

View full server →