run_semantic_tests
ActiveTool of IA-QA — 130+ QA & Dev Tools for AI Agents
Semantic assertion primitive: compare actual vs expected text pairs using cosine similarity + ROUGE-L. Two modes: tfidf (default, free, no API key) or embeddings (OpenAI text-embedding-3-small, BYOK, true semantic similarity). Returns per-case PASS/FAIL verdicts and an overall verdict. CI-ready: pipe the JSON verdict field to gate a build.
Parameters schema
{
"type": "object",
"required": [
"cases"
],
"properties": {
"mode": {
"enum": [
"tfidf",
"embeddings"
],
"type": "string",
"description": "tfidf (default): fast, free, lexical. embeddings: OpenAI text-embedding-3-small, true semantic similarity, requires api_key."
},
"cases": {
"type": "array",
"items": {
"type": "object",
"required": [
"actual",
"expected"
],
"properties": {
"id": {
"type": "string",
"description": "Optional identifier for this case."
},
"actual": {
"type": "string",
"description": "The text produced by your LLM/system."
},
"expected": {
"type": "string",
"description": "The reference/ground-truth text."
}
}
},
"maxItems": 50,
"description": "Array of (actual, expected) pairs to evaluate."
},
"api_key": {
"type": "string",
"description": "OpenAI API key — required only when mode is embeddings."
},
"thresholds": {
"type": "object",
"properties": {
"cosine": {
"type": "number",
"maximum": 1,
"minimum": 0,
"description": "Minimum cosine similarity to pass (default: 0.75)."
},
"rouge_l": {
"type": "number",
"maximum": 1,
"minimum": 0,
"description": "Minimum ROUGE-L F1 to pass (default: 0.5)."
}
},
"description": "Pass/fail thresholds (defaults: cosine 0.75, rouge_l 0.5)."
},
"require_all": {
"type": "boolean",
"description": "If true (default), all cases must pass for overall PASS. If false, at least one case passing returns PASS."
}
}
}No endpoints wrapped at confidence ≥ 0.50.
Parent server
IA-QA — 130+ QA & Dev Tools for AI Agents
https://github.com/jcjamet/ia-qa
1/7 registries