similarity_score
Status: Active. Tool of IA-QA — 130+ QA & Dev Tools for AI Agents
Computes text similarity between a reference text and a hypothesis text using several metrics: cosine similarity (bag-of-words and TF-IDF), Jaccard, ROUGE-1, ROUGE-2, ROUGE-L, and BLEU. No API key is needed. Suited to LLM evaluation (expected vs. actual output), RAG quality checks, and NLG benchmarking. Supports batch mode.
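To make the listed metrics concrete, here is a minimal Python sketch of three of them (Jaccard, bag-of-words cosine, and ROUGE-L F1). It is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, and the lowercase whitespace tokenizer is an assumption, since the listing does not document how the tool tokenizes text.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization. The tool's real
    # tokenizer is not documented in this listing.
    return text.lower().split()


def jaccard(reference: str, hypothesis: str) -> float:
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    ref, hyp = set(tokenize(reference)), set(tokenize(hypothesis))
    if not ref and not hyp:
        return 1.0
    return len(ref & hyp) / len(ref | hyp)


def cosine_bow(reference: str, hypothesis: str) -> float:
    # Cosine of the angle between raw term-count vectors.
    ref, hyp = Counter(tokenize(reference)), Counter(tokenize(hypothesis))
    dot = sum(count * hyp[token] for token, count in ref.items())
    norm_ref = math.sqrt(sum(c * c for c in ref.values()))
    norm_hyp = math.sqrt(sum(c * c for c in hyp.values()))
    if norm_ref == 0 or norm_hyp == 0:
        return 0.0
    return dot / (norm_ref * norm_hyp)


def lcs_length(a: list[str], b: list[str]) -> int:
    # Longest common subsequence length, dynamic programming with two rows.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, hypothesis: str) -> float:
    # F1 of LCS-based precision (vs. hypothesis length) and recall (vs. reference length).
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
hypothesis = "the cat lay on the mat"
print(jaccard(reference, hypothesis))
print(cosine_bow(reference, hypothesis))
print(rouge_l_f1(reference, hypothesis))
```

For this pair the sketch gives a Jaccard score of 4/6, a bag-of-words cosine of 7/8, and a ROUGE-L F1 of 5/6. The tool's own scores may differ if it tokenizes, stems, or smooths differently.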
Parameters schema
{
  "type": "object",
  "properties": {
    "batch": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "reference",
          "hypothesis"
        ],
        "properties": {
          "reference": {
            "type": "string"
          },
          "hypothesis": {
            "type": "string"
          }
        }
      },
      "description": "Batch mode: array of {reference, hypothesis} pairs."
    },
    "metrics": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Metrics to compute (default: all). Options: \"cosine_bow\", \"cosine_tfidf\", \"jaccard\", \"rouge1\", \"rouge2\", \"rougeL\", \"bleu\""
    },
    "reference": {
      "type": "string",
      "description": "Reference / expected text (ground truth)"
    },
    "threshold": {
      "type": "number",
      "description": "Optional pass/fail threshold (0-1). Applies to ROUGE-L F1 score."
    },
    "hypothesis": {
      "type": "string",
      "description": "Hypothesis / actual text (LLM output)"
    }
  }
}
No endpoints wrapped at confidence ≥ 0.50.
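As an illustration of the parameters schema above, the following Python sketch assembles an arguments object for either a single pair or batch mode. The helper name `build_arguments` and its validation rules are hypothetical; they only mirror what the schema states (the seven metric names, the 0-1 threshold range, and the required `reference` and `hypothesis` keys in each batch item). How the arguments are sent to the tool depends on the client and is not shown.

```python
import json

# Metric names copied from the schema's "metrics" description.
ALLOWED_METRICS = {
    "cosine_bow", "cosine_tfidf", "jaccard",
    "rouge1", "rouge2", "rougeL", "bleu",
}


def build_arguments(pairs, metrics=None, threshold=None):
    """Hypothetical helper: build a similarity_score arguments dict.

    pairs is a list of (reference, hypothesis) string tuples. One pair uses
    the top-level "reference"/"hypothesis" fields; several pairs use "batch".
    """
    if not pairs:
        raise ValueError("at least one (reference, hypothesis) pair is needed")

    arguments = {}
    if len(pairs) == 1:
        reference, hypothesis = pairs[0]
        arguments["reference"] = reference
        arguments["hypothesis"] = hypothesis
    else:
        arguments["batch"] = [
            {"reference": reference, "hypothesis": hypothesis}
            for reference, hypothesis in pairs
        ]

    if metrics is not None:
        unknown = set(metrics) - ALLOWED_METRICS
        if unknown:
            raise ValueError(f"unknown metrics: {sorted(unknown)}")
        arguments["metrics"] = list(metrics)

    if threshold is not None:
        # The schema describes the threshold as a 0-1 value applied to ROUGE-L F1.
        if not 0 <= threshold <= 1:
            raise ValueError("threshold must be between 0 and 1")
        arguments["threshold"] = threshold

    return arguments


single = build_arguments(
    [("Paris is the capital of France.", "The capital of France is Paris.")],
    metrics=["rougeL", "bleu"],
    threshold=0.6,
)
print(json.dumps(single, indent=2))
```

Omitting `metrics` leaves the key out entirely, which per the schema description means all metrics are computed.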
Parent server
IA-QA — 130+ QA & Dev Tools for AI Agents
https://github.com/jcjamet/ia-qa