run_vlm_test_suite
Active
Tool of IA-QA — 130+ QA & Dev Tools for AI Agents
Runs a test suite against a vision-language model (VLM): sends one image (URL or base64) and up to 10 test cases, each with a question and an optional assertion, to GPT-4o, Claude 3.5, or Gemini. Returns a PASS/FAIL verdict per case, the pass rate, an overall PASS/WARNING/FAIL verdict based on a configurable pass-rate threshold (default 80), and latency statistics. Assertion types: contains, not_contains, json_format, min_length, max_length, and semantic_contains (TF-IDF cosine similarity ≥ 0.4). BYOK (bring your own key): requires your own API key for the target provider.
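To make the assertion types concrete, here is a minimal Python sketch of how each one could be evaluated against a VLM response. It is not the tool's actual implementation: the function names `tfidf_cosine` and `check_assertion`, the case-insensitive matching, the whitespace tokenizer, and the smoothed IDF formula are all assumptions. Only the assertion names and the 0.4 similarity cutoff come from the description above.

```python
import json
import math
from collections import Counter


def tfidf_cosine(a: str, b: str) -> float:
    """Cosine similarity of TF-IDF vectors built over just these two texts."""
    docs = [a.lower().split(), b.lower().split()]  # assumption: whitespace tokens
    vocab = set(docs[0]) | set(docs[1])
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1 with N = 2 documents,
    # so terms shared by both texts keep a non-zero weight.
    idf = {t: math.log(3 / (1 + sum(t in d for d in docs))) + 1 for t in vocab}
    vecs = [{t: n * idf[t] for t, n in Counter(d).items()} for d in docs]
    dot = sum(w * vecs[1].get(t, 0.0) for t, w in vecs[0].items())
    norms = [math.sqrt(sum(w * w for w in v.values())) for v in vecs]
    return dot / (norms[0] * norms[1]) if norms[0] and norms[1] else 0.0


def check_assertion(response: str, assertion_type: str, assertion_value: str = "") -> bool:
    """Return True when the response satisfies the assertion (hypothetical helper)."""
    if assertion_type == "contains":
        # Assumption: matching is case-insensitive.
        return assertion_value.lower() in response.lower()
    if assertion_type == "not_contains":
        return assertion_value.lower() not in response.lower()
    if assertion_type == "json_format":
        try:
            json.loads(response)
            return True
        except json.JSONDecodeError:
            return False
    if assertion_type == "min_length":
        # assertion_value is a string in the schema, so convert it first.
        return len(response) >= int(assertion_value)
    if assertion_type == "max_length":
        return len(response) <= int(assertion_value)
    if assertion_type == "semantic_contains":
        return tfidf_cosine(response, assertion_value) >= 0.4
    raise ValueError(f"unknown assertion_type: {assertion_type}")
```

For example, `check_assertion("A red car on a road", "contains", "red car")` passes under these assumptions, while `check_assertion("not json", "json_format")` fails.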
Parameters schema
{
  "type": "object",
  "required": [
    "test_cases",
    "model",
    "api_key"
  ],
  "properties": {
    "model": {
      "enum": [
        "gpt-4o",
        "gpt-4o-mini",
        "claude-3-5-sonnet-20241022",
        "claude-3-5-haiku-20241022",
        "gemini-1.5-flash",
        "gemini-2.0-flash"
      ],
      "type": "string",
      "description": "VLM model to use."
    },
    "api_key": {
      "type": "string",
      "description": "API key for the model provider (OpenAI sk-, Anthropic sk-ant-, or Google AIzaSy...)."
    },
    "image_url": {
      "type": "string",
      "description": "Public URL of the image to evaluate (required unless image_base64 is provided)."
    },
    "threshold": {
      "type": "number",
      "description": "Pass rate threshold for overall verdict (default: 80, 0–100)."
    },
    "test_cases": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "question"
        ],
        "properties": {
          "id": {
            "type": "string",
            "description": "Optional identifier for this case."
          },
          "question": {
            "type": "string",
            "description": "Question to ask the VLM about the image."
          },
          "assertion_type": {
            "enum": [
              "contains",
              "not_contains",
              "json_format",
              "min_length",
              "max_length",
              "semantic_contains"
            ],
            "type": "string",
            "description": "Assertion to run on the VLM response. semantic_contains uses TF-IDF cosine similarity ≥ 0.4."
          },
          "assertion_value": {
            "type": "string",
            "description": "Expected value for the assertion (not needed for json_format)."
          }
        }
      },
      "maxItems": 10,
      "description": "Array of test cases to run."
    },
    "image_base64": {
      "type": "string",
      "description": "Base64-encoded image data (required unless image_url is provided)."
    },
    "system_prompt": {
      "type": "string",
      "description": "Optional system prompt sent to the VLM."
    },
    "image_mime_type": {
      "type": "string",
      "description": "MIME type of the image if using image_base64 (default: image/jpeg)."
    }
  }
}
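As an illustration of the schema, the following Python sketch builds an arguments object for `run_vlm_test_suite` and checks it against the constraints stated in the schema before it is sent. The helper `validate_arguments` is hypothetical and hand-written, not part of IA-QA, and the image URL and API key are placeholders. How the arguments are actually transmitted to the tool depends on the client and is not shown.

```python
ALLOWED_MODELS = {
    "gpt-4o", "gpt-4o-mini",
    "claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022",
    "gemini-1.5-flash", "gemini-2.0-flash",
}
ASSERTION_TYPES = {
    "contains", "not_contains", "json_format",
    "min_length", "max_length", "semantic_contains",
}


def validate_arguments(args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    errors = []
    for key in ("test_cases", "model", "api_key"):
        if key not in args:
            errors.append(f"missing required field: {key}")
    if "model" in args and args["model"] not in ALLOWED_MODELS:
        errors.append(f"unsupported model: {args['model']}")
    # The schema descriptions require one of the two image fields.
    if not (args.get("image_url") or args.get("image_base64")):
        errors.append("provide image_url or image_base64")
    cases = args.get("test_cases", [])
    if len(cases) > 10:
        errors.append("test_cases accepts at most 10 items")
    for i, case in enumerate(cases):
        if "question" not in case:
            errors.append(f"test_cases[{i}] is missing question")
        kind = case.get("assertion_type")
        if kind is not None and kind not in ASSERTION_TYPES:
            errors.append(f"test_cases[{i}] has unknown assertion_type: {kind}")
    if not 0 <= args.get("threshold", 80) <= 100:
        errors.append("threshold must be between 0 and 100")
    return errors


# Placeholder values: replace the URL and key with your own.
example_args = {
    "model": "gpt-4o-mini",
    "api_key": "YOUR_API_KEY",
    "image_url": "https://example.com/photo.jpg",
    "threshold": 80,
    "test_cases": [
        {"id": "color", "question": "What color is the car?",
         "assertion_type": "contains", "assertion_value": "red"},
        {"id": "json", "question": "Describe the scene as a JSON object.",
         "assertion_type": "json_format"},
    ],
}

problems = validate_arguments(example_args)
print(problems or "arguments look valid")
```

Checking locally first avoids spending a billed provider call on a request the tool would reject anyway.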
Parent server
IA-QA — 130+ QA & Dev Tools for AI Agents
https://github.com/jcjamet/ia-qa