run_vlm_test_suite_batch
ActiveTool of IA-QA — 130+ QA & Dev Tools for AI Agents
Compare multiple VLMs on the same test suite in parallel — send an image (URL or base64) + N test cases to all models simultaneously. Returns per-model PASS/FAIL verdicts, pass rates, latency stats, and a comparison table. Assertion types: contains, not_contains, json_format, min_length, max_length, semantic_contains. BYOK: requires API keys for each provider.
Parameters schema
{
"type": "object",
"required": [
"test_cases",
"models",
"api_keys"
],
"properties": {
"models": {
"type": "array",
"items": {
"enum": [
"gpt-4o",
"gpt-4o-mini",
"claude-3-5-sonnet-20241022",
"claude-3-5-haiku-20241022",
"gemini-1.5-flash",
"gemini-2.0-flash"
],
"type": "string"
},
"maxItems": 6,
"minItems": 1,
"description": "Array of model IDs to compare (runs in parallel)."
},
"api_keys": {
"type": "object",
"description": "Map of model ID → API key. Example: { \"gpt-4o\": \"sk-...\", \"claude-3-5-sonnet-20241022\": \"sk-ant-...\" }",
"additionalProperties": {
"type": "string"
}
},
"image_url": {
"type": "string",
"description": "Public URL of the image to evaluate (required unless image_base64 is provided)."
},
"threshold": {
"type": "number",
"description": "Pass rate threshold for overall verdict (default: 80, 0–100)."
},
"test_cases": {
"type": "array",
"items": {
"type": "object",
"required": [
"question"
],
"properties": {
"id": {
"type": "string",
"description": "Optional identifier for this case."
},
"question": {
"type": "string",
"description": "Question to ask the VLM about the image."
},
"assertion_type": {
"enum": [
"contains",
"not_contains",
"json_format",
"min_length",
"max_length",
"semantic_contains"
],
"type": "string",
"description": "Assertion to run on the VLM response."
},
"assertion_value": {
"type": "string",
"description": "Expected value for the assertion (not needed for json_format)."
}
}
},
"maxItems": 10,
"description": "Array of test cases to run against every model."
},
"image_base64": {
"type": "string",
"description": "Base64-encoded image data (required unless image_url is provided)."
},
"system_prompt": {
"type": "string",
"description": "Optional system prompt sent to every VLM."
},
"image_mime_type": {
"type": "string",
"description": "MIME type of the image if using image_base64 (default: image/jpeg)."
}
}
}No endpoints wrapped at confidence ≥ 0.50.
Parent server
IA-QA — 130+ QA & Dev Tools for AI Agents
https://github.com/jcjamet/ia-qa
1/7 registries