vision-analyze
Active
Tool of The Stall
Analyze any image URL using GPT-4o-mini vision. Returns structured analysis based on the mode: describe (full description), ocr (text extraction), chart (data/trend extraction), ui (interface analysis), identify (object/subject ID), or qa (answer a specific question about the image). Input must be a publicly accessible image URL (JPEG, PNG, GIF, WebP). $0.050/call.
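The input requirements above (an image content type, plus the 20MB size limit given in the url parameter description) could be checked before spending a call. The following is a minimal sketch, not part of the tool: `check_image_headers` is a hypothetical helper that inspects HTTP response headers you have already fetched, and reading "20MB" as 20 * 1024 * 1024 bytes is an assumption, since the listing does not define the unit.

```python
from typing import Mapping

# Content types named in the tool description.
ALLOWED_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}
# Assumption: "20MB" means 20 * 1024 * 1024 bytes; the listing does not say.
MAX_BYTES = 20 * 1024 * 1024


def check_image_headers(headers: Mapping[str, str]) -> list[str]:
    """Return a list of problems found in the headers (empty if none)."""
    problems = []
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    # Drop any parameters after ';' before comparing the media type.
    ctype = lowered.get("content-type", "").split(";")[0].strip().lower()
    if ctype not in ALLOWED_TYPES:
        problems.append(f"unsupported content-type: {ctype or 'missing'}")
    # Content-Length is optional; only check it when present and numeric.
    length = lowered.get("content-length")
    if length is not None and length.isdigit() and int(length) > MAX_BYTES:
        problems.append(f"file too large: {length} bytes")
    return problems
```

The headers could come from a HEAD request made with any HTTP client. A server that omits Content-Length passes the size check here, so the tool itself may still reject the image.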
Parameters schema
{
  "type": "object",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "url": {
      "type": "string",
      "description": "Publicly accessible URL of the image to analyze. Must return image/jpeg, image/png, image/gif, or image/webp content-type. Max file size: 20MB."
    },
    "mode": {
      "type": "string",
      "description": "Analysis mode: describe (full scene description), ocr (text extraction), chart (data/chart analysis), ui (UI screenshot analysis), identify (object/subject identification), qa (answer a specific question about the image — requires the 'question' parameter)."
    },
    "detail": {
      "type": "string",
      "description": "OpenAI vision detail level. 'auto' (default): model decides based on image size. 'low': faster, cheaper, less detail (best for simple images). 'high': slower, more detail (best for charts, dense text, complex scenes)."
    },
    "question": {
      "type": "string",
      "description": "For mode=qa only: the specific question to answer about the image. E.g., 'What is the total revenue shown in Q3?' or 'What does the error message say?'"
    }
  },
  "additionalProperties": false
}
No endpoints wrapped at confidence ≥ 0.70.
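The schema types every parameter as a plain string and has no `enum` or `required` entries, so the allowed values appear only in the descriptions. A client could enforce them itself before sending a call. The sketch below is one way to do that: `build_arguments` is a hypothetical helper, and treating `url` and `mode` as mandatory is an assumption drawn from the descriptions, not something the schema states.

```python
from typing import Optional

# Values listed in the 'mode' and 'detail' parameter descriptions.
MODES = {"describe", "ocr", "chart", "ui", "identify", "qa"}
DETAIL_LEVELS = {"auto", "low", "high"}


def build_arguments(url: str, mode: str, detail: Optional[str] = None,
                    question: Optional[str] = None) -> dict:
    """Build an arguments object for vision-analyze, checking the documented rules."""
    if not url:
        raise ValueError("url must be a non-empty string")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if detail is not None and detail not in DETAIL_LEVELS:
        raise ValueError(f"unknown detail level: {detail!r}")
    if mode == "qa" and not question:
        raise ValueError("mode 'qa' requires a question")

    args = {"url": url, "mode": mode}
    # Leave 'detail' out when unset so the documented default ('auto') applies.
    if detail is not None:
        args["detail"] = detail
    # 'question' is documented as qa-only, so it is dropped for other modes
    # (an assumption; a stricter client might raise instead).
    if mode == "qa":
        args["question"] = question
    return args
```

For example, `build_arguments("https://example.com/chart.png", "qa", detail="high", question="What is the total revenue shown in Q3?")` yields an object with all four keys. Because the schema sets `additionalProperties` to false, the helper never emits any other key.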
Parent server
The Stall
1/7 registries