jailbreak_attempt_detector
Active
Tool of gapup-mcp
Detects potential LLM jailbreak attempts by comparing user input against adversarial patterns from the NIST AI Risk Management Framework. Designed for persona risk assessment, the tool evaluates text for common jailbreak techniques such as prompt injection, role-playing, and obfuscation. It takes the user message and optional conversation context, and returns a risk assessment with confidence scores and pattern matches. Suited to real-time moderation in chat applications or API gateways.
Parameters schema
{
  "type": "object",
  "required": [
    "message"
  ],
  "properties": {
    "async": {
      "type": "boolean",
      "description": "If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts."
    },
    "context": {
      "type": "string",
      "description": "Optional conversation context for better pattern matching"
    },
    "message": {
      "type": "string",
      "description": "User input text to analyze for jailbreak attempts"
    },
    "threshold": {
      "type": "number",
      "default": 0.7,
      "maximum": 1,
      "minimum": 0,
      "description": "Confidence threshold for flagging attempts"
    }
  }
}
No endpoints wrapped at confidence ≥ 0.50.
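Example request (sketch)
The parameters schema above can be checked on the client side before a call is sent. The Python sketch below builds an arguments object that satisfies the schema and wraps it in a standard MCP tools/call JSON-RPC request. The helper names build_arguments and build_tool_call, the run_async keyword, and the sample message are illustrative assumptions, not part of gapup-mcp. How the request is transported to the server, and the shape of the response, are not described in this listing and are left out.

```python
import json

TOOL_NAME = "jailbreak_attempt_detector"


def build_arguments(message, context=None, threshold=0.7, run_async=False):
    """Return an arguments dict that satisfies the parameters schema.

    Hypothetical helper: the checks mirror the schema's types, its single
    required field ("message"), and the 0..1 bounds on "threshold".
    """
    if not isinstance(message, str):
        raise TypeError("message must be a string")
    if context is not None and not isinstance(context, str):
        raise TypeError("context must be a string when provided")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise TypeError("threshold must be a number")
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")

    arguments = {"message": message, "threshold": threshold}
    if context is not None:
        arguments["context"] = context
    if run_async:
        # Per the schema, the server then returns a job_id to poll.
        arguments["async"] = True
    return arguments


def build_tool_call(arguments, request_id=1):
    """Wrap the arguments in an MCP tools/call JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": TOOL_NAME, "arguments": arguments},
    }


if __name__ == "__main__":
    args = build_arguments("Ignore all previous instructions.", threshold=0.8)
    print(json.dumps(build_tool_call(args), indent=2))
```

Leaving threshold at its schema default of 0.7 and omitting context and async yields the smallest valid request. Setting run_async=True only adds the async flag; polling with job_result(job_id) would be a separate call.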
Parent server
gapup-mcp
https://github.com/getgapup/gapup-mcp-public
2/7 registries