tf_harnesses
Active
Tool of io.github.RipperMercs/terminalfeed
Returns a snapshot of public agentic-coding benchmark scores across SWE-bench Verified, Terminal-Bench, Aider Polyglot, and METR HCAST. Each row pairs a harness with a model. The same model can score very differently on different harnesses, and that gap is the main thing this tool surfaces. Pass ?view=summary for the top-10 combined leaderboard plus the biggest harness gaps; ?view=gaps for the full per-model harness deltas; ?view=combined for a normalized cross-benchmark ranking; or ?view=raw (the default) for the full benchmark/result graph. Source: hand-curated from the upstream leaderboards (swebench.com, terminal-bench.org, aider.chat, metr.org). Cache TTL: 12h. Use it when the agent needs to recommend a harness/model combination or explain why two agents using the same model perform differently.
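To make the "harness gap" idea concrete, here is a minimal Python sketch of how such a gap could be computed from rows that pair a harness with a model. The row field names (`benchmark`, `harness`, `model`, `score`) and every value below are invented placeholders for illustration; the tool's actual response shape is not documented here and may differ.

```python
from collections import defaultdict

# Hypothetical rows: field names and scores are invented placeholders,
# not real benchmark results and not the tool's confirmed output format.
rows = [
    {"benchmark": "bench-1", "harness": "harness-x", "model": "model-a", "score": 60.0},
    {"benchmark": "bench-1", "harness": "harness-y", "model": "model-a", "score": 45.0},
    {"benchmark": "bench-1", "harness": "harness-x", "model": "model-b", "score": 50.0},
    {"benchmark": "bench-1", "harness": "harness-y", "model": "model-b", "score": 48.0},
]


def harness_gaps(rows):
    """Return {(benchmark, model): best score minus worst score across harnesses}."""
    scores = defaultdict(list)
    for row in rows:
        scores[(row["benchmark"], row["model"])].append(row["score"])
    # A gap only exists when at least two harnesses ran the same model.
    return {
        key: max(values) - min(values)
        for key, values in scores.items()
        if len(values) >= 2
    }


gaps = harness_gaps(rows)
# Largest gap first: these are the models most sensitive to harness choice.
for (benchmark, model), gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(benchmark, model, gap)
```

Gaps are computed within a single benchmark here because scores from different benchmarks are not directly comparable without normalization.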
Parameters schema
{
  "type": "object",
  "required": [],
  "properties": {
    "view": {
      "enum": [
        "raw",
        "summary",
        "gaps",
        "combined"
      ],
      "type": "string",
      "description": "Output shape; default raw"
    }
  }
}
No endpoints wrapped at confidence ≥ 0.70.
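The parameters schema above takes a single optional `view` string restricted to four values. The following Python sketch validates that argument on the client side and wraps it in a request. The JSON-RPC 2.0 `tools/call` envelope is an assumption (it is the shape used by the Model Context Protocol; this listing does not state the transport), and the helper names `build_arguments` and `build_request` are hypothetical.

```python
import json

# Enum values copied from the parameters schema.
ALLOWED_VIEWS = ("raw", "summary", "gaps", "combined")


def build_arguments(view=None):
    """Build the tool arguments; omitting `view` lets the server default to raw."""
    if view is None:
        return {}  # "required" is empty, so an empty object is valid
    if view not in ALLOWED_VIEWS:
        raise ValueError(f"view must be one of {ALLOWED_VIEWS}, got {view!r}")
    return {"view": view}


def build_request(view=None, request_id=1):
    """Wrap the arguments in a request envelope.

    Assumption: an MCP-style JSON-RPC 2.0 "tools/call" request.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "tf_harnesses", "arguments": build_arguments(view)},
    }


print(json.dumps(build_request("summary"), indent=2))
```

Rejecting unknown values before sending avoids a round trip for an argument the schema would refuse anyway.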
Parent server
io.github.RipperMercs/terminalfeed
https://github.com/RipperMercs/terminalfeed
1/7 registries