tf_harnesses

Active

Tool of io.github.RipperMercs/terminalfeed

declared in 1.1.0

Returns a snapshot of public agentic-coding benchmark scores across SWE-bench Verified, Terminal-Bench, Aider Polyglot, and METR HCAST. Each row pairs a harness with a model. The same model can score very differently on different harnesses, and that gap is what this tool is meant to surface. Pass ?view=summary for the top-10 combined leaderboard plus the biggest harness gaps; ?view=gaps for the full per-model harness deltas; ?view=combined for a normalized cross-benchmark ranking; or ?view=raw (the default) for the full benchmark/result graph. Source: hand-curated from upstream leaderboards (swebench.com, terminal-bench.org, aider.chat, metr.org). Cache TTL: 12 h. Use this tool when the agent needs to recommend a harness/model combination or explain why two agents using the same model perform differently.
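To make the "harness gap" idea concrete, here is a minimal Python sketch of a per-model delta on a single benchmark. The row shape (`model`, `harness`, `score` keys), the function name `harness_gaps`, and every value in `rows` are hypothetical placeholders for illustration. The tool's actual response format is not documented here, and none of the numbers are real benchmark results.

```python
from collections import defaultdict


def harness_gaps(rows):
    """Group scores by model and report the best-vs-worst harness delta.

    `rows` is an assumed shape: dicts with "model", "harness", and "score"
    keys, all taken from the same benchmark so the scores are comparable.
    """
    by_model = defaultdict(dict)
    for row in rows:
        by_model[row["model"]][row["harness"]] = row["score"]

    gaps = {}
    for model, scores in by_model.items():
        if len(scores) < 2:
            continue  # a gap needs at least two harnesses for the same model
        best = max(scores, key=scores.get)
        worst = min(scores, key=scores.get)
        gaps[model] = {
            "best": best,
            "worst": worst,
            "delta": scores[best] - scores[worst],
        }
    return gaps


# Placeholder data only: invented names and scores, not real results.
rows = [
    {"model": "model-a", "harness": "harness-x", "score": 60.0},
    {"model": "model-a", "harness": "harness-y", "score": 45.0},
    {"model": "model-b", "harness": "harness-x", "score": 50.0},
]
gaps = harness_gaps(rows)
```

With these placeholder rows, only `model-a` appears in `gaps`, because `model-b` was measured on a single harness and so has no delta to report.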

Parameters schema

{
  "type": "object",
  "required": [],
  "properties": {
    "view": {
      "enum": [
        "raw",
        "summary",
        "gaps",
        "combined"
      ],
      "type": "string",
      "description": "Output shape; default raw"
    }
  }
}
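The schema above can be exercised with a small client-side check. The following Python sketch validates the `view` argument against the declared enum and wraps it in a JSON-RPC `tools/call` request. It assumes the parent server is reached over the Model Context Protocol, which this page does not state outright; the helper names `build_arguments` and `build_tool_call` are illustrative, not part of the server.

```python
import json

# Values copied from the "view" enum in the parameters schema.
ALLOWED_VIEWS = ("raw", "summary", "gaps", "combined")


def build_arguments(view=None):
    """Return the arguments object for tf_harnesses.

    The schema has no required properties, so omitting "view" is valid
    and leaves the server to apply its documented default (raw).
    """
    if view is None:
        return {}
    if not isinstance(view, str) or view not in ALLOWED_VIEWS:
        raise ValueError(f"view must be one of {ALLOWED_VIEWS}, got {view!r}")
    return {"view": view}


def build_tool_call(view=None, request_id=1):
    """Build a JSON-RPC 2.0 request body (assumption: an MCP transport)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "tf_harnesses", "arguments": build_arguments(view)},
    }


if __name__ == "__main__":
    print(json.dumps(build_tool_call("summary"), indent=2))
```

Validating before sending keeps a typo such as `view="top10"` from reaching the server, where the failure mode is not documented.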

What this tool wraps · 0 endpoints


No endpoints wrapped at confidence ≥ 0.50.

Parent server

io.github.RipperMercs/terminalfeed

https://github.com/RipperMercs/terminalfeed

1/7 registries
