Public benchmarks · updated live
Every two-model review AntFleet ran on a benchmark-class repo.
61 benchmarks and counting · updated 17 hours ago
Benchmark-class repos are public repos with a BENCHMARK.md file at the root. PRs there are not meant to merge; they exist to run a known diff past AntFleet's two-model unanimous consensus and publish the result. Click any row to read the bot review on GitHub.
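One way to read "two-model unanimous consensus" is that a finding is published only when both models report it. This page does not describe how AntFleet actually matches findings, so the following is a minimal sketch under that assumption. The `Finding` fields, the `unanimous` function, and the `has_benchmark_marker` helper are hypothetical names chosen for illustration, not part of AntFleet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One issue reported by a reviewer model (hypothetical shape)."""
    path: str  # file the finding refers to
    line: int  # line number within that file
    rule: str  # short label for the kind of issue


def has_benchmark_marker(root_files: list[str]) -> bool:
    """Return True if a repo's root listing contains BENCHMARK.md."""
    return "BENCHMARK.md" in root_files


def unanimous(findings_a: set[Finding], findings_b: set[Finding]) -> set[Finding]:
    """Keep only findings reported by both reviewers (assumed matching rule)."""
    return findings_a & findings_b


# Illustrative data only; these are not real AntFleet results.
model_a = {
    Finding("app.py", 10, "sql-injection"),
    Finding("app.py", 42, "unused-variable"),
}
model_b = {Finding("app.py", 10, "sql-injection")}

agreed = unanimous(model_a, model_b)
print(len(agreed), "finding(s) survive consensus")
```

Under this reading, a PR listed with "0 findings (clean)" would be one where the intersection is empty, either because neither model flagged anything or because the two models did not agree on any single finding.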
Looking for closed-finding receipts instead? See /receipts.
Older benchmarks
showing 11 of 61
- AntFleet/aeon-bench · PR #4 · 4 findings · 7 files · gpt-5, claude-opus-4-7 · commit 0738a2c · 2 days ago
- AntFleet/aeon-bench · PR #3 · 2 findings · 1 file · gpt-5, claude-opus-4-7 · commit cefb87b · 2 days ago
- AntFleet/aeon-bench · PR #2 · 2 findings · 3 files · gpt-5, claude-opus-4-7 · commit d95bf91 · 2 days ago
- AntFleet/aeon-bench · PR #1 · 2 findings · 1 file · gpt-5, claude-opus-4-7 · commit 3719764 · 2 days ago
- AntFleet/agent-autonomopoly-bench · PR #4 · 1 finding · 1 file · gpt-5, claude-opus-4-7 · commit b82a742 · 2 days ago
- AntFleet/agent-autonomopoly-bench · PR #3 · 1 finding · 1 file · gpt-5, claude-opus-4-7 · commit aec4324 · 2 days ago
- AntFleet/agent-autonomopoly-bench · PR #2 · 2 findings · 5 files · gpt-5, claude-opus-4-7 · commit 212c926 · 2 days ago
- AntFleet/agent-autonomopoly-bench · PR #4 · 0 findings (clean) · 1 file · gpt-5, claude-opus-4-7 · commit 5cfae36 · 2 days ago
- AntFleet/agent-autonomopoly-bench · PR #3 · 0 findings (clean) · commit 113c897 · 2 days ago
- AntFleet/agent-autonomopoly-bench · PR #2 · 0 findings (clean) · 1 file · gpt-5, claude-opus-4-7 · commit 72cbe83 · 2 days ago
- AntFleet/agent-autonomopoly-bench · PR #1 · 0 findings (clean) · commit 7747177 · 2 days ago