Disagreement · 9bb17981-openai-0

Rubric verdict mapping can ACCEPT PRs that fail Scope (non-protected paths)

mismatch

repo 6f7fc663·PR #7·reviewed 1 week ago

Primary finding

Rubric verdict mapping can ACCEPT PRs that fail Scope (non-protected paths)

highbughigh

skills/pr-triage/SKILL.md:104
skills/pr-triage/SKILL.md:111-114

The Scope check says touching scripts/ requires a maintainer (i.e., scope fails), but the OUT-OF-SCOPE verdict only triggers for a narrower subset (workflows, aeon, scripts/prefetch-*, scripts/postprocess-*). If a PR touches other scripts/ paths (or other maintainer-only directories like mcp-server/), it fails Scope yet matches none of OUT-OF-SCOPE, NEEDS-CHANGES, or DEFER, so the final "ACCEPTED otherwise" clause would incorrectly accept it. That contradicts the stated Scope policy and can let risky runtime changes through.

Recommendation

Adjust the verdict mapping so any Scope failure does not fall through to ACCEPTED. Options: - Treat all Scope failures as at least DEFER (needs maintainer review) unless they match the unambiguous protected subset (then OUT-OF-SCOPE). - Or expand the protected-path list in OUT-OF-SCOPE to include all maintainer-only directories stated in the Scope check, and clarify auto-close remains only for workflows/root binary (per §8). Also update "ACCEPTED" to require that Scope passed explicitly.

Counterpart finding

Originality check is unreliable — only inspects skills/ on default branch, missing skills in PR branch or other paths

lowbugmedium

skills/pr-triage/SKILL.md:95-101

The check only lists `/skills` on main once per run. If two PRs in the same run each introduce the same new skill directory, both will pass originality (neither is on main yet). Also, repos may use a default branch other than `main`. Lastly, comparing case-sensitively can miss collisions like `myskill` vs `MySkill` on case-insensitive filesystems.

Recommendation

Use the repo's default branch (`gh api repos/owner/repo --jq .default_branch`), and also collect new-skill names seen earlier in the same run to detect intra-run collisions; do case-insensitive matching.

Why this didn't post

This finding didn't meet AntFleet's unanimous agreement threshold. Both frontier models review every PR independently; only findings they both flag with the same severity and category are posted to the PR. This one fell through.

read the methodology →

← back to all disagreements view public receipts see unanimous findings + anatomies →

Tweet ↗