AntFleet

Disagreement · 1ea5c6c4-openai-0

PR list does not filter to last 24h as described

mismatch
repo 6f7fc663 · PR #13 · reviewed 1 week ago

Primary finding

PR list does not filter to last 24h as described

medium · bug · high
  • templates/code-reviewer/SKILL.md:14
  • templates/code-reviewer/SKILL.md:20-21
The narrative specifies selecting PRs opened in the last 24h, but the provided gh/jq command lists all open PRs and never filters by createdAt. This creates a behavior/docs mismatch and will process older PRs.

Recommendation

Add a 24h filter. For example, filter by createdAt in jq: --jq '.[] | select((now - (.createdAt|fromdateiso8601)) < 24*60*60) | select(.author.login != "github-actions[bot]" and .author.login != "aeonframework") | .number'. Alternatively, if day-level precision is acceptable, pass a computed date to the search qualifier: --search "created:>=$(date -u -d '24 hours ago' +%Y-%m-%d)". Note that -d '24 hours ago' is GNU date syntax, and that a date-only cutoff can admit PRs up to roughly 48 hours old.
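The same selection logic can be sketched outside jq, which makes the window boundary easy to test. This is a minimal illustration, assuming the input is the parsed output of gh pr list with --json number,createdAt,author; the function name recent_pr_numbers and its parameters are hypothetical, and the excluded author names are copied from the recommendation above.

```python
from datetime import datetime, timedelta, timezone

# Author logins to skip, taken from the recommendation text above.
EXCLUDED_AUTHORS = {"github-actions[bot]", "aeonframework"}


def recent_pr_numbers(prs, now=None, window=timedelta(hours=24)):
    """Return numbers of PRs created less than `window` before `now`,
    skipping PRs whose author login is in EXCLUDED_AUTHORS.

    `prs` is assumed to be a list of dicts shaped like
    {"number": 13, "createdAt": "2024-01-02T00:00:00Z", "author": {"login": "x"}}.
    """
    now = now or datetime.now(timezone.utc)
    selected = []
    for pr in prs:
        # Timestamps are assumed to be ISO 8601 with a trailing "Z";
        # rewrite it as an explicit UTC offset for fromisoformat.
        created = datetime.fromisoformat(pr["createdAt"].replace("Z", "+00:00"))
        if now - created >= window:
            continue  # opened outside the 24h window
        if pr["author"]["login"] in EXCLUDED_AUTHORS:
            continue  # authored by an excluded account
        selected.append(pr["number"])
    return selected
```

Passing now explicitly keeps the cutoff deterministic, so a PR opened 24 hours and one second ago is reliably excluded.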

Counterpart finding

code-reviewer PR listing filters in --json but selects by author after fetch; tail --jq path will include already-reviewed PRs

low · maintainability · medium
  • templates/code-reviewer/SKILL.md:18-26
Step 1 says 'every open PR opened in the last 24h that hasn't been reviewed by this skill yet', but the gh pr list command neither filters by createdAt (so no 24h window is applied) nor checks the reviewed.json state file. Deduplication against reviewed.json is described only in prose ('Skip anything already in there'), and createdAt is fetched but never compared. The doc and comment therefore misstate the behavior: an operator copying this template gets a SKILL.md whose stated semantics (last 24h) are not implemented by the shown command.

Recommendation

Either add a --jq filter on createdAt (e.g. select((now - (.createdAt | fromdateiso8601)) < 86400)) or drop the '24h' wording. Also explicitly show the read of reviewed.json before the loop.
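One way to make the reviewed.json read explicit is sketched below. The layout of reviewed.json is not shown in this finding, so the code assumes it holds a JSON array of PR numbers; the helper names load_reviewed and unreviewed are hypothetical.

```python
import json
from pathlib import Path


def load_reviewed(path):
    """Return the set of already-reviewed PR numbers.

    Assumption: the state file is a JSON array of PR numbers, e.g. [12, 13].
    A missing file is treated as "nothing reviewed yet".
    """
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text()))


def unreviewed(candidates, reviewed):
    """Keep candidate PR numbers that are not in `reviewed`, preserving order."""
    return [n for n in candidates if n not in reviewed]
```

Loading the state once before the loop, rather than inside it, keeps the skip rule visible in the command sequence instead of only in prose.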

Why this didn't post

This finding didn't meet AntFleet's unanimous agreement threshold. Both frontier models review every PR independently; only findings they both flag with the same severity and category are posted to the PR. This one fell through.
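The gate described above can be illustrated with a small sketch. It assumes the two findings have already been paired as describing the same issue (how AntFleet pairs them is not stated here) and reads the tags on this page as severity, category, confidence, which is an inference from the labels. The Finding type and passes_unanimous_gate are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str
    category: str


def passes_unanimous_gate(a, b):
    """True only when both findings carry the same severity and category."""
    return a.severity == b.severity and a.category == b.category


# Values read from this page; the tag order is an assumption.
primary = Finding("PR list does not filter to last 24h as described", "medium", "bug")
counterpart = Finding("code-reviewer PR listing (counterpart finding)", "low", "maintainability")
posted = passes_unanimous_gate(primary, counterpart)
```

Under these assumptions posted is False, since both the severity and the category differ between the two findings.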


From the same review

These findings passed the unanimous gate on the same PR review. The disagreement above was filtered out; the findings below were posted.