AntFleet

Disagreement · ce6ce4c1-anthropic-1

ENABLED_COUNT grep over-counts due to comment-stripping claim that is false for this repo's aeon.yml

mismatch
repo 6f7fc663 · PR #10 · reviewed 1 week ago

Primary finding

ENABLED_COUNT grep over-counts due to comment-stripping claim that is false for this repo's aeon.yml

medium · bug · high
  • skills/fork-cohort/SKILL.md:98-104
  • skills/fork-cohort/SKILL.md:211-214
  • aeon.yml:190-196
  • aeon.yml:200-201
  • aeon.yml:172
The skill claims `grep -E "enabled:\s*true"` is safe because "comment lines starting with `#` are skipped by the grep on a typical aeon.yml." That claim is wrong on two counts. First, `grep` does not skip `#`-prefixed lines: it matches the pattern anywhere on a line, so a commented-out line containing `enabled: true` is counted like any other. Second, even setting comments aside, the actual aeon.yml in this PR contains `enabled: true` matches that are not skills, namely `channels.jsonrender.enabled: true` and `telegram.enabled: true`. A comment such as `# Set to true ...` is counted only if `enabled:` and optional whitespace immediately precede `true` on that line. The heartbeat line `enabled: true, schedule: ...` is a genuine skill match, but `channels:/jsonrender:/enabled: true` is not a skill and will still be counted. So on the parent repo's own aeon.yml today, ENABLED_COUNT is inflated by at least 2 (jsonrender and telegram) beyond the number of actually enabled skills, of which there is only one (`heartbeat`). With the POWER threshold at ≥5, this inflation directly affects bucketing. The misleading note in Constraints makes this worse because it tells operators the count is trustworthy.
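To make the over-count concrete, here is a minimal Python sketch that applies the same pattern line by line, which is what `grep -cE` does. The YAML text is a hypothetical excerpt shaped after the keys named in this finding (the `digest` and commented-out `old-skill` entries are invented for illustration); it is not the real aeon.yml.

```python
import re

# Hypothetical aeon.yml excerpt, NOT the file from this PR. Only `heartbeat`
# is an enabled skill; `digest` and `old-skill` are made-up illustrations.
SAMPLE_AEON_YML = """\
skills:
  heartbeat: { enabled: true, schedule: "0 * * * *" }
  digest: { enabled: false, schedule: "0 8 * * *" }
  # old-skill: { enabled: true }
channels:
  jsonrender:
    enabled: true
telegram:
  enabled: true
"""

# Same pattern the skill passes to `grep -E`. Like grep, re.search matches
# anywhere on the line and gives '#' no special meaning.
NAIVE_PATTERN = re.compile(r"enabled:\s*true")


def naive_enabled_count(text: str) -> int:
    """Count matching lines, mirroring `grep -cE "enabled:\\s*true"`."""
    return sum(1 for line in text.splitlines() if NAIVE_PATTERN.search(line))


print(naive_enabled_count(SAMPLE_AEON_YML))  # 4, although only 1 skill is enabled
```

The comment line, `jsonrender`, and `telegram` each add one to the count, so a single enabled skill reports as 4.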

Recommendation

Either (a) scope the grep to the `skills:` block only, e.g. `awk '/^skills:/{f=1;next} /^[a-z]/{f=0} f' aeon.yml | grep -cE '^\s+[a-z][a-z0-9-]*:\s*\{[^}]*enabled:\s*true'`, or (b) parse with `yq` (`yq '.skills | to_entries | map(select(.value.enabled == true)) | length'`). Also correct the Constraints note: the current explanation about `#`-skipping is factually wrong and should be removed.
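Option (a) can be checked by reproducing the awk/grep pipeline in Python. This sketch assumes, as the suggested regex does, that each skill is written as a one-line flow mapping (`name: { enabled: true, ... }`); the sample YAML is the same hypothetical excerpt as above, not the real file.

```python
import re

# Hypothetical aeon.yml excerpt (NOT the file from this PR).
SAMPLE_AEON_YML = """\
skills:
  heartbeat: { enabled: true, schedule: "0 * * * *" }
  digest: { enabled: false, schedule: "0 8 * * *" }
  # old-skill: { enabled: true }
channels:
  jsonrender:
    enabled: true
telegram:
  enabled: true
"""

# Same pattern as the suggested `grep -cE`: an indented key followed by a
# flow mapping that contains `enabled: true`. An indented '#' never matches.
SKILL_ENTRY = re.compile(r"^\s+[a-z][a-z0-9-]*:\s*\{[^}]*enabled:\s*true")


def scoped_enabled_count(text: str) -> int:
    """Mirror `awk '/^skills:/{f=1;next} /^[a-z]/{f=0} f' | grep -cE ...`."""
    in_skills = False
    count = 0
    for line in text.splitlines():
        if re.match(r"skills:", line):   # /^skills:/{f=1;next}
            in_skills = True
            continue
        if re.match(r"[a-z]", line):     # /^[a-z]/{f=0}: next top-level key
            in_skills = False
        if in_skills and SKILL_ENTRY.search(line):
            count += 1
    return count


print(scoped_enabled_count(SAMPLE_AEON_YML))  # 1: only heartbeat
```

Like the shell version, this misses block-style entries (`heartbeat:` on one line, `enabled: true` on the next); if aeon.yml may use that style, the `yq` route in (b) is the more robust choice.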

Counterpart finding

Cohort definition for STALE conflicts with classification algorithm for never-run forks

medium · docs-gap · high
  • skills/fork-cohort/SKILL.md:23-25
  • skills/fork-cohort/SKILL.md:69-70
  • skills/fork-cohort/SKILL.md:97-111
The table describes a STALE case that applies even "if no recent run record exists," which overlaps with COLD's "No Actions runs ever recorded" and conflicts with Step 5, which treats an empty last_run as ∞ and therefore classifies it as COLD. Operators will be confused because the narrative docs and the output classifications won't agree.

Recommendation

Align the definition with the algorithm. Suggested fix: remove "even if no recent run record exists" from STALE and keep "No Actions runs ever recorded" strictly under COLD. Rephrase STALE as: "Last run ≥7 days ago and ≤365 days ago." Then check that every example in the doc matches the implemented logic.
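A minimal sketch of the aligned rules, assuming the caller passes the number of days since the last run, or `None` when last_run is empty. The function names and the `"OTHER"` placeholder are hypothetical; runs newer than 7 days or older than 365 days fall into buckets this finding does not define.

```python
import math
from typing import Optional

STALE_MIN_DAYS = 7    # from the suggested STALE wording
STALE_MAX_DAYS = 365


def age_in_days(days_since_last_run: Optional[float]) -> float:
    """Step 5 behaviour: an empty last_run counts as infinitely old."""
    return math.inf if days_since_last_run is None else days_since_last_run


def classify(days_since_last_run: Optional[float]) -> str:
    age = age_in_days(days_since_last_run)
    if math.isinf(age):
        return "COLD"    # no Actions runs ever recorded
    if STALE_MIN_DAYS <= age <= STALE_MAX_DAYS:
        return "STALE"   # last run >=7 and <=365 days ago
    return "OTHER"       # hypothetical placeholder for buckets not covered here


for sample in (None, 3, 7, 200, 365, 400):
    print(sample, classify(sample))
```

With these rules a never-run fork can only ever be COLD, so the table and Step 5 cannot disagree.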

Why this didn't post

This finding didn't meet AntFleet's unanimous agreement threshold. Both frontier models review every PR independently; only findings they both flag with the same severity and category are posted to the PR. Here the two findings share a severity (medium) but carry different categories (bug vs. docs-gap), so neither was posted.
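The posting rule stated above can be sketched as a simple predicate. This is an illustration of the rule as described on this page, not AntFleet's implementation; the `Finding` type and `should_post` function are hypothetical, and the sketch compares only the two fields the page names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    severity: str
    category: str


def should_post(a: Finding, b: Finding) -> bool:
    """Post only when both models agree on severity and category."""
    return a.severity == b.severity and a.category == b.category


primary = Finding("medium", "bug")
counterpart = Finding("medium", "docs-gap")
print(should_post(primary, counterpart))  # False: categories differ
```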
