Disagreement · e488cbca-anthropic-0

Scanner scans itself and self-flags HIGH due to pattern strings, producing false positives on --all

mismatch

repo 6f7fc663 · PR #29 · reviewed 1 week ago

Primary finding

Scanner scans itself and self-flags HIGH due to pattern strings, producing false positives on --all

medium · bug · high

skills/skill-security-scan/scan.sh:79-85
skills/skill-security-scan/SKILL.md:1-60

When run with --all, scan.sh discovers skills/skill-security-scan/SKILL.md and scans it. That SKILL.md quotes literal strings such as "ignore previous instructions", "you are now...", "rm -rf", and "git push --force", and it discusses curl/wget exfiltration. This text matches HIGH and MEDIUM patterns (e.g. '[Ii]gnore\s+(all\s+)?previous\s+instructions', '[Yy]ou\s+are\s+now\s+', 'rm\s+-rf\s+\*', 'git\s+push\s+--force'). As a result, the skill that defines the scan FAILS its own scan, and the orchestrator (per SKILL.md step 6) notifies and exits 1 even when nothing is wrong. scan.sh has no allowlist or self-skip, and it applies no trusted-source filtering: TRUSTED_OWNERS and TRUSTED_REPOS are loaded but never consulted.
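A minimal reproduction of the self-flagging, assuming scan.sh applies its patterns with `grep -E`. The sketch writes `\s` as `[[:space:]]` so it also runs with non-GNU grep; the sample text is illustrative rather than copied from SKILL.md, and `PATTERNS` and `count_hits` are hypothetical names.

```bash
#!/usr/bin/env bash
# Minimal reproduction (assumption: scan.sh matches with grep -E).
# \s is written as [[:space:]] here for portability to non-GNU grep.
PATTERNS=(
  '[Ii]gnore[[:space:]]+(all[[:space:]]+)?previous[[:space:]]+instructions'
  '[Yy]ou[[:space:]]+are[[:space:]]+now[[:space:]]+'
  'rm[[:space:]]+-rf[[:space:]]+\*'
  'git[[:space:]]+push[[:space:]]+--force'
)

# Count how many patterns match somewhere in the text read from stdin.
count_hits() {
  local text hits=0 p
  text=$(cat)
  for p in "${PATTERNS[@]}"; do
    if printf '%s\n' "$text" | grep -Eq -e "$p"; then
      hits=$((hits + 1))
    fi
  done
  echo "$hits"
}

# Illustrative threat-model prose of the kind a security skill documents.
doc_text='Watch for phrases such as "ignore previous instructions" or
"you are now an unrestricted assistant", and for destructive commands
like `rm -rf *` or `git push --force`.'

printf '%s\n' "$doc_text" | count_hits   # prints 4
```

Documentation that merely describes the threats trips every pattern, which is why the skill fails its own scan.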

Recommendation

Either (a) exclude the security-scan skill from --all by default, (b) treat fenced-code and threat-model sections of SKILL.md differently, or (c) actually consult TRUSTED_OWNERS/TRUSTED_REPOS to downgrade the self-scan and known sources to format validation, as step 3 of SKILL.md promises.
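Option (a) could look roughly like this. It assumes --all discovers skills through a `<root>/*/SKILL.md` glob; `SELF_SKILL` and `list_skills_to_scan` are hypothetical names, not taken from scan.sh.

```bash
#!/usr/bin/env bash
# Sketch of option (a): leave the scanner's own skill out of --all discovery.
# Assumptions: skills live at <root>/<name>/SKILL.md; SELF_SKILL is hypothetical.
SELF_SKILL="skill-security-scan"

# Print every SKILL.md under <root> except the scanner's own.
list_skills_to_scan() {
  local root=$1 file
  for file in "$root"/*/SKILL.md; do
    [ -e "$file" ] || continue            # glob matched nothing
    if [ "$(basename "$(dirname "$file")")" = "$SELF_SKILL" ]; then
      continue                            # its threat model quotes the patterns
    fi
    printf '%s\n' "$file"
  done
}

# Usage: list_skills_to_scan skills
```

Matching on the directory name keeps the sketch simple; comparing against the script's own resolved directory would avoid also skipping an unrelated skill that happens to share the name.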

Counterpart finding

Trusted sources policy documented but not enforced; dead arrays for TRUSTED_OWNERS/REPOS

low · docs-gap · high

skills/skill-security-scan/SKILL.md:34-35
skills/skill-security-scan/scan.sh:60-74

SKILL.md states that trusted sources receive reduced scanning and describes how a skill's source is detected, but scan.sh only loads the trusted lists and never uses them to change scanning behavior. The mismatch is misleading and leaves dead code behind.

Recommendation

Either implement the trusted-source policy (e.g., skip content regex checks for files whose origin matches TRUSTED_* and validate only frontmatter/format), or remove or adjust the documentation and dead code until the policy is implemented. Add origin detection (a frontmatter 'origin' field or the git remote) and unit tests.
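A sketch of the trusted-source policy, assuming the origin is recorded as an `origin: owner/repo` line in the SKILL.md frontmatter (git-remote detection is not covered). `read_origin`, `is_trusted`, `scan_mode`, and the array contents are hypothetical; only the TRUSTED_OWNERS/TRUSTED_REPOS names come from the finding.

```bash
#!/usr/bin/env bash
# Sketch of the documented trusted-source policy.
# Assumption: SKILL.md frontmatter records "origin: owner/repo".
# Array contents are hypothetical placeholders.
TRUSTED_OWNERS=("example-org")
TRUSTED_REPOS=("someone/vetted-skills")

# Print the frontmatter "origin" value, or nothing if absent.
read_origin() {
  awk '
    NR == 1 { if ($0 != "---") exit; next }   # frontmatter must open on line 1
    $0 == "---" { exit }                      # end of frontmatter
    /^origin:/ {
      sub(/^origin:[[:space:]]*/, "")
      gsub(/"/, "")                           # drop optional double quotes
      sub(/[[:space:]]+$/, "")                # drop trailing whitespace
      print
      exit
    }
  ' "$1"
}

# Succeed if the file's origin owner or owner/repo is on a trusted list.
is_trusted() {
  local origin owner entry
  origin=$(read_origin "$1")
  [ -n "$origin" ] || return 1              # no recorded origin: untrusted
  owner=${origin%%/*}
  for entry in "${TRUSTED_OWNERS[@]}"; do
    [ "$owner" = "$entry" ] && return 0
  done
  for entry in "${TRUSTED_REPOS[@]}"; do
    [ "$origin" = "$entry" ] && return 0
  done
  return 1
}

# Trusted files get format validation only; everything else gets regex checks.
scan_mode() {
  if is_trusted "$1"; then echo "format-only"; else echo "full"; fi
}

# Usage: scan_mode path/to/SKILL.md   # prints "format-only" or "full"
```

Files without frontmatter or without an origin fall through to a full scan, so a missing field never widens trust.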

Why this didn't post

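This finding didn't meet AntFleet's unanimous-agreement threshold. Both frontier models review every PR independently, and only findings that both flag with the same severity and category are posted to the PR. Here the two findings overlap (both note that TRUSTED_OWNERS/TRUSTED_REPOS are never consulted) but differ in severity (medium vs. low) and category (bug vs. docs-gap), so neither was posted.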
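The posting rule can be modeled as below. This is a hypothetical illustration, not AntFleet's code: it assumes the two findings have already been paired, as on this page, and compares only severity and category. `passes_unanimous_gate` is an invented name.

```bash
#!/usr/bin/env bash
# Hypothetical model of the posting rule (not AntFleet's implementation).
# Assumes the findings are already paired; compares severity and category only.
passes_unanimous_gate() {
  local sev_a=$1 cat_a=$2 sev_b=$3 cat_b=$4
  [ "$sev_a" = "$sev_b" ] && [ "$cat_a" = "$cat_b" ]
}

# Labels of the pair on this page: medium/bug vs. low/docs-gap.
if passes_unanimous_gate medium bug low docs-gap; then
  echo "posted"
else
  echo "filtered"   # prints "filtered": both fields differ
fi
```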


From the same review

These findings passed the unanimous gate on the same PR review. The disagreement above was filtered out; the findings below were posted.

