AntFleet

Disagreement · cdf9ffa0-anthropic-7

ADMANAGE_API_KEY missing is documented as 'Hard-fails' but script actually exits 0

mismatch
repo 6f7fc663 · PR #4 · reviewed 1 week ago

Primary finding

ADMANAGE_API_KEY missing is documented as 'Hard-fails' but script actually exits 0

medium · docs-gap · high
  • scripts/postprocess-admanage.sh:18
  • scripts/postprocess-admanage.sh:46-50
  • scripts/postprocess-admanage-create.sh:22
  • scripts/postprocess-admanage-create.sh:49-54
The header comments in both scripts claim a hard-fail when the key is missing, but both scripts actually emit a warning and `exit 0`, i.e. they soft-fail. This is a misleading comment that an auditor will rely on when assessing safety posture. The behavior may well be intentional (so CI doesn't fail when the key isn't injected for forks/PRs), but the docs must match the code.

Recommendation

Either change the header to 'Soft-fails (warns + notifies) if ADMANAGE_API_KEY is not set' or change the exit to `exit 1` to actually hard-fail.
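
If the hard-fail option is chosen, the guard could look roughly like the sketch below. The function name `require_admanage_key` is hypothetical and does not appear in the reviewed scripts; only the variable name ADMANAGE_API_KEY comes from the finding.

```shell
# Hypothetical helper (not from the reviewed scripts): succeeds only when
# ADMANAGE_API_KEY is set and non-empty, so the caller can decide to hard-fail.
require_admanage_key() {
  if [ -z "${ADMANAGE_API_KEY:-}" ]; then
    echo "ERROR: ADMANAGE_API_KEY is not set; refusing to continue." >&2
    return 1
  fi
  return 0
}
```

A script that should hard-fail would call `require_admanage_key || exit 1` near the top. A script that keeps the soft-fail behavior would leave `exit 0` in place and update its header comment instead.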

Counterpart finding

Daily spend cap circuit breaker fails open if spend API returns invalid/empty JSON

medium · bug · high
  • scripts/postprocess-admanage.sh:55-69
If SPEND_RESP is not valid JSON (or jq fails), TODAY_SPEND becomes empty. The AWK comparison then evaluates an invalid expression (" >= <cap>") and returns non-zero, so the `if` guard never triggers. The script proceeds to launch even though the spend state is unknown and possibly over the cap. A circuit breaker should fail closed for safety.

Recommendation

Harden parsing: default TODAY_SPEND to a safe numeric value and fail closed on parse errors. For example, set `parsed=$(echo "$SPEND_RESP" | jq -er '.metadata.totalSpend' 2>/dev/null || echo '__ERR__')`. If `parsed` equals `'__ERR__'`, block launches with a warning; otherwise compare numerically using bc or awk with explicit numbers. Alternatively, treat any fetch or parse failure as over-cap and exit.
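
A minimal fail-closed sketch of this recommendation follows. The helper names (`is_plain_number`, `spend_is_under_cap`, `launch_allowed`) are hypothetical and not taken from the reviewed script; the `.metadata.totalSpend` path is the one named in the recommendation. The sketch assumes spend and cap are unsigned decimals, and it deliberately blocks on anything else, including negative values and exponent notation.

```shell
# Hypothetical helpers; names are illustrative, not from the reviewed script.

# is_plain_number VALUE: succeeds only for an unsigned decimal such as 12 or 12.50.
# Rejects empty strings, a lone dot, any non-digit/non-dot character, and multiple dots.
is_plain_number() {
  case "$1" in
    '' | . | *[!0-9.]* | *.*.*) return 1 ;;
  esac
  return 0
}

# spend_is_under_cap SPEND CAP: succeeds only when both values are valid
# numbers and SPEND is strictly below CAP. Any invalid input blocks.
spend_is_under_cap() {
  is_plain_number "$1" || return 1
  is_plain_number "$2" || return 1
  awk -v s="$1" -v c="$2" 'BEGIN { exit !(s + 0 < c + 0) }'
}

# launch_allowed RESPONSE_JSON CAP: extract the spend with jq, then compare.
# A jq failure (invalid JSON, missing or null field, jq unavailable) blocks.
launch_allowed() {
  parsed=$(printf '%s' "$1" | jq -er '.metadata.totalSpend' 2>/dev/null) || return 1
  spend_is_under_cap "$parsed" "$2"
}
```

With helpers like these, the script would proceed only when `launch_allowed "$SPEND_RESP" "$cap"` succeeds (where `cap` stands for whatever variable holds the daily cap) and would otherwise warn and skip the launch, so an unparseable response is handled the same way as an over-cap one.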

Why this didn't post

This finding didn't meet AntFleet's unanimous agreement threshold. Both frontier models review every PR independently; only findings they both flag with the same severity and category are posted to the PR. This one fell through.
