AntFleet

Disagreement · 4ff82c7b-anthropic-0

isTransientError unconditionally returns true, defeating non-retryable classification

solo Opus
repo e24ef98c·PR #11·reviewed 1 week ago

Opus finding

isTransientError unconditionally returns true, defeating non-retryable classification

highbughigh
  • apps/web/lib/review-worker.ts:271-282
  • apps/web/lib/review-worker.ts:175-200
The function's documented contract (see comments above its body and the existing tests) is to distinguish transient infra errors (429, 5xx, network) from non-transient bugs/4xx so the latter are terminally failed immediately rather than retried 6 times burning LLM budget. However, the final fallback `return true;` makes every error retryable. The earlier string-matching branches are therefore dead — any error that didn't already match still returns true. As a result a real bug (e.g. `TypeError: foo.bar is undefined`, a 401/403/404 from GitHub, a schema validation error from the model) will be retried up to MAX_PROCESSING_ATTEMPTS (6) times across ~62 minutes, generating ~6× LLM cost per broken PR and delaying the row reaching `failed` for an hour. The tests in review-worker.test.ts only assert the positive branch (`isTransientError(...)` returns true for 429/5xx/network); they never assert that, e.g., a 401 or programmer error is non-transient, so the bug is invisible to the suite.

Recommendation

Change the final `return true;` to `return false;` so only matched transient signatures are retryable, and add a negative test (`isTransientError(new Error('TypeError ...'))` → false, `isTransientError(new Error('HTTP 401'))` → false).

Other reviewer

The other reviewer flagged nothing in this file/line range.

Why this didn't post

This finding didn't meet AntFleet's unanimous agreement threshold. Both frontier models review every PR independently; only findings they both flag with the same severity and category are posted to the PR. This one fell through.

read the methodology →