Opus finding
isTransientError returns true unconditionally — non-transient errors retried for full backoff cycle
highbughigh
- apps/web/lib/review-worker.ts:268-280
- apps/web/lib/review-worker.ts:246-268
The function lists many transient signals and returns true for each match, but the fallthrough at the end also returns true — making every error 'transient'. The doc comment above the function explicitly contradicts this: 'Anything that looks like our own bug or a 4xx other than 429 is not — retrying won't help and we'd burn LLM budget.' In practice every non-transient error (e.g. validation bugs, 400/401/403/404 from GitHub, schema mismatches, OOM, programmer errors) will now consume the full backoff sequence (~62 minutes wall clock and MAX_PROCESSING_ATTEMPTS = 6 LLM round trips) before reaching the terminal-failed state. This burns LLM budget exactly as the comment warned against, and delays meaningful failure signal in cron logs. The test suite only asserts the 'true' branches (rate limit, 5xx, ECONN*) — there is no test verifying that a non-transient error (e.g. 'HTTP 400 invalid argument' or a TypeError) returns false, so the regression slipped through.
Recommendation
Change the trailing `return true;` to `return false;` so unknown errors are treated as non-retryable, matching the comment. Alternatively, narrow the fallback to a small allowlist (e.g. only retry when message contains an explicit infra signal). Add unit tests for the false branch (HTTP 400/401/403/404, plain TypeError, schema validation message).