Receipt · d9ae4fa5-0
Circuit breaker is tripped after a single retry exhaustion, even for the first provider tried
bug · medium
repo 56f59a0d · PR #3 · reviewed 4 days ago
The finding
- src/providers/orchestrator.ts:263-305
- src/providers/orchestrator.ts:405-460
retryWithBackoff trips the circuit breaker as soon as the attempts defined by RETRY_BACKOFFS_MS (3) are exhausted, then throws. The caller's catch block then sets fallbackTriggered and continues to the next provider. Because the breaker cooldown is 5 minutes, a single transient burst (e.g. a brief 503) marks the provider 'degraded' for 5 minutes after just 3 retries on one call. This is too aggressive and at odds with a typical failure-rate-based circuit breaker. Combined with the EMA update in recordFailure (which already lowers successRate), a single bad call rapidly demotes a provider. Worse, non-retryable errors (e.g. auth failures) never trip the breaker at all, because retryWithBackoff returns early when an error is non-retryable.
Fix
Trip the circuit breaker on a rolling failure rate or on N consecutive failures, not on a single retry-exhausted call. Also trip it on non-retryable hard failures (e.g. 401) when appropriate.
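A minimal sketch of the consecutive-failure variant of this fix, for illustration only. The class name ConsecutiveFailureBreaker, its options, the failure kinds, and the threshold of 3 are all hypothetical and do not come from orchestrator.ts. Only the 5-minute cooldown is taken from the finding. Tripping immediately on a hard failure is one possible reading of "when appropriate", not a confirmed requirement.

```typescript
// Hypothetical sketch: none of these names exist in orchestrator.ts.
type FailureKind = "retry-exhausted" | "hard";

interface BreakerOptions {
  consecutiveFailureThreshold: number; // failed calls in a row before tripping (assumed value)
  cooldownMs: number; // how long the breaker stays open once tripped
}

class ConsecutiveFailureBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly options: BreakerOptions;
  private readonly now: () => number;

  // `now` is injectable so the cooldown can be exercised without real waiting.
  constructor(options: BreakerOptions, now: () => number = Date.now) {
    this.options = options;
    this.now = now;
  }

  // A successful call ends the failure streak and closes the breaker.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  // Called once per failed call, i.e. after retries are exhausted or after a
  // non-retryable error. Hard failures (e.g. 401) trip immediately (assumption).
  recordFailure(kind: FailureKind): void {
    this.consecutiveFailures += 1;
    const thresholdReached =
      this.consecutiveFailures >= this.options.consecutiveFailureThreshold;
    if (kind === "hard" || thresholdReached) {
      this.openedAt = this.now();
    }
  }

  // True while the provider should be skipped.
  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.options.cooldownMs) {
      // Cooldown elapsed: close the breaker and start a fresh streak.
      this.openedAt = null;
      this.consecutiveFailures = 0;
      return false;
    }
    return true;
  }
}

// Usage: one retry-exhausted call no longer degrades the provider.
const breaker = new ConsecutiveFailureBreaker({
  consecutiveFailureThreshold: 3,
  cooldownMs: 5 * 60 * 1000, // the 5-minute cooldown mentioned in the finding
});
breaker.recordFailure("retry-exhausted");
console.log(breaker.isOpen()); // false: a single bad call is below the threshold
```

With this shape, the caller would record one failure per failed call rather than tripping inside the retry loop, and it would also report non-retryable errors so that they are no longer invisible to the breaker. A rolling failure-rate window would be an alternative to the consecutive counter if isolated failures between successes should also count.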
Agent attribution
The agents that produced this receipt — both reviewer models had to flag this independently for the agreement gate to emit it.
anthropic
gpt-5
108.7s · error
openai
claude-opus-4-7
132.8s · error
Total
wall-clock review time · est. inference cost
132.8s · $0.40
Sweeper
closed at SHA
still open
internal review id · d9ae4fa5
Third-party witnesses
Everything below lives on GitHub's event log, not ours. Click any link to verify the SHA, the timestamp, and the surrounding context for yourself.
Original review comment
https://github.com/AntFleet/bench-mythos-router/pull/3#issuecomment-4540078355