AntFleet

Receipt · d9ae4fa5-0

Circuit breaker is tripped after a single retry exhaustion, even for the first provider tried

bug · medium
repo 56f59a0d · PR #3 · reviewed 4 days ago

The finding

  • src/providers/orchestrator.ts:263-305
  • src/providers/orchestrator.ts:405-460
retryWithBackoff trips the circuit breaker and throws once the RETRY_BACKOFFS_MS attempts (3) are exhausted. The caller's catch block then also sets fallbackTriggered and continues to the next provider. Because the circuit breaker cooldown is 5 minutes, a single transient burst (e.g. a brief 503) marks the provider 'degraded' for 5 minutes after just 3 retries on one call. This is too aggressive compared with a typical failure-rate-based circuit breaker, and combined with the EMA recordFailure (which already reduces successRate) it rapidly demotes a provider after a single bad call. Worse, non-retryable errors (e.g. auth failures) never trip the breaker at all, because retryWithBackoff returns early on them.

Fix

Trip the circuit breaker based on rolling failure rate or N consecutive failures instead of one retry-exhausted call. Also trip on non-retryable hard failures (e.g. 401) when appropriate.
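
A minimal sketch of the "N consecutive failures" variant, assuming one breaker instance per provider. Every name here (ConsecutiveFailureBreaker, failureThreshold, cooldownMs, the hardFailure flag) is hypothetical and not taken from the reviewed repo. The threshold of 3 failed calls is an illustrative choice; the 5-minute cooldown mirrors the value described in the finding.

```typescript
// Hypothetical sketch: none of these names exist in the reviewed repo.
interface BreakerOptions {
  failureThreshold: number; // consecutive failed calls before tripping (assumed value)
  cooldownMs: number;       // how long the breaker stays open once tripped
}

class ConsecutiveFailureBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly opts: BreakerOptions;
  private readonly now: () => number;

  // `now` is injectable so the cooldown can be tested without real waiting.
  constructor(opts: BreakerOptions, now: () => number = () => Date.now()) {
    this.opts = opts;
    this.now = now;
  }

  // True while the provider should be skipped.
  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.opts.cooldownMs) {
      // Cooldown elapsed: close the breaker and start counting afresh.
      this.openedAt = null;
      this.consecutiveFailures = 0;
      return false;
    }
    return true;
  }

  // Any successful call clears the failure streak.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  // Record one failed call (i.e. after that call's retries are exhausted).
  // hardFailure marks a non-retryable error such as a 401 and trips immediately.
  recordFailure(hardFailure: boolean = false): void {
    this.consecutiveFailures += 1;
    if (hardFailure || this.consecutiveFailures >= this.opts.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}

// Usage: one retry-exhausted call no longer trips the breaker on its own.
const breaker = new ConsecutiveFailureBreaker({ failureThreshold: 3, cooldownMs: 300_000 });
breaker.recordFailure();            // first failed call: breaker stays closed
const skipProvider = breaker.isOpen();
```

In this sketch the retry helper would call recordFailure once per exhausted call rather than tripping the breaker directly, so a brief 503 burst costs the provider one strike instead of a full cooldown. A rolling failure-rate window is the other option named in the fix; it tolerates interleaved successes better but needs per-call timestamps.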

Agent attribution

The agents that produced this receipt: both reviewer models had to flag this finding independently for the agreement gate to emit it.

anthropic

gpt-5

108.7s · error

openai

claude-opus-4-7

132.8s · error

Total

wall-clock review time · est. inference cost

132.8s · $0.40

Sweeper

closed at SHA

still open

internal review id · d9ae4fa5

Third-party witnesses

Everything below lives on GitHub's event log, not ours. Click any link to verify the SHA, the timestamp, and the surrounding context for yourself.
