Primary finding
Circuit breaker is tripped after a single retry exhaustion, even for the first provider tried
- src/providers/orchestrator.ts:263-305
- src/providers/orchestrator.ts:405-460
retryWithBackoff trips the circuit breaker as soon as the RETRY_BACKOFFS_MS attempts (3) are exhausted, then throws. The caller's catch then also marks fallbackTriggered and continues to the next provider. The circuit breaker cooldown is 5 minutes, so a single transient burst (e.g. a brief 503) marks the provider 'degraded' for 5 minutes after just 3 retries on one call. This is too aggressive and unlike a typical failure-rate-based circuit breaker. Combined with the EMA recordFailure, which already reduces successRate, it rapidly demotes a provider on the strength of a single bad call. Worse, non-retryable errors (e.g. auth) never trip the breaker at all, because retryWithBackoff returns early on them.
Recommendation
Trip the circuit breaker on a rolling failure rate or on N consecutive failures rather than on a single retry-exhausted call. Also trip on non-retryable hard failures (e.g. 401) when appropriate, so that they no longer bypass the breaker.
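
A minimal sketch of the consecutive-failure variant, to make the recommendation concrete. Everything here is hypothetical and not taken from orchestrator.ts: the class name ConsecutiveFailureBreaker, its options, and the threshold of 3 are illustrative assumptions. Only the 5-minute cooldown comes from the finding above. Time is passed in as a parameter so the behavior is easy to test.

```typescript
// Hypothetical sketch: these names do not exist in orchestrator.ts.
interface BreakerOptions {
  consecutiveFailureThreshold: number; // failed calls in a row before tripping
  cooldownMs: number;                  // how long the breaker stays open
  tripOnHardFailure: boolean;          // assumption: a 401-style error trips at once
}

class ConsecutiveFailureBreaker {
  private opts: BreakerOptions;
  private consecutiveFailures = 0;
  private openedAt: number | null = null;

  constructor(opts: BreakerOptions) {
    this.opts = opts;
  }

  // Call once per successful provider call.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  // Call once per failed provider call, i.e. after retries are exhausted,
  // or immediately for a non-retryable error (hard = true).
  recordFailure(now: number, hard: boolean = false): void {
    this.consecutiveFailures += 1;
    const overThreshold =
      this.consecutiveFailures >= this.opts.consecutiveFailureThreshold;
    if (overThreshold || (hard && this.opts.tripOnHardFailure)) {
      this.openedAt = now;
    }
  }

  // True while the provider should be skipped.
  isOpen(now: number): boolean {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.opts.cooldownMs) {
      // Cooldown elapsed: let calls through again. The failure count is kept,
      // so a threshold-tripped breaker re-opens on the very next failure.
      this.openedAt = null;
      return false;
    }
    return true;
  }
}

// Usage: one retry-exhausted call no longer degrades the provider.
const breaker = new ConsecutiveFailureBreaker({
  consecutiveFailureThreshold: 3,
  cooldownMs: 5 * 60 * 1000,
  tripOnHardFailure: true,
});
breaker.recordFailure(0);
const openAfterOneFailure = breaker.isOpen(0);
breaker.recordFailure(1);
breaker.recordFailure(2);
const openAfterThreeFailures = breaker.isOpen(2);
```

With this shape, retryWithBackoff would report one failure per exhausted call instead of tripping the breaker directly, and the early return for non-retryable errors would report a hard failure. A rolling failure-rate window would follow the same interface, replacing the counter with timestamped outcomes.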