Why
Droid/OpenCode failures are often upstream/model-specific rather than local proxy failures. The proxy should classify and react to repeated 429, 5xx, and timeout signals per model.
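As a starting point for the classification step, here is a minimal sketch. It maps a single upstream response to a health signal. The `Signal` enum, the `classify` function, and the choice to keep non-429 4xx responses out of the breaker count are hypothetical, not taken from the proxy's code. The assumption behind that choice is that other 4xx codes point to a bad request rather than an unhealthy model.

```python
from enum import Enum
from typing import Optional


class Signal(Enum):
    """Hypothetical per-response health signal for one model."""
    OK = "ok"
    RATE_LIMITED = "rate_limited"      # HTTP 429
    UPSTREAM_ERROR = "upstream_error"  # HTTP 5xx
    TIMEOUT = "timeout"                # no response in time / connection failure
    CLIENT_ERROR = "client_error"      # other 4xx: assumed to be a request problem


# Assumption: only these signals count toward tripping a per-model breaker.
TRANSIENT = {Signal.RATE_LIMITED, Signal.UPSTREAM_ERROR, Signal.TIMEOUT}


def classify(status: Optional[int], timed_out: bool = False) -> Signal:
    """Map one upstream result to a Signal.

    `status` is the upstream HTTP status code, or None when no response
    arrived at all; both that case and an explicit timeout map to TIMEOUT.
    """
    if timed_out or status is None:
        return Signal.TIMEOUT
    if status == 429:
        return Signal.RATE_LIMITED
    if 500 <= status <= 599:
        return Signal.UPSTREAM_ERROR
    if 400 <= status <= 499:
        return Signal.CLIENT_ERROR
    # Anything else (2xx, and any 1xx/3xx that leak through) is treated as healthy.
    return Signal.OK
```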
Scope
- Background health poller with safe low-frequency checks.
- Per-model state: available, limited, retry, error, untested.
- Temporary circuit breaker with reset timers.
- Dashboard and proxy-status should expose breaker state.
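The per-model state and the temporary circuit breaker could fit together as in the sketch below. `BreakerRegistry`, `ModelHealth`, and the `threshold`/`cooldown_s` parameters are hypothetical names. The mapping of repeated 429s to "limited" and of repeated 5xx/timeouts to "error" is also an assumption. "retry" is used here as the half-open state that follows an expired reset timer.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class ModelState(Enum):
    UNTESTED = "untested"
    AVAILABLE = "available"
    LIMITED = "limited"  # breaker open after repeated 429s (assumption)
    ERROR = "error"      # breaker open after repeated 5xx/timeouts (assumption)
    RETRY = "retry"      # cooldown elapsed; the next request acts as a probe


@dataclass
class ModelHealth:
    state: ModelState = ModelState.UNTESTED
    consecutive_failures: int = 0
    open_until: float = 0.0


class BreakerRegistry:
    """Hypothetical per-model circuit breaker with a reset timer."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable, so tests need no real time or network
        self.models: Dict[str, ModelHealth] = {}

    def _get(self, model: str) -> ModelHealth:
        return self.models.setdefault(model, ModelHealth())

    def state(self, model: str) -> ModelState:
        h = self._get(model)
        # Reset timer: once the cooldown expires, an open breaker moves to RETRY.
        if h.state in (ModelState.LIMITED, ModelState.ERROR) and self.clock() >= h.open_until:
            h.state = ModelState.RETRY
        return h.state

    def is_routable(self, model: str) -> bool:
        # Open breakers (LIMITED/ERROR) are skipped by round-robin.
        return self.state(model) not in (ModelState.LIMITED, ModelState.ERROR)

    def record_success(self, model: str) -> None:
        h = self._get(model)
        h.state = ModelState.AVAILABLE
        h.consecutive_failures = 0

    def record_failure(self, model: str, rate_limited: bool) -> None:
        h = self._get(model)
        self.state(model)  # apply any pending reset before counting
        h.consecutive_failures += 1
        # A failed probe re-opens at once; otherwise wait for the threshold.
        if h.state is ModelState.RETRY or h.consecutive_failures >= self.threshold:
            h.state = ModelState.LIMITED if rate_limited else ModelState.ERROR
            h.open_until = self.clock() + self.cooldown_s
```

In this sketch, the reset is applied lazily whenever state is read, not by a background timer thread. That keeps transitions deterministic under a fake clock. A real poller could still call `state()` periodically so the dashboard sees "retry" without waiting for traffic.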
Acceptance criteria
- A model with repeated transient failures is temporarily removed from round-robin.
- Reset/recovery is visible in /metrics.
- Tests cover state transitions without real network calls.
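The three acceptance criteria can be checked without real network calls, as the following sketch shows. It pairs a round-robin selector that skips unroutable models with a Prometheus-style text rendering of breaker state. Both run against an injected clock. `RoundRobin`, `render_metrics`, the `open_until` map (which stands in for the real breaker state), and the metric name `proxy_model_breaker_open` are hypothetical. Model names are assumed to need no label escaping.

```python
from typing import Callable, Dict, List, Optional


class RoundRobin:
    """Cycles through models, skipping any the predicate marks unroutable."""

    def __init__(self, models: List[str], is_routable: Callable[[str], bool]) -> None:
        self.models = list(models)
        self.is_routable = is_routable
        self._next = 0

    def pick(self) -> Optional[str]:
        # Visit each model at most once per call; None means every breaker is open.
        for _ in range(len(self.models)):
            model = self.models[self._next]
            self._next = (self._next + 1) % len(self.models)
            if self.is_routable(model):
                return model
        return None


def render_metrics(open_until: Dict[str, float], now: float) -> str:
    """Gauge in Prometheus text format: 1 while a model's breaker is open, else 0."""
    lines = ["# TYPE proxy_model_breaker_open gauge"]
    for model in sorted(open_until):
        value = 1 if now < open_until[model] else 0
        lines.append(f'proxy_model_breaker_open{{model="{model}"}} {value}')
    return "\n".join(lines) + "\n"
```

A test can use a mutable clock value such as `now = [0.0]` and set `open_until["b"] = 10.0`. It can then assert that `b` is skipped until `now[0]` reaches 10. It can also assert that the rendered gauge for `b` flips from 1 to 0 at that moment.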