Description
Transient failures (connection resets, 502/503/504 responses) should be automatically retried with exponential backoff and random jitter rather than immediately surfacing to the caller. This makes the SDK resilient to brief network hiccups.
Proposed Steps
- Implement a _retry_with_backoff(fn, max_retries, base_delay) utility.
- Retry on: httpx.ConnectError, httpx.TimeoutException, HTTP 502/503/504.
- Delay formula: min(base_delay * 2**attempt + random.uniform(0, 0.5), 60).
- Do NOT retry on 4xx client errors (except 429, handled separately).
- Log each retry attempt at DEBUG level with attempt number and delay.
Acceptance Criteria
- A request that fails twice then succeeds is returned successfully after the retries.
- A 400 response is NOT retried; it raises InvalidRequestError immediately.
- Retries respect max_retries; exceeding it raises NetworkError.
- The delay between retries grows with each attempt.
- Retry behaviour is testable by injecting a mock transport.
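One way these criteria might be checked is with a scripted fake transport, sketched below. Everything here is illustrative: ScriptedTransport, FakeResponse, request_with_retries, and check_acceptance_criteria are hypothetical names, the exception classes are stand-ins for the SDK's own, and the condensed retry loop exists only so the checks have something to exercise. In the real SDK the same role would presumably be played by httpx.MockTransport, which wraps a handler function that returns canned responses.

```python
import random


class InvalidRequestError(Exception):
    """Stand-in for the SDK's InvalidRequestError."""


class NetworkError(Exception):
    """Stand-in for the SDK's NetworkError."""


class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code


class ScriptedTransport:
    """Hypothetical mock transport: replays a fixed script of outcomes."""

    def __init__(self, script):
        self.script = list(script)
        self.calls = 0

    def send(self):
        self.calls += 1
        outcome = self.script.pop(0)
        if isinstance(outcome, Exception):
            raise outcome  # simulate a connection-level failure
        return FakeResponse(outcome)


def request_with_retries(transport, max_retries, base_delay, sleep):
    """Condensed retry loop, present only to give the checks a target."""
    for attempt in range(max_retries + 1):
        try:
            response = transport.send()
        except (ConnectionError, TimeoutError):
            response = None
        if response is not None:
            if response.status_code == 400:
                raise InvalidRequestError("bad request")
            if response.status_code not in (502, 503, 504):
                return response
        if attempt < max_retries:
            sleep(min(base_delay * 2 ** attempt + random.uniform(0, 0.5), 60))
    raise NetworkError("retries exhausted")


def check_acceptance_criteria():
    # Fails twice, then succeeds: the final response is returned.
    delays = []
    transport = ScriptedTransport([ConnectionError("reset"), 503, 200])
    response = request_with_retries(transport, 3, 1.0, delays.append)
    assert response.status_code == 200 and transport.calls == 3
    # The delay grows with each attempt.
    assert len(delays) == 2 and delays[0] < delays[1]

    # A 400 is not retried: exactly one call, then InvalidRequestError.
    transport = ScriptedTransport([400, 200])
    try:
        request_with_retries(transport, 3, 1.0, delays.append)
    except InvalidRequestError:
        pass
    else:
        raise AssertionError("400 should raise InvalidRequestError")
    assert transport.calls == 1

    # Exceeding max_retries raises NetworkError after max_retries + 1 calls.
    transport = ScriptedTransport([503, 503, 503])
    try:
        request_with_retries(transport, 2, 1.0, lambda delay: None)
    except NetworkError:
        pass
    else:
        raise AssertionError("exhausted retries should raise NetworkError")
    assert transport.calls == 3
    return True
```

The checks use base_delay=1.0 deliberately: the jitter is at most 0.5 s, so the first delay (1.0 to 1.5 s) is always shorter than the second (2.0 to 2.5 s). With a much smaller base_delay, jitter could make consecutive delays compare out of order, so a real test may want to patch the random source instead.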