Add automatic retry for 429/503 with exponential backoff

Franz Kafka 2026-04-16 19:00:37 +00:00
parent 29292addac
commit 78b3239bbd
3 changed files with 89 additions and 7 deletions


@@ -24,11 +24,17 @@ Create a `config.yaml` file in the working directory:
```yaml
port: 8080
upstream_url: "https://api.z.ai/api/anthropic"
# Retry configuration
max_retries: 3
retry_base_delay_ms: 1000
```
- `port`: Port to listen on (default: 8080)
- `upstream_url`: Base URL for the Anthropic-compatible upstream API
- `temperature` (optional): Override temperature for all requests. If set, this value is used instead of client-specified temperatures. Remove this line to respect client temperatures.
- `max_retries`: Maximum retry attempts for transient errors (429, 503). Default: 3. Set to 0 to disable retries.
- `retry_base_delay_ms`: Base delay in milliseconds for exponential backoff. Default: 1000. The delay before retry number `attempt` (starting at 1) is `retry_base_delay_ms * 2^(attempt-1)`, with ±50% jitter applied.
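
As a rough illustration of the delay formula (not proxx's actual implementation), the sketch below computes the backoff for a given retry in Python. The function name `backoff_delay_ms`, its parameters, and the uniform jitter distribution are assumptions made for this example; the README only specifies the formula and the ±50% range.

```python
import random


def backoff_delay_ms(attempt, base_delay_ms=1000, jitter=0.5, rng=random):
    """Return the delay in milliseconds before retry number `attempt` (1-based).

    Hypothetical helper: mirrors `retry_base_delay_ms * 2^(attempt-1)` and
    scales the result by a random factor in [1 - jitter, 1 + jitter].
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    nominal = base_delay_ms * 2 ** (attempt - 1)
    # Assumption: jitter is drawn uniformly; the README does not say how.
    factor = 1 + rng.uniform(-jitter, jitter)
    return nominal * factor


# Nominal delays (jitter disabled) for the default configuration.
print([backoff_delay_ms(a, jitter=0) for a in (1, 2, 3)])

# With jitter, the first retry waits somewhere between 500 and 1500 ms.
print(backoff_delay_ms(1))
```

With the defaults shown in `config.yaml`, the worst-case total wait across three retries is therefore about 1.5 × (1000 + 2000 + 4000) ms = 10.5 s.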
## Building
@@ -184,6 +190,17 @@ The proxy sets these headers on all upstream requests to mimic the claude-code C
| `X-Claude-Code-Session-Id` | Random UUID generated at startup |
| `content-type` | `application/json` |
## Retry Behavior
When the upstream returns a retryable error (HTTP 429 or 503), proxx automatically retries the request up to `max_retries` times with exponential backoff:
- **Exponential backoff**: Delay doubles on each retry (1s, 2s, 4s, ... with the default `retry_base_delay_ms` of 1000)
- **Jitter**: ±50% random variation is applied to each delay to avoid a thundering herd
- **Retryable statuses**: 429 (rate limit), 503 (service unavailable)
- **Logged**: All retry attempts are logged with attempt number, delay, and jitter
This improves resilience against temporary upstream issues without client intervention.
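
The behavior described above could be sketched roughly as follows. This is a Python illustration, not proxx's code: `send_with_retry`, the `send` callable, and the log format are hypothetical, and returning the last upstream status once retries are exhausted is an assumption the README does not confirm.

```python
import random
import time

RETRYABLE_STATUSES = {429, 503}


def send_with_retry(send, max_retries=3, base_delay_ms=1000,
                    sleep=time.sleep, log=print):
    """Call `send()` until it returns a non-retryable status or retries run out.

    `send` is a hypothetical zero-argument callable returning an HTTP status
    code. `sleep` and `log` are injectable so the loop can be exercised
    without real waiting.
    """
    attempt = 0
    while True:
        status = send()
        # Assumption: after the last retry, the final status is passed through.
        if status not in RETRYABLE_STATUSES or attempt >= max_retries:
            return status
        attempt += 1
        nominal_ms = base_delay_ms * 2 ** (attempt - 1)
        jitter_ms = nominal_ms * random.uniform(-0.5, 0.5)
        delay_ms = nominal_ms + jitter_ms
        log(f"retry {attempt}/{max_retries} after status {status}: "
            f"delay={delay_ms:.0f}ms jitter={jitter_ms:+.0f}ms")
        sleep(delay_ms / 1000)


# Simulated upstream: rate-limited, then unavailable, then successful.
responses = iter([429, 503, 200])
final_status = send_with_retry(lambda: next(responses), sleep=lambda s: None)
print(final_status)
```

Setting `max_retries=0` in this sketch makes the loop return the first response unchanged, which matches the documented way to disable retries.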
## Security
### Blocked Headers