Add automatic retry for 429/503 with exponential backoff

2026-04-16 19:00:37 +00:00 · 2026-04-16 19:00:37 +00:00 · 78b3239bbd
commit 78b3239bbd
parent 29292addac
3 changed files with 89 additions and 7 deletions
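
Note on the delay formula documented in the README hunk below: the following is a minimal Python sketch of `base_delay_ms * 2^(attempt-1)` with ±50% jitter, not the code from this commit. The names `backoff_delay_ms`, `should_retry`, and `retries_done` are hypothetical, and drawing the jitter uniformly from the ±50% range is an assumption; the README states the range but not the distribution.

```python
import random

# Statuses the README lists as retryable.
RETRYABLE_STATUSES = {429, 503}


def backoff_delay_ms(attempt, base_delay_ms=1000, jitter_fraction=0.5, rng=random):
    """Return the delay in milliseconds before retry number `attempt` (1-based).

    Nominal delay is base_delay_ms * 2^(attempt-1). The jitter is assumed to be
    uniform in [-jitter_fraction, +jitter_fraction] of the nominal delay.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    nominal = base_delay_ms * 2 ** (attempt - 1)
    jitter = rng.uniform(-jitter_fraction, jitter_fraction)
    return nominal * (1 + jitter)


def should_retry(status, retries_done, max_retries=3):
    """Retry only retryable statuses, and only while retries remain.

    With max_retries=0 this always returns False, matching the documented
    "Set to 0 to disable retries".
    """
    return status in RETRYABLE_STATUSES and retries_done < max_retries
```

With the defaults, the first three retries would wait roughly 0.5 to 1.5 s, 1 to 3 s, and 2 to 6 s respectively.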
--- a/README.md
+++ b/README.md
@@ -24,11 +24,17 @@ Create a `config.yaml` file in the working directory:
 ```yaml
 port: 8080
 upstream_url: "https://api.z.ai/api/anthropic"
+
+# Retry configuration
+max_retries: 3
+retry_base_delay_ms: 1000
 ```

 - `port`: Port to listen on (default: 8080)
 - `upstream_url`: Base URL for the Anthropic-compatible upstream API
 - `temperature` (optional): Override temperature for all requests. If set, this value is used instead of client-specified temperatures. Remove this line to respect client temperatures.
+- `max_retries`: Maximum retry attempts for transient errors (429, 503). Default: 3. Set to 0 to disable retries.
+- `retry_base_delay_ms`: Base delay in milliseconds for exponential backoff. Default: 1000. Delay formula: `base_delay_ms * 2^(attempt-1)` with ±50% jitter.

 ## Building

@@ -184,6 +190,17 @@ The proxy sets these headers on all upstream requests to mimic the claude-code C
 | `X-Claude-Code-Session-Id` | Random UUID generated at startup |
 | `content-type` | `application/json` |

+## Retry Behavior
+
+When the upstream returns a retryable error (HTTP 429 or 503), proxx automatically retries with exponential backoff:
+
+- **Exponential backoff**: Delay doubles on each retry (1s, 2s, 4s, ... with the default `retry_base_delay_ms` of 1000)
+- **Jitter**: ±50% random variation applied to each delay to avoid a thundering herd
+- **Retryable statuses**: 429 (rate limit), 503 (service unavailable)
+- **Logged**: All retry attempts are logged with attempt number, delay, and jitter
+
+This improves resilience against temporary upstream issues without client intervention.
+
 ## Security

 ### Blocked Headers