fix(agent): backoff long on Anthropic 529 Overloaded, don't UNHEALTHY-cascade #14
Problem
Claude `API Error: Overloaded` (= Anthropic HTTP 529 server capacity, ≠ HTTP 429 rate limit) caused a cascade. Confirmed live on 2026-05-27 13:05 CEST during sprint sub-agent dispatch (Sonnet 4.6 peak load).
Fix in two files
bin/run-one-job.ts
- `API_OVERLOAD_PATTERNS` (matches `API Error: Overloaded` + `"status_code":529`)
- `apiOverloaded = true` flag
- `rollbackClaim` with `reason=api_overloaded` for traceability (cleanup effect via mcp PR #19)
bin/run-agent.sh
- `AGENT_OVERLOAD_SLEEP` (default 300s)
- `api-overloaded`, `continue` without `CONSEC_FAILURES++` or BACKOFF progression
Verification post-merge + deploy
- `update_mcp_worker` on scrum4me-srv
- `[run-one-job] API_OVERLOADED detected` + run-agent.sh log `API OVERLOADED (exit=4) — sleeping 300s`
- `claude_jobs.status='QUEUED'`, clean state, retryable
Not in scope
The `claude` CLI exposes only exit code + stdout. Pattern matching on stdout is sufficient for the vast majority of cases.
Rollback
Revert commit + `update_mcp_worker` flow → back to the generic exit-1 path. No state impact.
Dependency
Scrum4me-mcp PR #19 (rollbackClaim cleanup) is co-dependent: without it, the worktree + sprint_task_executions stay stuck on an overload rollback as well → the retry after the 300s sleep would still fail on stale state. Deploy both via a single `update_mcp_worker` rebuild (cache-bust pulls latest scrum4me-mcp main + bundles the scrum4me-docker bin/* updates).
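
For reviewers, a minimal sketch of the intended behavior of the changes to bin/run-one-job.ts and bin/run-agent.sh. It is not the code in this PR. Only `API_OVERLOAD_PATTERNS`, `AGENT_OVERLOAD_SLEEP`, the two matched strings, the 300s default and exit code 4 (taken from the expected log line) come from the description above. The helper names, the whitespace-tolerant regex, and the generic-failure backoff values are assumptions for illustration, and the supervisor loop (shell in the real worker) is modeled here as a pure TypeScript function.

```typescript
// Sketch only: names not listed in the PR description are hypothetical.

// The two markers named in the PR. Allowing whitespace around ":" is an assumption.
const API_OVERLOAD_PATTERNS: RegExp[] = [
  /API Error: Overloaded/,
  /"status_code"\s*:\s*529/,
];

// Exit code 4 is inferred from the log line "API OVERLOADED (exit=4)".
const EXIT_API_OVERLOADED = 4;

// Default from the PR: sleep 300 seconds on overload.
const AGENT_OVERLOAD_SLEEP = 300;

// Assumed values for the generic failure path; the real script may differ.
const INITIAL_BACKOFF = 5;
const MAX_BACKOFF = 300;

// True when the CLI stdout contains any overload marker.
function isApiOverloaded(stdout: string): boolean {
  return API_OVERLOAD_PATTERNS.some((pattern) => pattern.test(stdout));
}

interface LoopState {
  consecFailures: number; // models CONSEC_FAILURES in run-agent.sh
  backoffSeconds: number; // models the BACKOFF progression
}

interface LoopDecision {
  state: LoopState;
  sleepSeconds: number;
}

// Models one iteration of the supervisor loop after a job exits.
function decide(exitCode: number, state: LoopState): LoopDecision {
  if (exitCode === 0) {
    // Success: reset the failure bookkeeping (assumed behavior).
    return {
      state: { consecFailures: 0, backoffSeconds: INITIAL_BACKOFF },
      sleepSeconds: 0,
    };
  }
  if (exitCode === EXIT_API_OVERLOADED) {
    // Overload: long fixed sleep, failure counter and backoff left untouched.
    return { state, sleepSeconds: AGENT_OVERLOAD_SLEEP };
  }
  // Generic failure: count it and double the backoff (assumed behavior).
  return {
    state: {
      consecFailures: state.consecFailures + 1,
      backoffSeconds: Math.min(state.backoffSeconds * 2, MAX_BACKOFF),
    },
    sleepSeconds: state.backoffSeconds,
  };
}
```

In this model an overloaded exit never moves the worker closer to an unhealthy threshold, while a 429 or any other failure still takes the generic path.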