Adds a 7-file English-language manual targeted at new human contributors: index, overview, statuses & transitions (with mermaid state diagrams), git workflow, MCP integration, docker, and troubleshooting. The manual is the *map* — it cross-references existing runbooks/ADRs/architecture docs rather than duplicating their content. Regenerates docs/INDEX.md and validates with check-doc-links.mjs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| title | status | audience | language | last_updated | when_to_read |
|---|---|---|---|---|---|
| Docker | active | | en | 2026-05-07 | Before running the worker locally, debugging a stuck job, or operating the Mac/NAS deployment. |
# 05 — Docker
This chapter is the contributor's tour of the Docker side of Scrum4Me. Two important up-front facts:
- The Next.js app is not containerised. The web UI, API routes, server actions, and database connection all run on Vercel (serverless functions + Edge runtime). There is no `Dockerfile` in this repo and no `docker-compose.yml`.
- Only the worker is containerised. The "worker" is a Claude Code agent in a long-running container that polls the Scrum4Me job queue via MCP and executes `TASK_IMPLEMENTATION` / `IDEA_GRILL` / `IDEA_MAKE_PLAN` / `SPRINT_IMPLEMENTATION` jobs.
The container image and its supporting scripts live in a separate repo: madhura68/scrum4me-docker. This manual documents the consumer side — what the worker is, how it relates to Scrum4Me, and how to diagnose issues. The container internals (Dockerfile, entrypoint, agent provisioning) are out of scope for this manual; see that repo's README.
Note: A separate sandbox repo `scrum4me-sbx` (SC-4) exists for Docker exploration. Treat it as a scratchpad, not as the production worker.
## Topology
```mermaid
flowchart LR
  subgraph Vercel
    App[Next.js app<br/>+ API routes]
  end
  subgraph Neon
    DB[(Postgres)]
  end
  subgraph Mac["Mac (default) / NAS (opt-in)"]
    Worker[Worker container<br/>Claude Code + MCP]
  end
  Worker -- MCP over HTTPS --> App
  App -- Prisma --> DB
  Worker -- git push --> GH[GitHub]
  GH -- webhooks --> App
```
- The worker never connects to the database directly. All state changes go through MCP tools, which call the Vercel-hosted REST API, which writes to Neon via Prisma.
- The worker does push commits directly to GitHub. GitHub then notifies Vercel and the auto-PR flow (03 — Git Workflow) takes over.
## Mac vs NAS
| Flow | When to use | Status |
|---|---|---|
| Mac-native (arm64) | Default for development and small teams | Active |
| NAS | Self-hosted always-on worker on a Synology / Asustor / similar | Opt-in, validated by historical smoke tests in docs/docker-smoke/ |
The Mac flow is the default because it doesn't require dedicated hardware. The container runs natively on Apple Silicon (arm64) — no x86 emulation overhead.
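To confirm you really are running natively rather than under emulation, a quick check (the image name `scrum4me-worker` is a placeholder, not the real tag):

```bash
# Host architecture as Docker sees it: expect aarch64 on Apple Silicon.
docker info --format '{{.Architecture}}'

# Architecture inside the (hypothetical) worker image: expect aarch64, not x86_64.
docker run --rm --entrypoint uname scrum4me-worker -m
```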
## Environment variables the worker needs
The worker container needs only what's required to authenticate to MCP and push to GitHub:
| Var | Purpose |
|---|---|
| `SCRUM4ME_BEARER_TOKEN` | Bearer token bound to a product. Returned by the user's API-token settings page. |
| `SCRUM4ME_BASE_URL` | Usually `https://scrum4me.vercel.app` (or the user's domain). |
| `GITHUB_TOKEN` | Personal access token with `repo` scope, used by `git push` and `gh pr create`. |
| `ANTHROPIC_API_KEY` | The Claude API key used by the worker process. |
| `MIN_QUOTA_PCT` | Optional. Worker pauses if Anthropic quota drops below this percentage. |
Hardstop: the worker does not need `DATABASE_URL`, `SESSION_SECRET`, or `CRON_SECRET`. Those belong to the Next.js app; the worker only talks to MCP. If you find yourself adding DB env vars to the worker, stop — you're solving the wrong problem.
The full list and provisioning instructions live in the scrum4me-docker README. TODO: link to specific sections of that README once it's stable.
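Until then, a sketch of the shape the worker's `.env` usually takes; every value below is a placeholder:

```bash
# .env for the worker container (sketch; the scrum4me-docker README is canonical)
SCRUM4ME_BEARER_TOKEN=s4m_xxxxxxxxxxxxxxxx   # from the API-token settings page
SCRUM4ME_BASE_URL=https://scrum4me.vercel.app
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxx            # PAT with repo scope
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxxxxx
MIN_QUOTA_PCT=20                             # optional; pause threshold
```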
## What the worker loop does, on a single iteration
```mermaid
sequenceDiagram
  participant W as Worker
  participant Q as worker-quota-probe.sh
  participant M as MCP server
  W->>Q: probe Anthropic quota
  Q-->>W: { pct, reset_at_iso }
  alt pct < MIN_QUOTA_PCT
    W->>M: worker_heartbeat(pct, last_quota_check_at)
    W->>W: sleep until reset_at_iso (cap 1h)
  else quota ok
    W->>M: worker_heartbeat(pct, last_quota_check_at)
    W->>M: wait_for_job (block ≤600s, claim atomically)
    alt queue empty
      W->>W: continue (no work, loop again)
    else got job
      W->>W: execute by `kind`
      W->>M: update_job_status(done|failed)
    end
  end
  Note over W: continue forever
```
The loop is described authoritatively in docs/runbooks/mcp-integration.md and docs/runbooks/worker-idempotency.md.
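For intuition only, here is the same iteration as a bash sketch; `probe_quota`, `mcp_heartbeat`, `mcp_wait_for_job`, `run_job`, and `mcp_update_job_status` are hypothetical stand-ins for the real scripts and MCP tools:

```bash
#!/usr/bin/env bash
# Non-authoritative sketch of one worker iteration; see the runbooks above.
MIN_QUOTA_PCT="${MIN_QUOTA_PCT:-20}"

while true; do
  read -r pct reset_at_iso < <(probe_quota)   # worker-quota-probe.sh
  mcp_heartbeat "$pct" "$(date -u +%FT%TZ)"   # heartbeat on every iteration

  if (( pct < MIN_QUOTA_PCT )); then
    sleep 3600                                # real loop: until $reset_at_iso, capped at 1h
    continue
  fi

  job="$(mcp_wait_for_job --block 600)"       # blocks <=600s; the claim is atomic server-side
  [[ -z "$job" ]] && continue                 # queue empty: loop again

  if run_job "$job"; then                     # dispatch on the job's `kind`
    mcp_update_job_status "$job" done
  else
    mcp_update_job_status "$job" failed
  fi
done
```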
### Quota probe
`bin/worker-quota-probe.sh` (in scrum4me-docker) makes a tiny call to the Anthropic API to read the current quota percentage and reset time. Cost: ~1 output token per probe (~12 tokens/hour at 5-minute intervals). The default `MIN_QUOTA_PCT` is 20% — typically high enough on Pro/Max plans that the worker never pauses during normal day-job hours.
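The exact probe lives in scrum4me-docker; as an illustration of the mechanism only, the Anthropic Messages API returns documented `anthropic-ratelimit-*` response headers, so a minimal probe could send a 1-output-token request and parse them (model id and header handling here are assumptions, not the real script):

```bash
# Illustration only; bin/worker-quota-probe.sh is the real probe.
headers=$(curl -s -D - -o /dev/null https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":1,"messages":[{"role":"user","content":"."}]}')

remaining=$(grep -i '^anthropic-ratelimit-tokens-remaining:' <<<"$headers" | tr -dc '0-9')
limit=$(grep -i '^anthropic-ratelimit-tokens-limit:' <<<"$headers" | tr -dc '0-9')
reset=$(grep -i '^anthropic-ratelimit-tokens-reset:' <<<"$headers" | awk '{print $2}' | tr -d '\r')

echo "$(( remaining * 100 / limit )) $reset"   # -> "pct reset_at_iso"
```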
### Heartbeat
Every iteration the worker calls `worker_heartbeat({ last_quota_pct, last_quota_check_at })`. The MCP server emits an SSE event so the NavBar in the Next.js app shows the worker as live. A heartbeat older than 15 seconds is rendered as "offline" / "stand-by" in the UI.
### Stale-claim recovery
If a worker dies mid-job (process crash, container kill, network partition), its claimed job stays `CLAIMED` in the database. After 30 minutes the next `wait_for_job` call automatically requeues it (`CLAIMED` → `QUEUED`) before claiming a fresh one. No manual intervention is required for clean recovery.
When you do need to manually requeue a job (e.g. you killed it intentionally and don't want to wait 30 min), the operator route is the admin board → "Requeue job" button. TODO: confirm the exact UI path; this is not yet documented in docs/runbooks/.
## Running the worker locally
The intended local workflow is Mac-native Docker (per the project's standing `project_docker_default_target` memory). High-level steps (verify against the scrum4me-docker README for exact commands):

- Clone `scrum4me-docker` next to `Scrum4Me/` (so `~/Development/Scrum4Me/scrum4me-docker/`).
- Provision the env vars above (typically a `.env` file in that repo, not committed).
- `docker build` the image and `docker run` it with the env file mounted.
- Watch container logs for the heartbeat/quota cycle.
- Trigger a job from the UI ("Voer alle uit", i.e. "Run all", on the Solo Board) and verify the worker picks it up within ~5 seconds.
TODO: once the `scrum4me-docker` README has stabilised, replace the bullets above with copy-paste-ready commands. Until then, defer to that repo for canonical instructions.
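In the meantime, a rough sketch of the shape those commands usually take (image name, tag, and flags are assumptions):

```bash
# Sketch; the scrum4me-docker README is canonical.
cd ~/Development/Scrum4Me/scrum4me-docker
docker build -t scrum4me-worker .
docker run -d --name scrum4me-worker --env-file .env scrum4me-worker
docker logs -f scrum4me-worker   # expect a heartbeat/quota line every iteration
```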
## Debugging a stuck worker
| Symptom | Likely cause | Fix |
|---|---|---|
| Worker shows offline in NavBar but container is running | `worker_heartbeat` not reaching MCP | Check `SCRUM4ME_BASE_URL` and `SCRUM4ME_BEARER_TOKEN`; tail container logs for HTTP errors |
| Worker logs say "stand-by" indefinitely | `pct < MIN_QUOTA_PCT` and `reset_at` not reached | Lower `MIN_QUOTA_PCT` for testing, or wait for the printed `reset_at_iso` |
| Job stuck `CLAIMED` for >30 min | Worker died mid-job | Wait — auto-requeue triggers on the next `wait_for_job` |
| Worker claims job but never updates status | Crashed before `update_job_status`; container restarted in a loop | Check `docker logs`; the next `wait_for_job` will requeue stale claims |
| `update_job_status` returns 403 | Bearer token doesn't match `claimed_by_token_id` | The token was rotated mid-run; restart with a fresh token |
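Whatever the symptom, the same first-pass checks apply (the container name is a placeholder):

```bash
docker ps --filter name=scrum4me-worker    # is the container actually up?
docker logs --tail 100 scrum4me-worker     # heartbeat/quota lines, HTTP errors
curl -s -o /dev/null -w '%{http_code}\n' "$SCRUM4ME_BASE_URL"   # MCP host reachable?
```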
For deeper troubleshooting see 06 — Troubleshooting.
## Smoke-test references
Historical Docker smoke tests live in docs/docker-smoke/. They validated the worktree-isolation + branch-per-story flow when the Docker worker was first introduced. They are historical — don't expect them to be runnable as-is — but they're a useful reference when you want to verify the same flow on a new container image.
## Deep links
| Topic | Source |
|---|---|
| Container image, Dockerfile, build | scrum4me-docker repo |
| Worker loop & quota check | docs/runbooks/mcp-integration.md |
| Worker idempotency / job-status protocol | docs/runbooks/worker-idempotency.md |
| Historical smoke tests | docs/docker-smoke/ |
| Sandbox / exploration repo | scrum4me-sbx repo |
## What's next
→ 06 — Troubleshooting covers error codes and recovery procedures across the full stack.