---
title: "Docker"
status: active
audience: [contributor]
language: en
last_updated: 2026-05-07
when_to_read: "Before running the worker locally, debugging a stuck job, or operating the Mac/NAS deployment."
---

# 05 — Docker

This chapter is the contributor's tour of the Docker side of Scrum4Me. Two important up-front facts:

1. **The Next.js app is not containerised.** The web UI, API routes, server actions, and database connection all run on **Vercel** (serverless functions + Edge runtime). There is no `Dockerfile` in this repo and no `docker-compose.yml`.
2. **Only the worker is containerised.** The "worker" is a Claude Code agent in a long-running container that polls the Scrum4Me job queue via MCP and executes `TASK_IMPLEMENTATION` / `IDEA_GRILL` / `IDEA_MAKE_PLAN` / `SPRINT_IMPLEMENTATION` jobs.

The container image and its supporting scripts live in a **separate repo**: [`madhura68/scrum4me-docker`](https://github.com/madhura68/scrum4me-docker). This manual documents the consumer side — what the worker is, how it relates to Scrum4Me, and how to diagnose issues. The container internals (Dockerfile, entrypoint, agent provisioning) are out of scope for this manual; see that repo's README.

> **Note:** A separate sandbox repo `scrum4me-sbx` ([`SC-4`](https://github.com/madhura68/scrum4me-sbx)) exists for Docker exploration. Treat it as a scratchpad, not as the production worker.

## Topology

```mermaid
flowchart LR
    subgraph Vercel
        App[Next.js app<br/>+ API routes]
    end
    subgraph Neon
        DB[(Postgres)]
    end
    subgraph Mac["Mac (default) / NAS (opt-in)"]
        Worker[Worker container<br/>Claude Code + MCP]
    end
    Worker -- MCP over HTTPS --> App
    App -- Prisma --> DB
    Worker -- git push --> GH[GitHub]
    GH -- webhooks --> App
```

- The worker **never connects to the database directly**. All state changes go through MCP tools, which call the Vercel-hosted REST API, which writes to Neon via Prisma.
- The worker **does** push commits directly to GitHub. GitHub then notifies Vercel and the auto-PR flow ([03 — Git Workflow](./03-git-workflow.md)) takes over.

## Mac vs NAS

| Flow | When to use | Status |
|---|---|---|
| **Mac-native (arm64)** | Default for development and small teams | Active |
| **NAS** | Self-hosted always-on worker on a Synology / Asustor / similar | Opt-in, validated by historical smoke tests in [`docs/docker-smoke/`](../docker-smoke/) |

The Mac flow is the default because it doesn't require dedicated hardware. The container runs natively on Apple Silicon (arm64) — no x86 emulation overhead.

## Environment variables the worker needs

The worker container needs **only** what's required to authenticate to MCP and push to GitHub:

| Var | Purpose |
|---|---|
| `SCRUM4ME_BEARER_TOKEN` | Bearer token bound to a product. Issued on the user's API-token settings page. |
| `SCRUM4ME_BASE_URL` | Usually `https://scrum4me.vercel.app` (or the user's domain). |
| `GITHUB_TOKEN` | Personal access token with `repo` scope, used by `git push` and `gh pr create`. |
| `ANTHROPIC_API_KEY` | The Claude API key used by the worker process. |
| `MIN_QUOTA_PCT` | Optional (default `20`). The worker pauses while the Anthropic quota is below this percentage. |

> **Hardstop:** the worker does **not** need `DATABASE_URL`, `SESSION_SECRET`, or `CRON_SECRET`. Those belong to the Next.js app; the worker only talks to MCP. If you find yourself adding DB env vars to the worker, stop — you're solving the wrong problem.

The full list and provisioning instructions live in the [`scrum4me-docker` README](https://github.com/madhura68/scrum4me-docker). **TODO:** link to specific sections of that README once it's stable.
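
The table and the hardstop above can be turned into a small pre-flight check. The following is a minimal sketch, not code from `scrum4me-docker`: the function name `checkWorkerEnv` and the `EnvReport` shape are hypothetical, and it assumes that every variable in the table except `MIN_QUOTA_PCT` is required.

```typescript
// Variable names come from the table above; everything else is illustrative.
const REQUIRED_VARS = [
  "SCRUM4ME_BEARER_TOKEN",
  "SCRUM4ME_BASE_URL",
  "GITHUB_TOKEN",
  "ANTHROPIC_API_KEY",
] as const;

// App-only secrets that should never appear in the worker environment.
const FORBIDDEN_VARS = ["DATABASE_URL", "SESSION_SECRET", "CRON_SECRET"] as const;

interface EnvReport {
  missing: string[];   // required vars that are unset or empty
  forbidden: string[]; // app-only vars that leaked into the worker env
  ok: boolean;
}

// Hypothetical helper: reports missing required vars and leaked app secrets.
function checkWorkerEnv(env: Record<string, string | undefined>): EnvReport {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  const forbidden = FORBIDDEN_VARS.filter((name) => env[name] !== undefined);
  return {
    missing: [...missing],
    forbidden: [...forbidden],
    ok: missing.length === 0 && forbidden.length === 0,
  };
}
```

A container entrypoint could call such a check with its environment and refuse to start when `ok` is `false`, which surfaces a misplaced `DATABASE_URL` before the first job is claimed.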

## What the worker loop does, on a single iteration

```mermaid
sequenceDiagram
    participant W as Worker
    participant Q as worker-quota-probe.sh
    participant M as MCP server
    W->>Q: probe Anthropic quota
    Q-->>W: { pct, reset_at_iso }
    alt pct < MIN_QUOTA_PCT
        W->>M: worker_heartbeat(pct, last_quota_check_at)
        W->>W: sleep until reset_at_iso (cap 1h)
    else quota ok
        W->>M: worker_heartbeat(pct, last_quota_check_at)
        W->>M: wait_for_job (block ≤600s, claim atomically)
        alt queue empty
            W->>W: continue (no work, loop again)
        else got job
            W->>W: execute by `kind`
            W->>M: update_job_status(done|failed)
        end
    end
    Note over W: continue forever
```

The loop is described authoritatively in [`docs/runbooks/mcp-integration.md`](../runbooks/mcp-integration.md#batch-loop-verplichte-agent-flow) and [`docs/runbooks/worker-idempotency.md`](../runbooks/worker-idempotency.md).
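
The branching in the sequence diagram can be sketched as a pure planning function. This is an illustration under stated assumptions, not the worker's actual code: `planIteration`, the `Step` type, and the constant names are hypothetical, while the 600-second wait and the one-hour sleep cap are taken from the diagram.

```typescript
// Shape of the probe result as shown in the diagram: { pct, reset_at_iso }.
interface QuotaProbe {
  pct: number;
  reset_at_iso: string;
}

// Hypothetical description of what the worker does next.
type Step =
  | { kind: "heartbeat" }
  | { kind: "sleep"; ms: number }
  | { kind: "wait_for_job"; timeoutSeconds: number };

const MAX_SLEEP_MS = 60 * 60 * 1000; // "cap 1h" in the diagram
const WAIT_TIMEOUT_SECONDS = 600;    // "block <= 600s" in the diagram

// Decide the steps of one iteration from the quota probe result.
function planIteration(quota: QuotaProbe, minQuotaPct: number, nowMs: number): Step[] {
  if (quota.pct < minQuotaPct) {
    const untilResetMs = Date.parse(quota.reset_at_iso) - nowMs;
    // An unparseable reset time falls back to the cap; a past one means no sleep.
    const ms = isNaN(untilResetMs)
      ? MAX_SLEEP_MS
      : Math.min(Math.max(untilResetMs, 0), MAX_SLEEP_MS);
    return [{ kind: "heartbeat" }, { kind: "sleep", ms }];
  }
  return [
    { kind: "heartbeat" },
    { kind: "wait_for_job", timeoutSeconds: WAIT_TIMEOUT_SECONDS },
  ];
}
```

Both branches send a heartbeat first, so the UI keeps receiving quota data even while the worker is paused. What happens after `wait_for_job` returns (execute by `kind`, then `update_job_status`) is left out of the sketch.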

### Quota probe

`bin/worker-quota-probe.sh` (in `scrum4me-docker`) makes a tiny call to the Anthropic API to read the current quota percentage and reset time. Cost: ~1 output token per probe, or ~12 tokens/hour at 5-minute intervals (60 / 5 = 12 probes per hour). The default `MIN_QUOTA_PCT` is **20%**. On Pro/Max plans the remaining quota typically stays above this threshold, so the worker never pauses during normal working hours.
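
A consumer of the probe might validate its output and estimate its cost roughly as follows. This is a sketch only: it assumes the script prints a JSON object with the `pct` and `reset_at_iso` fields shown in the loop diagram and that `pct` is on a 0 to 100 scale; `parseProbeOutput` and `probeTokensPerHour` are hypothetical names.

```typescript
interface QuotaProbe {
  pct: number;          // remaining quota, assumed to be 0..100
  reset_at_iso: string; // ISO 8601 timestamp of the next quota reset
}

// Parse and validate the (assumed) JSON output of the probe script.
function parseProbeOutput(raw: string): QuotaProbe {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("probe output is not a JSON object");
  }
  const { pct, reset_at_iso } = data as { pct?: unknown; reset_at_iso?: unknown };
  if (typeof pct !== "number" || pct < 0 || pct > 100) {
    throw new Error("pct must be a number between 0 and 100");
  }
  if (typeof reset_at_iso !== "string" || isNaN(Date.parse(reset_at_iso))) {
    throw new Error("reset_at_iso must be a parseable timestamp");
  }
  return { pct, reset_at_iso };
}

// Probe cost: probes per hour times output tokens per probe.
function probeTokensPerHour(intervalMinutes: number, tokensPerProbe: number): number {
  return (60 / intervalMinutes) * tokensPerProbe;
}
```

With the figures from this section, `probeTokensPerHour(5, 1)` evaluates to `12`.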

### Heartbeat

Every iteration the worker calls `worker_heartbeat({ last_quota_pct, last_quota_check_at })`. The MCP server emits an SSE event so the NavBar in the Next.js app shows the worker as live. A heartbeat older than 15 seconds is rendered as "offline" / "stand-by" in the UI.
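
The 15-second staleness rule amounts to a simple age comparison. The sketch below is illustrative and not taken from the app: `workerPresence` and `HEARTBEAT_STALE_MS` are hypothetical names, and it collapses "offline" and "stand-by" into a single state because the distinction is not specified here.

```typescript
const HEARTBEAT_STALE_MS = 15 * 1000; // threshold stated in this section

type WorkerPresence = "live" | "offline";

// Classify the worker from the timestamp of its last heartbeat.
function workerPresence(lastHeartbeatIso: string | null, nowMs: number): WorkerPresence {
  if (lastHeartbeatIso === null) return "offline"; // never seen a heartbeat
  const ageMs = nowMs - Date.parse(lastHeartbeatIso);
  // An unparseable timestamp yields NaN, which fails the comparison and reads as offline.
  return ageMs <= HEARTBEAT_STALE_MS ? "live" : "offline";
}
```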

### Stale-claim recovery

If a worker dies mid-job (process crash, container kill, network partition), its claimed job stays as `CLAIMED` in the database. After **30 minutes** the next `wait_for_job` call automatically requeues it (`CLAIMED → QUEUED`) before claiming a fresh one. No manual intervention is required for clean recovery.
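
The requeue rule can be sketched as a pure function over a list of jobs. This is an assumption-laden illustration rather than the server's implementation (which presumably performs the transition atomically in the database): the `Job` shape, the `claimedAtMs` field, and the `DONE` / `FAILED` status names are hypothetical, and only `CLAIMED`, `QUEUED`, and the 30-minute window come from the text above.

```typescript
type JobStatus = "QUEUED" | "CLAIMED" | "DONE" | "FAILED";

interface Job {
  id: string;
  status: JobStatus;
  claimedAtMs: number | null; // epoch milliseconds of the claim, if any
}

const STALE_CLAIM_MS = 30 * 60 * 1000; // 30-minute window from this section

// Return a copy of the job list with stale claims moved back to QUEUED.
function requeueStaleClaims(jobs: Job[], nowMs: number): Job[] {
  return jobs.map((job): Job => {
    const stale =
      job.status === "CLAIMED" &&
      job.claimedAtMs !== null &&
      nowMs - job.claimedAtMs >= STALE_CLAIM_MS;
    return stale ? { ...job, status: "QUEUED", claimedAtMs: null } : job;
  });
}
```

Because the check runs as part of `wait_for_job`, recovery only happens while at least one worker is polling the queue.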

When you **do** need to manually requeue a job (e.g. you killed it intentionally and don't want to wait 30 min), the operator route is the admin board → "Requeue job" button. **TODO:** confirm the exact UI path; this is not yet documented in `docs/runbooks/`.

## Running the worker locally

The intended local workflow is **Mac-native Docker**, as recorded in the project's standing memory (`project_docker_default_target`). High-level steps (verify against the [scrum4me-docker README](https://github.com/madhura68/scrum4me-docker) for exact commands):

1. Clone `scrum4me-docker` next to `Scrum4Me/` (so `~/Development/Scrum4Me/scrum4me-docker/`).
2. Provision the env vars above (typically a `.env` file in that repo, **not committed**).
3. `docker build` the image and `docker run` it with the env file mounted.
4. Watch container logs for the heartbeat/quota cycle.
5. Trigger a job from the UI ("Voer alle uit", i.e. "Run all", on the Solo Board) and verify the worker picks it up within ~5 seconds.

> **TODO:** once the `scrum4me-docker` README has stabilised, replace the numbered steps above with copy-paste-ready commands. Until then, defer to that repo for canonical instructions.

## Debugging a stuck worker

| Symptom | Likely cause | Fix |
|---|---|---|
| Worker shows offline in NavBar but container is running | `worker_heartbeat` not reaching MCP | Check `SCRUM4ME_BASE_URL` and `SCRUM4ME_BEARER_TOKEN`; tail container logs for HTTP errors |
| Worker logs say "stand-by" indefinitely | `pct < MIN_QUOTA_PCT` and `reset_at_iso` not yet reached | Lower `MIN_QUOTA_PCT` for testing, or wait for the printed `reset_at_iso` |
| Job stuck `CLAIMED` for >30 min | Worker died mid-job and no worker has called `wait_for_job` since | Make sure a worker is running; the auto-requeue triggers on its next `wait_for_job` call |
| Worker claims job but never updates status | Crashed before `update_job_status`; container restarted in a loop | Check `docker logs`; the next `wait_for_job` will requeue stale claims |
| `update_job_status` returns `403` | Bearer token doesn't match `claimed_by_token_id` | The token was rotated mid-run; restart with fresh token |

For deeper troubleshooting see [06 — Troubleshooting](./06-troubleshooting.md).
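
The `403` row in the table above boils down to an ownership check on the claimed job. The following is a hypothetical sketch of that check, not the server's actual code: only the field name `claimed_by_token_id` comes from the table, while `ClaimedJob` and `statusUpdateHttpCode` are made up for illustration.

```typescript
interface ClaimedJob {
  id: string;
  claimed_by_token_id: string | null; // id of the token that claimed the job
}

// Only the token that claimed a job may report its status.
function statusUpdateHttpCode(job: ClaimedJob, callerTokenId: string): 200 | 403 {
  return job.claimed_by_token_id === callerTokenId ? 200 : 403;
}
```

Under this model a token rotated mid-run gets a new id, so the worker's status update no longer matches the stored claim and is rejected.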

## Smoke-test references

Historical Docker smoke tests live in [`docs/docker-smoke/`](../docker-smoke/). They validated the worktree-isolation + branch-per-story flow when the Docker worker was first introduced. They are **historical** — don't expect them to be runnable as-is — but they're a useful reference when you want to verify the same flow on a new container image.

## Deep links

| Topic | Source |
|---|---|
| Container image, Dockerfile, build | [`scrum4me-docker` repo](https://github.com/madhura68/scrum4me-docker) |
| Worker loop & quota check | [`docs/runbooks/mcp-integration.md`](../runbooks/mcp-integration.md#pre-flight-quota-check-m13) |
| Worker idempotency / job-status protocol | [`docs/runbooks/worker-idempotency.md`](../runbooks/worker-idempotency.md) |
| Historical smoke tests | [`docs/docker-smoke/`](../docker-smoke/) |
| Sandbox / exploration repo | [`scrum4me-sbx` repo](https://github.com/madhura68/scrum4me-sbx) |

## What's next

→ [06 — Troubleshooting](./06-troubleshooting.md) covers error codes and recovery procedures across the full stack.