Phase 0 codex-runner-substrate — docker chunk (Tasks 3-6) #27

Merged
janpeter merged 9 commits from feat/codex-runner-substrate into master 2026-06-08 06:22:13 +02:00

Phase 0 — Codex-runner-substrate — docker chunk

Implements the docker part (Tasks 3-6) of the Phase 0 codex-runner-substrate plan. The MCP slice (Tasks 1, 2 and 7) is open in scrum4me-mcp #41. The host canary (Task 8) and the ops flow (Task 9) follow after merge.

Plan + design (GO):

  • docs/superpowers/plans/2026-06-07-codex-runner-substrate-phase0-plan.md (codex r2 GO, 0 P1/P2; the P3 about etc/codex is pre-handled by the repo-root codex/ directory)
  • docs/superpowers/specs/2026-06-07-codex-runner-substrate-phase0-design.md (codex r2 GO + scrum4me-server operational GO)

Changes

Task 3 (bin/run-one-job.ts): threads SCRUM4ME_WORKER_RUNTIME (via getWorkerRuntimeFromEnv) through registerWorker, startHeartbeat and both tryClaimJob call sites (5th positional argument). Branches the binary (claude vs codex), the args (buildCodexArgs for codex; Claude flag set unchanged) and the exit classification (classifyCodexOutput for codex; the existing regex scan for claude). Skips the Anthropic quota probe for CODEX (there is no Anthropic budget when codex authenticates with a ChatGPT plan).
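
The branching described for Task 3 could look roughly like the sketch below. It is not the code from bin/run-one-job.ts: runtimeFromEnv, planRun and the RuntimeDeps interface are hypothetical stand-ins, because the real signatures of getWorkerRuntimeFromEnv, buildCodexArgs and classifyCodexOutput are not shown in this PR. Defaulting to CLAUDE when the variable is unset is also an assumption.

```typescript
type WorkerRuntime = "CLAUDE" | "CODEX";

// Hypothetical injection points; the real builders and classifiers live in
// scrum4me-mcp and bin/run-one-job.ts and may have different signatures.
interface RuntimeDeps {
  claudeArgs: (prompt: string) => string[];
  codexArgs: (prompt: string) => string[];
  classifyClaude: (output: string) => string;
  classifyCodex: (output: string) => string;
}

interface RuntimePlan {
  binary: "claude" | "codex";
  args: string[];
  classify: (output: string) => string;
  probeAnthropicQuota: boolean;
}

// Assumption: an unset variable means CLAUDE; any other unknown value is an error.
function runtimeFromEnv(env: Record<string, string | undefined>): WorkerRuntime {
  const raw = (env.SCRUM4ME_WORKER_RUNTIME ?? "CLAUDE").trim().toUpperCase();
  if (raw === "CLAUDE" || raw === "CODEX") return raw;
  throw new Error(`Unsupported SCRUM4ME_WORKER_RUNTIME: ${raw}`);
}

// One decision point selects the binary, the args, the exit classifier and
// whether the Anthropic quota probe runs at all.
function planRun(runtime: WorkerRuntime, prompt: string, deps: RuntimeDeps): RuntimePlan {
  if (runtime === "CODEX") {
    return {
      binary: "codex",
      args: deps.codexArgs(prompt),
      classify: deps.classifyCodex,
      probeAnthropicQuota: false, // no Anthropic budget to probe for codex
    };
  }
  return {
    binary: "claude",
    args: deps.claudeArgs(prompt),
    classify: deps.classifyClaude,
    probeAnthropicQuota: true,
  };
}
```

Keeping the decision in one function means the Claude path stays a single untouched branch, which matches the PR's goal of leaving the existing flag set unchanged.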

Task 4 (bin/check-tokens.sh): runtime-aware credential check. CLAUDE: the existing ANTHROPIC_API_KEY/CLAUDE_CODE_OAUTH_TOKEN checks. CODEX: /home/agent/.codex/auth.json MUST be present, readable and writable (codex refreshes its own token); codex login status is opportunistic. The Scrum4Me token check and the DATABASE_URL TCP probe remain runtime-agnostic.
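
The actual check is a bash script; the following is only a TypeScript rendering of the CODEX rule (present, readable, writable) to make the three conditions explicit. The function name checkCodexAuth and the result shape are hypothetical, and the opportunistic codex login status call is left out on purpose.

```typescript
import * as fs from "node:fs";

interface CredentialCheck {
  ok: boolean;
  problems: string[];
}

// Verifies that the codex auth file exists and that the current user may both
// read it and write it (codex needs write access to refresh its token).
function checkCodexAuth(authPath: string = "/home/agent/.codex/auth.json"): CredentialCheck {
  if (!fs.existsSync(authPath)) {
    return { ok: false, problems: [`${authPath} is missing`] };
  }
  const problems: string[] = [];
  const modes: Array<[string, number]> = [
    ["readable", fs.constants.R_OK],
    ["writable", fs.constants.W_OK],
  ];
  for (const [label, mode] of modes) {
    try {
      fs.accessSync(authPath, mode); // throws when the permission is missing
    } catch {
      problems.push(`${authPath} is not ${label}`);
    }
  }
  return { ok: problems.length === 0, problems };
}
```

Reporting each failed condition separately, rather than a single boolean, makes it easier to tell a missing bind mount apart from a read-only one.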

Task 5 (Dockerfile): multi-stage refactor into 3 stages:

  • base: system deps + node + gh + scrum4me-mcp clone + agent user + runner files + ENV + ENTRYPOINT (without an agent CLI).
  • codex: npm install -g @openai/codex@0.137.0-alpha.4 + COPY codex/ /opt/agent/etc/codex/.
  • claude: native Claude Code installer, LAST stage, so that docker build . (no --target) yields byte-for-byte the same Claude image as today.
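
Because docker build without --target builds the last stage in the file, the stage order is what preserves the default image. A small check along these lines could assert that order; it is a sketch, the helper names are hypothetical, and the base image in the sample is a placeholder rather than the one used in this repository.

```typescript
// Returns the stage names in file order; unnamed stages get a synthetic label.
function dockerfileStages(dockerfile: string): string[] {
  const stages: string[] = [];
  for (const line of dockerfile.split(/\r?\n/)) {
    // Matches "FROM [--flag=value ...] image [AS name]" (case-insensitive).
    const m = /^\s*FROM\s+(?:--\S+\s+)*\S+(?:\s+AS\s+(\S+))?\s*$/i.exec(line);
    if (m) stages.push(m[1] ?? `<unnamed-${stages.length}>`);
  }
  return stages;
}

// The last FROM is the stage a plain `docker build .` produces.
function defaultTarget(dockerfile: string): string | undefined {
  const stages = dockerfileStages(dockerfile);
  return stages[stages.length - 1];
}

// Placeholder Dockerfile with the same stage layout as described above.
const sampleDockerfile = [
  "FROM debian:stable AS base", // hypothetical base image
  "RUN true",
  "FROM base AS codex",
  "FROM base AS claude",
].join("\n");

console.log(defaultTarget(sampleDockerfile)); // prints "claude"
```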

Task 6 (codex config + entrypoint + compose):

  • codex/config.toml (repo-root, not under etc/): approval=never, sandbox=workspace-write (network on), MCP via npx tsx /opt/scrum4me-mcp/src/index.ts, required=true. env_vars forwards token/DB/cache plus SCRUM4ME_WORKER_RUNTIME + SCRUM4ME_INSTANCE_ID + SCRUM4ME_WORKER_INSTANCE_ID + capability vars so that the MCP subprocess registers as CODEX. [mcp_servers.scrum4me.env] sets TSX_TSCONFIG_PATH explicitly.
  • bin/entrypoint.sh: runtime-gated install of /opt/agent/etc/codex/config.toml → /home/agent/.codex/config.toml (via gosu agent, after the settings.json install and before the health server).
  • docker-compose.yml: agent-codex service built with --target codex. Mirrors the agent hardening (read_only / cap_drop ALL / no-new-privileges). Separate ${NAS_BASE}/codex-home bind for /home/agent/.codex (the auth.json refresh survives --force-recreate; server-review P2). Separate log/state dirs and host port 18081.
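
The split between forwarded names and explicit values in codex/config.toml can be modelled as below. This is a sketch of the intended effect, not of codex's implementation: buildMcpEnv is a hypothetical helper, the rule that explicit values win on a name clash is an assumption, the capability vars are omitted because this PR does not name them, and the TSX_TSCONFIG_PATH value is a placeholder.

```typescript
type ParentEnv = Record<string, string | undefined>;

// Names taken from the Task 6 commit message; capability vars are not listed there.
const FORWARDED_VARS: readonly string[] = [
  "SCRUM4ME_TOKEN",
  "DATABASE_URL",
  "DIRECT_URL",
  "NODE_PATH",
  "NPM_CONFIG_CACHE",
  "SCRUM4ME_WORKER_RUNTIME",
  "SCRUM4ME_INSTANCE_ID",
  "SCRUM4ME_WORKER_INSTANCE_ID",
];

// Forwarded names copy whatever the parent process has; explicit entries are
// literal values. Assumption: explicit entries override forwarded ones.
function buildMcpEnv(
  parent: ParentEnv,
  forwarded: readonly string[],
  explicit: Record<string, string>,
): Record<string, string> {
  const child: Record<string, string> = {};
  for (const name of forwarded) {
    const value = parent[name];
    if (value !== undefined) child[name] = value; // skip unset variables
  }
  return { ...child, ...explicit };
}

const mcpEnv = buildMcpEnv(
  { SCRUM4ME_TOKEN: "example-token", SCRUM4ME_WORKER_RUNTIME: "CODEX", UNRELATED: "x" },
  FORWARDED_VARS,
  { TSX_TSCONFIG_PATH: "/path/to/tsconfig.json" }, // placeholder path
);
console.log(Object.keys(mcpEnv).sort().join(","));
```

Without SCRUM4ME_WORKER_RUNTIME in the forwarded list, the MCP subprocess would have no way to see that it runs under codex, which is the problem the plan-review P2 points at.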

Verification

  • bash -n bin/check-tokens.sh — syntax OK
  • bash -n bin/entrypoint.sh — syntax OK
  • docker compose config -q — services: agent, agent-codex (clean parse)
  • Dockerfile stages parse; claude is the last FROM (default target preserved)
  • the mcp symbols getWorkerRuntimeFromEnv / buildCodexArgs / classifyCodexOutput exist on feat/codex-runner-substrate of scrum4me-mcp
  • The Docker daemon is not running on the mac → image builds and the codex smoke are deferred to host 154 (Task 8 Step 3 + Step 4)

Next steps (not in this PR)

  • Wait for the merge of scrum4me-mcp #41 (Tasks 1+2+7).
  • Task 8 on host 154: codex login → ${NAS_BASE}/codex-home/auth.json; deployed-compose sync; fleet-regression gate (a build without --target must leave the Claude worker-idea fleet unchanged); container smoke (MCP + NODE_PATH + runtime-env forwarding); canary (a SYSTEM PLAN_CHAT CODEX job reaches DONE with 0 auth/MCP errors).
  • Task 9: update_codex_worker flow in the Ops dashboard (fast-follow so that agent-codex is redeployed along with the rest without touching the Claude fleet).

🤖 Generated with Claude Code

Design for Phase 0 of codex-as-worker (Approach A): a separate agent-codex
fleet service that runs a real queued job end-to-end. Codex-reviewed
(NO-GO → GO). Pushed for an operational review from scrum4me-server.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Dockerfile single->multi-stage refactor with a fleet-build gate; dedicated
directory mount for /home/agent/.codex (auth.json persist/refresh); deployed-compose
+ update_codex_worker deploy flow; canary host = 154. Both reviews (codex source
review + server operational review) are GO.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
9 tasks, cross-repo (mcp pure-logic TDD + docker runner/image/compose + canary +
host deploy). Grounded in verbatim run-one-job/Dockerfile/compose internals +
codex CLI 0.137-alpha flags. The design has a double GO (codex + scrum4me-server).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
P2: pin CODEX_VERSION=0.137.0-alpha.4 (+ build-assert). P2: config.toml env_vars
forwards SCRUM4ME_WORKER_RUNTIME + instance/capability so the codex MCP subprocess
reports CODEX. P3: check-tokens local smoke -> bash -n + container runtime-validate.
P3: container-smoke spells out the config.toml install pre-step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codex plan-review GO; non-blocking P3: keep etc/codex out of the base COPY so the
default Claude image is unchanged. Source moves to codex/; in-image path stays
/opt/agent/etc/codex/config.toml (copied only in the codex stage).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Threads SCRUM4ME_WORKER_RUNTIME (via getWorkerRuntimeFromEnv) into
registerWorker, startHeartbeat and both tryClaimJob call-sites (5th
positional arg). Branches the spawned binary (claude vs codex) and the
args array (buildCodexArgs for codex; existing Claude flag-set unchanged)
plus the exit-classification (classifyCodexOutput for codex; existing
TOKEN_EXPIRY_PATTERNS/API_OVERLOAD_PATTERNS for claude). Skips the
Anthropic quota probe for CODEX (no Anthropic-budget when codex auth is
ChatGPT-plan). The TSX_TSCONFIG_PATH strip on the child env stays for
both runtimes; codex MCP gets it via ~/.codex/config.toml.

Per docs/superpowers/plans/2026-06-07-codex-runner-substrate-phase0-plan.md
Task 3. Imports resolve at /opt/scrum4me-mcp inside the image; the
canary (Task 8) is the binding gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Branches the credential check on SCRUM4ME_WORKER_RUNTIME: CLAUDE keeps
the existing ANTHROPIC_API_KEY-absent + CLAUDE_CODE_OAUTH_TOKEN-present
checks unchanged; CODEX asserts /home/agent/.codex/auth.json exists, is
readable AND writable (codex must be able to refresh tokens), then
opportunistically calls `codex login status` for confirmation. The
Scrum4Me-token check + DATABASE_URL TCP-probe are runtime-agnostic and
stay unchanged.

Per docs/superpowers/plans/2026-06-07-codex-runner-substrate-phase0-plan.md
Task 4. Local: bash -n syntax-checks clean. Runtime validation moves
into the container smoke (Task 8 Step 4) because the script sources
/opt/agent/bin/_lib.sh which only exists inside the image.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits the previously-monolithic image into three stages: base (system
deps + node + gh + scrum4me-mcp clone + agent user + runner files + ENV
+ ENTRYPOINT), codex (base + @openai/codex@0.137.0-alpha.4 + COPY
codex/ template), and claude (base + native Claude Code installer). The
claude stage is LAST so a bare `docker build .` (no --target) produces
the exact image the worker-idea fleet runs — fleet build is byte-
unchanged. codex template lives at repo-root codex/ (not under etc/) so
the base layer never copies it into the claude image (P3 from codex
plan-review).

Per docs/superpowers/plans/2026-06-07-codex-runner-substrate-phase0-plan.md
Task 5. Local builds are skipped — Docker daemon is not running on the
mac and the codex build also requires codex/config.toml from Task 6.
The authoritative fleet-regression gate + codex build run on host 154
(Task 8 Step 3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds codex/config.toml at repo-root: approval=never, sandbox=workspace-
write (network on), MCP via npx tsx /opt/scrum4me-mcp/src/index.ts,
required=true, with env_vars forwarding SCRUM4ME_TOKEN/DATABASE_URL/
DIRECT_URL/NODE_PATH/NPM_CONFIG_CACHE plus SCRUM4ME_WORKER_RUNTIME +
SCRUM4ME_INSTANCE_ID + SCRUM4ME_WORKER_INSTANCE_ID + capability vars so
the MCP subprocess registers as CODEX (codex plan-review P2). The
[mcp_servers.scrum4me.env] block sets TSX_TSCONFIG_PATH explicitly
(env_vars does not expand placeholders).

entrypoint.sh installs /opt/agent/etc/codex/config.toml to
/home/agent/.codex/config.toml after the existing settings.json install
block and before the health-server start. Runtime-gated on
SCRUM4ME_WORKER_RUNTIME=CODEX so the Claude image runs unchanged.
install via gosu agent (no CAP_DAC_OVERRIDE under cap_drop:ALL).

docker-compose.yml adds an agent-codex service from the same Dockerfile
with target=codex. Mirrors agent hardening (read_only, cap_drop ALL,
no-new-privileges) and adds a dedicated ${NAS_BASE}/codex-home bind into
/home/agent/.codex so the codex auth.json refresh survives
--force-recreate (server-review P2). Separate logs-codex/state-codex
dirs + a non-conflicting host port (18081).

Per docs/superpowers/plans/2026-06-07-codex-runner-substrate-phase0-plan.md
Task 6. `docker compose config -q` validates clean (services: agent,
agent-codex). bash -n on entrypoint.sh passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
janpeter merged commit 5a60c4b6bc into master 2026-06-08 06:22:13 +02:00

Reference
janpeter/scrum4me-docker!27