Technical specification: Ops Dashboard
Architecture in one picture
┌────────────────┐  HTTPS   ┌──────┐  HTTP   ┌─────────────────┐
│ Browser (you)  ├─────────►│Caddy ├────────►│ ops-dashboard   │
│                │          │ :443 │         │ Next.js 16 :3000│
└────────────────┘          └──────┘         └────┬────────┬───┘
                                                  │        │
                                        HMAC HTTP │        │ TCP/SQL
                                            :3099 │        │
                                  ┌───────────────▼┐       │
                                  │ ops-agent      │       │
                                  │ Fastify on host│       │
                                  │ spawn/exec     │       │
                                  └───┬────────────┘       │
                                      │                    │
                              ┌───────┴───────┐    ┌───────▼────────┐
                              │ Whitelisted   │    │ Postgres 17    │
                              │ host commands │    │ db=ops_dashb.. │
                              │ docker/git/etc│    └────────────────┘
                              └───────────────┘
Three processes, one host:
- ops-dashboard: Next.js app in Docker, on the compose bridge, exposed via Caddy
- ops-agent: Node/Fastify service running directly on the host (no container), with sudoers and docker.sock access
- postgres: Docker container, the same one Scrum4Me already uses; ops-dashboard has its own database, ops_dashboard
Stack
| Layer | Technology | Version | Reason |
|---|---|---|---|
| App framework | Next.js | 16.2 (App Router) | RSC server-side fetching matches our "render with agent data" pattern |
| UI library | React | 19 | Bundled with Next 16 |
| Styling | Tailwind CSS | 4 | Utility-first; no custom design system |
| UI primitives | @base-ui/react | 1.4 | Headless components, no Radix lock-in |
| Code highlighting | shiki | 1.29 | Server-side highlighting in the Caddyfile view |
| Database ORM | Prisma | 7.8 (via @prisma/adapter-pg) | Same as Scrum4Me; one skill set to maintain both |
| Auth (password) | bcryptjs | 3 | No native bindings needed |
| Session | Custom, in lib/session.ts | n/a | Simple: token in cookie, hash in DB |
| Agent | Fastify | 5 | Lightweight, native SSE streaming |
| Agent whitelist | js-yaml | 4 | Read-only config file |
Deployment topology
| Component | Location | Management |
|---|---|---|
| ops-dashboard | Docker container scrum4me-ops-dashboard, image ops-dashboard:latest | docker compose in /srv/scrum4me/compose/docker-compose.yml |
| ops-agent | systemd unit ops-agent.service, host binary /opt/ops-agent/dist/index.js | systemd, installed via deploy/ops-agent/setup.sh |
| Caddyfile route | Block in /srv/scrum4me/caddy/Caddyfile | Manual; restart the Caddy container after adding the block |
| Database | Postgres container scrum4me-postgres, db ops_dashboard | Reuses the existing container |
| Backups | /srv/scrum4me/backups/*.sql.gz | Cron or manually via the UI |
Caddy routes ops.jp-visser.nl → service name ops-dashboard:3000 on the compose bridge.
Data model
User
├── id cuid (string PK)
├── email unique
├── pwd_hash bcrypt $2b$12$...
└── created_at
Session
├── id cuid (PK)
├── user_id → User
├── token_hash sha256 hex (the cookie value is stored hashed)
└── expires_at 24h after creation
FlowRun
├── id cuid (PK)
├── user_id → User
├── flow_name string (e.g. "update_scrum4me_web")
├── status enum: pending|running|success|failed|cancelled
├── started_at
├── finished_at nullable
└── (1:N) FlowStep
FlowStep
├── id cuid (PK)
├── flow_run_id → FlowRun (cascade delete)
├── step_index int
├── name string (as in the YAML flow definition)
├── exit_code int nullable
├── stdout text (max 64KB, truncated)
├── stderr text (max 64KB, truncated)
├── started_at
└── finished_at nullable
Migrations live in prisma/migrations/. The seed script in prisma/seed.ts creates the first admin from the SEED_USER_* variables.
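To make the 64KB cap on stdout and stderr concrete, here is a minimal sketch of the FlowStep record and a truncation helper. The names FlowStatus, MAX_LOG_BYTES and truncateLog are hypothetical, and the choice to keep the head of the output (rather than the tail) is an assumption; the source only says the text is cut at 64KB.

```typescript
// Hypothetical TypeScript mirror of the FlowRun status and FlowStep record above.
type FlowStatus = "pending" | "running" | "success" | "failed" | "cancelled";

interface FlowStep {
  id: string;
  flow_run_id: string;
  step_index: number;
  name: string;
  exit_code: number | null;
  stdout: string; // capped at 64KB
  stderr: string; // capped at 64KB
  started_at: Date;
  finished_at: Date | null;
}

const MAX_LOG_BYTES = 64 * 1024; // the 64KB cap from the data model

// Keep at most maxBytes of UTF-8, cutting on a character boundary.
// Assumption: the head of the output is kept and the rest is dropped.
function truncateLog(output: string, maxBytes: number = MAX_LOG_BYTES): string {
  const bytes = Buffer.from(output, "utf8");
  if (bytes.length <= maxBytes) return output;
  // bytes[end] is the first dropped byte; if it is a UTF-8 continuation
  // byte (10xxxxxx) the cut would split a character, so step back.
  let end = maxBytes;
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
  return bytes.subarray(0, end).toString("utf8");
}
```

Cutting on bytes rather than on string length keeps the stored column within the stated size even when the output contains multi-byte characters.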
Auth flow
1. Browser GET /login
   ← Set-Cookie: csrf_token=<uuid>; SameSite=strict; httpOnly=false
   ← HTML form
2. Browser POST /api/auth/login
   Headers:
     Cookie: csrf_token=<uuid>; ops_session=...
     x-csrf-token: <uuid> ← double-submit CSRF check
   Body: { email, password }
3. Server:
   a. proxy.ts CSRF check (cookie == header)
   b. /api/auth/login route:
      - rate limit per IP (5/min)
      - prisma.user.findUnique({ where: { email } })
      - bcrypt.compare(password, user.pwd_hash)
   c. On success:
      - generateSessionToken (32 bytes hex)
      - prisma.session.create({ data: { user_id: user.id, token_hash: sha256(token), expires_at: now+24h } })
      - Set-Cookie ops_session=<token>; HttpOnly; SameSite=strict; Secure (in prod)
4. Browser GET /<any-protected-path>
   Server: proxy.ts → if there is no ops_session cookie → redirect to /login
   Otherwise: getCurrentUser() reads the cookie, hashes it, and calls prisma.session.findUnique({ where: { token_hash } })
CSRF: double-submit cookie pattern. CSP, X-Frame-Options, X-Content-Type-Options, and Referrer-Policy are set via proxy.ts response headers.
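The token and CSRF steps of the flow above can be sketched with Node's built-in crypto module. Only generateSessionToken is named in the source; hashToken, csrfMatches and sessionExpiry are hypothetical helper names, and the constant-time comparison is an assumption about how the cookie and header might be compared, not a description of proxy.ts.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// 32 random bytes, hex-encoded: the value placed in the ops_session cookie.
function generateSessionToken(): string {
  return randomBytes(32).toString("hex");
}

// SHA-256 hex digest: the value stored as Session.token_hash.
function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

// Session lifetime from the data model: 24 hours after creation.
function sessionExpiry(now: Date = new Date()): Date {
  return new Date(now.getTime() + 24 * 60 * 60 * 1000);
}

// Double-submit check: the csrf_token cookie must equal the x-csrf-token header.
function csrfMatches(cookieValue: string | undefined, headerValue: string | undefined): boolean {
  if (!cookieValue || !headerValue) return false;
  const a = Buffer.from(cookieValue);
  const b = Buffer.from(headerValue);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Storing only the hash means a leaked Session table does not yield usable cookies; a lookup hashes the incoming cookie value and searches on token_hash.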
Agent protocol
Dashboard-to-agent communication goes through lib/agent-client.ts:
POST http://172.18.0.1:3099/agent/v1/exec
Headers:
  Authorization: Bearer <OPS_AGENT_SECRET>
  Content-Type: application/json
Body:
  { command_key: "docker_ps", args?: string[], stdin?: string }
Response: SSE stream
  event: stdout
  data: {"data": "<chunk>"}
  event: stderr
  data: {"data": "<chunk>"}
  event: exit
  data: {"code": 0}
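On the dashboard side, the response stream has to be split back into events. The following is a minimal sketch of that parsing step, assuming the whole buffer is already available; AgentEvent and parseSse are hypothetical names, and a real client would also need to buffer partial chunks between reads. It trims field values, which is slightly looser than the SSE rule of stripping a single leading space.

```typescript
interface AgentEvent {
  event: string; // "stdout", "stderr", "exit" or "error" in this protocol
  data: unknown; // parsed JSON payload
}

// Parse a complete SSE text buffer into events.
// Events are separated by a blank line; each has "event:" and "data:" fields.
function parseSse(buffer: string): AgentEvent[] {
  const events: AgentEvent[] = [];
  for (const block of buffer.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no "event:" field is present
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length === 0) continue; // skip comments and empty blocks
    events.push({ event, data: JSON.parse(dataLines.join("\n")) });
  }
  return events;
}
```

A caller would typically append stdout and stderr payloads to the FlowStep being recorded and treat the exit event as the end of the step.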
Agent server-side flow per call:
- req.body.command_key → lookup in /etc/ops-agent/commands.yml
- On a hit: spawn def.cmd[0] with def.cmd.slice(1) ++ args (no shell, no interpolation)
- Stream stdout/stderr chunks to SSE
- On child.close: write event: exit, end response
- On child.error: write event: error, end response
- On reply.raw.close (client disconnect): child.kill()
- Audit log to journalctl: {audit:true, command_key, args, exit_code, duration_ms}
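The lookup-then-spawn steps above can be sketched as two small functions. This is an illustration under assumptions, not the agent's actual code: CommandDef, ResolvedCommand, resolveCommand and runCommand are hypothetical names, and the cwd_pattern handling follows the comment in commands.yml (args[0] is the working directory, the rest are command arguments).

```typescript
import { spawn } from "node:child_process";

interface CommandDef {
  cmd: string[];
  cwd_pattern?: boolean;
  description?: string;
}

interface ResolvedCommand {
  bin: string;
  argv: string[];
  cwd: string | undefined;
}

// Look up a command_key; null means "not whitelisted" (the agent answers 403).
function resolveCommand(
  whitelist: Record<string, CommandDef>,
  commandKey: string,
  args: string[] = [],
): ResolvedCommand | null {
  // Own-property check so inherited keys such as "constructor" never match.
  const def = Object.prototype.hasOwnProperty.call(whitelist, commandKey)
    ? whitelist[commandKey]
    : undefined;
  if (!def) return null;
  const [bin, ...base] = def.cmd;
  if (bin === undefined) return null;
  // With cwd_pattern, args[0] is the working directory, the rest are command args.
  const cwd = def.cwd_pattern ? args[0] : undefined;
  const extra = def.cwd_pattern ? args.slice(1) : args;
  return { bin, argv: [...base, ...extra], cwd };
}

// Spawn without a shell and forward output chunks; resolves with the exit code.
function runCommand(
  resolved: ResolvedCommand,
  onChunk: (stream: "stdout" | "stderr", chunk: string) => void,
): Promise<number | null> {
  return new Promise((resolve, reject) => {
    const child = spawn(resolved.bin, resolved.argv, { cwd: resolved.cwd, shell: false });
    child.stdout.on("data", (c: Buffer) => onChunk("stdout", c.toString("utf8")));
    child.stderr.on("data", (c: Buffer) => onChunk("stderr", c.toString("utf8")));
    child.on("error", reject); // e.g. binary not found
    child.on("close", (code) => resolve(code));
  });
}
```

Because the arguments are passed as an array and shell is false, a caller-supplied value such as "; rm -rf /" reaches the binary as one literal argument instead of being interpreted.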
Example commands.yml:
docker_ps:
  cmd: ["docker", "ps", "--format", "json"]
  description: "List running containers"
git_status:
  cmd: ["git", "status", "--short", "--branch"]
  cwd_pattern: true # args[0] = cwd, rest = command args
  description: "Git status in a repo"
systemctl_restart_caddy:
  cmd: ["sudo", "/usr/bin/systemctl", "restart", "caddy"]
  description: "Restart caddy service"
A command_key that is not in the whitelist → 403 Forbidden.
Flows engine
YAML definition in ops-agent/flows.example/*.yml:
name: update_scrum4me_web
description: Pull main, build, restart container, verify
steps:
  - name: Pull latest main
    command_key: git_pull
    args: ["/srv/scrum4me/repos/Scrum4Me", "main"]
    precondition: git_status_clean
  - name: Build container
    command_key: docker_compose_build
    args: ["scrum4me-web"]
  - name: Restart
    command_key: docker_compose_up
    args: ["-d", "scrum4me-web"]
  - name: Smoke test
    command_key: curl_status
    args: ["https://scrum4me.jp-visser.nl"]
    expect_exit_code: 0
Runner (ops-agent/src/lib/flow-runner.ts):
- Sequential, fail-fast
- Per step: check preconditions, spawn, capture stdout/stderr, store in FlowStep
- On a dry run: replace spawn with a log of def.cmd ++ args
- On a real run: stream via SSE to the dashboard /api/flows/run route
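The runner's control flow (sequential, fail-fast, with a dry-run mode) can be sketched with the executor injected as a function. Everything here is hypothetical naming: runFlow, FlowStepDef, StepResult and the deps object are not taken from flow-runner.ts. How preconditions are evaluated is not specified in the source, so the sketch delegates that to an injected checkPrecondition callback, and it treats a missing expect_exit_code as 0. The dry-run log prints the command_key and args rather than the resolved def.cmd, since the whitelist is not in scope here.

```typescript
interface FlowStepDef {
  name: string;
  command_key: string;
  args?: string[];
  precondition?: string;
  expect_exit_code?: number; // assumption: defaults to 0 when omitted
}

interface StepResult {
  name: string;
  exit_code: number | null;
  executed: boolean; // false for dry-run entries and failed preconditions
}

interface RunnerDeps {
  exec: (commandKey: string, args: string[]) => Promise<number | null>;
  checkPrecondition: (name: string, step: FlowStepDef) => Promise<boolean>;
  log?: (line: string) => void;
}

async function runFlow(
  steps: FlowStepDef[],
  deps: RunnerDeps,
  dryRun: boolean = false,
): Promise<{ status: "success" | "failed"; results: StepResult[] }> {
  const results: StepResult[] = [];
  for (const step of steps) {
    const args = step.args ?? [];
    if (dryRun) {
      // Dry run: log what would be executed instead of spawning.
      deps.log?.(`[dry-run] ${step.command_key} ${args.join(" ")}`);
      results.push({ name: step.name, exit_code: null, executed: false });
      continue;
    }
    if (step.precondition && !(await deps.checkPrecondition(step.precondition, step))) {
      results.push({ name: step.name, exit_code: null, executed: false });
      return { status: "failed", results }; // fail fast on an unmet precondition
    }
    const code = await deps.exec(step.command_key, args);
    results.push({ name: step.name, exit_code: code, executed: true });
    if (code !== (step.expect_exit_code ?? 0)) {
      return { status: "failed", results }; // fail fast: later steps never run
    }
  }
  return { status: "success", results };
}
```

Injecting exec keeps the ordering logic testable without spawning processes; in the agent, exec would wrap the whitelisted spawn and stream its chunks over SSE.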
Real-time in the UI
Regular pages do not use WebSocket or Server-Sent Events on the dashboard side. Auto-refreshing pages are server-rendered (export const dynamic = 'force-dynamic'), with a client-side useEffect(setInterval, 5000) that triggers router.refresh() every 5 seconds.
Flow execution is the exception: the client opens an EventSource on /api/flows/run/[id], which relays the agent's SSE stream.
Configuration
Required in .env:
DATABASE_URL=postgresql://USER:PASS@postgres:5432/ops_dashboard
OPS_AGENT_URL=http://172.18.0.1:3099
OPS_AGENT_SECRET=<hex-32-bytes>
SEED_USER_EMAIL=admin@example.com
SEED_USER_PASSWORD=<strong-password>
Optional:
SYSTEMD_UNITS=scrum4me-web,ops-agent # comma-separated
REPO_PATHS=/srv/scrum4me/repos/Scrum4Me,… # comma-separated absolute paths
On startup the app validates that the required env vars are set and fails fast with a clear error.
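A fail-fast check of this kind could look like the sketch below. The variable names come from the lists above; validateEnv, parseList and the error wording are hypothetical, and treating whitespace-only values as missing is an assumption.

```typescript
const REQUIRED_ENV = [
  "DATABASE_URL",
  "OPS_AGENT_URL",
  "OPS_AGENT_SECRET",
  "SEED_USER_EMAIL",
  "SEED_USER_PASSWORD",
] as const;

type RequiredEnv = Record<(typeof REQUIRED_ENV)[number], string>;

// Throws one error naming every missing variable, so a bad deploy fails at boot.
function validateEnv(env: Record<string, string | undefined>): RequiredEnv {
  const missing = REQUIRED_ENV.filter((key) => !env[key]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const out = {} as RequiredEnv;
  for (const key of REQUIRED_ENV) out[key] = env[key] as string;
  return out;
}

// Optional comma-separated lists such as SYSTEMD_UNITS and REPO_PATHS.
function parseList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}
```

Calling validateEnv(process.env) once at startup and passing the typed result around avoids scattered process.env lookups that fail late.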
Security properties
| Property | Implementation |
|---|---|
| Password hashing | bcrypt, cost factor 12 |
| Session cookie | HttpOnly, SameSite=strict, Secure in prod, 24h TTL |
| CSRF | Double-submit cookie pattern, validated in proxy.ts for POSTs |
| CSP | Strict, in response headers; no inline scripts except Next.js internals with a nonce |
| Agent auth | HMAC via Bearer token (OPS_AGENT_SECRET); symmetric |
| Command injection | spawn(bin, args, {shell: false}); never any shell interpolation |
| Whitelist | commands.yml is the single source of truth for what can be run |
| Sudo | sudoers.d/ops-agent with absolute paths and service names, no wildcards |
| Audit | Every /agent/v1/exec call logs to journalctl with an {audit:true, …} marker |
| Rate limit | Login 5/min/IP; agent per secret, without rate limit (single-user trust) |
| Bind | Agent binds to 0.0.0.0:3099; UFW only allows 172.18.0.0/16 |
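The login limit of 5 attempts per minute per IP could be enforced with an in-memory fixed-window counter, sketched below. The source does not say which algorithm the login route uses, so the fixed-window approach, the FixedWindowLimiter name and the in-memory Map are all assumptions; the clock is passed in to keep the logic deterministic.

```typescript
// Fixed-window limiter: at most `limit` attempts per key within `windowMs`.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 5, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the attempt is allowed, false when it should be rejected.
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      // First attempt for this key, or the previous window has expired.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

An in-memory map fits the single-container, single-user setup described here; it resets on restart and would not be shared across multiple dashboard instances.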
Non-functional properties
| Property | Specification |
|---|---|
| No multi-tenancy | One user row in the DB; the app only verifies that a valid session record exists; no WHERE user_id = ? filter (single-tenant) |
| No retry/queue | Failed flows stay failed; the user has to click again manually |
| No migration automation | prisma migrate deploy is not part of the boot flow; run it explicitly on every deploy |
| No graceful shutdown | Container SIGTERM → in-flight requests are lost; no drain |
| Logging | Container stdout/stderr via docker logs; agent via journalctl -u ops-agent; no aggregator |
Open items
- Real Caddyfile grammar (IDEA-061); currently an nginx fallback
- Multi-user / RBAC: out of scope, possibly later
- Rate limit on the agent: needed for a multi-user future
- Real-time alerts: currently pull-based; push to Slack/Tailscale-only is not there yet