# Technical specification: Ops Dashboard

## Architecture at a glance

```
┌────────────────┐  HTTPS   ┌──────┐  HTTP   ┌─────────────────┐
│ Browser (you)  ├─────────►│Caddy ├────────►│ ops-dashboard   │
│                │          │ :443 │         │ Next.js 16 :3000│
└────────────────┘          └──────┘         └────┬────────┬───┘
                                                  │        │
                                       HMAC HTTP  │        │ TCP/SQL
                                           :3099  │        │
                                  ┌───────────────▼┐       │
                                  │ ops-agent      │       │
                                  │ Fastify on host│       │
                                  │ spawn/exec     │       │
                                  └───┬────────────┘       │
                                      │                    │
                              ┌───────┴───────┐    ┌───────▼────────┐
                              │ Whitelisted   │    │ Postgres 17    │
                              │ host commands │    │ db=ops_dashb.. │
                              │ docker/git/etc│    └────────────────┘
                              └───────────────┘
```

Three processes, one host:

1. **ops-dashboard**: Next.js app in Docker, on the compose bridge network, exposed via Caddy
2. **ops-agent**: Node/Fastify service running directly on the host (not in a container), with sudoers and `docker.sock` access
3. **postgres**: Docker container, the same one Scrum4Me already uses; ops-dashboard has its own database, `ops_dashboard`

## Stack

| Layer | Technology | Version | Rationale |
|---|---|---|---|
| App framework | Next.js | 16.2 (App Router) | RSC server-side fetching matches our "render with agent data" pattern |
| UI library | React | 19 | Bundled with Next 16 |
| Styling | Tailwind CSS | 4 | Utility-first; no custom design system |
| UI primitives | `@base-ui/react` | 1.4 | Headless components, no Radix lock-in |
| Code highlighting | shiki | 1.29 | Server-side highlighting in the Caddyfile view |
| Database ORM | Prisma | 7.8 (via `@prisma/adapter-pg`) | Same as Scrum4Me; one skill set maintains both |
| Auth (password) | bcryptjs | 3 | No native bindings required |
| Session | Custom, in `lib/session.ts` | n/a | Simple: random token in the cookie, its SHA-256 hash in the DB |
| Agent | Fastify | 5 | Lightweight; SSE is streamed directly over `reply.raw` |
| Agent whitelist | js-yaml | 4 | Parses the read-only config file |

## Deploy topology

| Component | Location | Managed by |
|---|---|---|
| ops-dashboard | Docker container `scrum4me-ops-dashboard`, image `ops-dashboard:latest` | `docker compose` in `/srv/scrum4me/compose/docker-compose.yml` |
| ops-agent | systemd unit `ops-agent.service`, entry point `/opt/ops-agent/dist/index.js` on the host | systemd; installed via `deploy/ops-agent/setup.sh` |
| Caddyfile route | Block in `/srv/scrum4me/caddy/Caddyfile` | Edited by hand; restart the Caddy container after adding it |
| Database | Postgres container `scrum4me-postgres`, database `ops_dashboard` | Reuses the existing container |
| Backups | `/srv/scrum4me/backups/*.sql.gz` | Cron, or manually via the UI |

Caddy routes `ops.jp-visser.nl` to the service name `ops-dashboard:3000` on the compose bridge network.

## Data model

```
User
├── id           cuid (string PK)
├── email        unique
├── pwd_hash     bcrypt $2b$12$...
└── created_at

Session
├── id           cuid (PK)
├── user_id      → User
├── token_hash   sha256 hex (the cookie value is stored only as a hash)
└── expires_at   24h after creation

FlowRun
├── id           cuid (PK)
├── user_id      → User
├── flow_name    string (e.g. "update_scrum4me_web")
├── status       enum: pending|running|success|failed|cancelled
├── started_at
├── finished_at  nullable
└── (1:N) FlowStep

FlowStep
├── id           cuid (PK)
├── flow_run_id  → FlowRun (cascade delete)
├── step_index   int
├── name         string (as in the YAML flow definition)
├── exit_code    int nullable
├── stdout       text (max 64 KB, truncated)
├── stderr       text (max 64 KB, truncated)
├── started_at
└── finished_at  nullable
```

Migrations live in `prisma/migrations/`. The seed script `prisma/seed.ts` creates the first admin from the `SEED_USER_*` variables.

## Auth flow

```
1. Browser  GET /login
   ← Set-Cookie: csrf_token=<csrf>; SameSite=strict; httpOnly=false
   ← HTML form

2. Browser  POST /api/auth/login
   Headers:
     Cookie: csrf_token=<csrf>; ops_session=...
     x-csrf-token: <csrf>          ← double-submit CSRF check
   Body: { email, password }

3. Server:
   a. proxy.ts: CSRF check (cookie == header)
   b. /api/auth/login route:
      - rate limit per IP (5/min)
      - prisma.user.findUnique({ where: { email } })
      - bcrypt.compare(password, user.pwd_hash)
   c. On success:
      - token = generateSessionToken()   (32 random bytes, hex)
      - prisma.session.create({ data: { user_id: user.id, token_hash: sha256(token), expires_at: now + 24h } })
      - Set-Cookie: ops_session=<token>; HttpOnly; SameSite=strict; Secure (in prod)

4. Browser  GET /
   Server: proxy.ts → no ops_session cookie → redirect to /login
           otherwise: getCurrentUser() reads the cookie, hashes it and calls
           prisma.session.findUnique({ where: { token_hash } })
```

CSRF protection uses the double-submit cookie pattern. `proxy.ts` also sets the CSP, X-Frame-Options, X-Content-Type-Options and Referrer-Policy response headers.

## Agent protocol

Dashboard → agent communication goes through `lib/agent-client.ts`:

```
POST http://172.18.0.1:3099/agent/v1/exec
Headers:
  Authorization: Bearer <token from OPS_AGENT_SECRET>
  Content-Type: application/json
Body:
  { command_key: "docker_ps", args?: string[], stdin?: string }
```

Response: an SSE stream:

```
event: stdout
data: {"data": "<chunk>"}

event: stderr
data: {"data": "<chunk>"}

event: exit
data: {"code": 0}
```

Server-side flow in the agent, per call:

1. Look up `req.body.command_key` in `/etc/ops-agent/commands.yml`.
2. On a hit: spawn `def.cmd[0]` with `[...def.cmd.slice(1), ...args]` (no shell, no interpolation). For `cwd_pattern` entries, `args[0]` is the working directory and only the remaining args are appended.
3. Stream stdout/stderr chunks to the SSE response.
4. On `child.close`: write `event: exit` and end the response.
5. On `child.error`: write `event: error` and end the response.
6. On `reply.raw.close` (client disconnect): `child.kill()`.
7. Write an audit entry to journalctl: `{audit:true, command_key, args, exit_code, duration_ms}`.

Example `commands.yml`:

```yaml
docker_ps:
  cmd: ["docker", "ps", "--format", "json"]
  description: "List running containers"

git_status:
  cmd: ["git", "status", "--short", "--branch"]
  cwd_pattern: true   # args[0] = cwd, rest = command args
  description: "Git status in a repo"

systemctl_restart_caddy:
  cmd: ["sudo", "/usr/bin/systemctl", "restart", "caddy"]
  description: "Restart caddy service"
```

A `command_key` that is not in the whitelist is rejected with `403 Forbidden`.
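Steps 1 and 2 of the agent flow, plus the SSE framing used in steps 3 to 5, can be sketched as pure TypeScript functions that are easy to unit-test without spawning anything. This is a minimal sketch under assumptions, not the agent's actual code: `CommandDef`, `planExec` and `sseFrame` are hypothetical names, `commands.yml` is assumed to be parsed into a plain object already, and answering malformed `args` with `400` is an assumption (the spec only defines `403` for unknown keys). The returned plan is what would be passed to `spawn(bin, argv, { cwd, shell: false })`.

```typescript
// Sketch only: CommandDef, planExec and sseFrame are hypothetical names, not
// the agent's real API. Assumes commands.yml has already been parsed (for
// example with js-yaml) into a plain object keyed by command_key.

interface CommandDef {
  cmd: string[]; // argv template; cmd[0] is the binary to spawn
  description?: string;
  cwd_pattern?: boolean; // when true, args[0] is the working directory
}

type Whitelist = Record<string, CommandDef>;

interface ExecPlan {
  bin: string;
  argv: string[];
  cwd?: string;
}

// 403 for unknown keys comes from the spec; 400 for bad args is an assumption.
type PlanResult =
  | { ok: true; plan: ExecPlan }
  | { ok: false; status: 400 | 403; error: string };

function planExec(whitelist: Whitelist, commandKey: string, args: unknown = []): PlanResult {
  // Step 1: own-property lookup only, so keys like "toString" never match.
  if (!Object.prototype.hasOwnProperty.call(whitelist, commandKey)) {
    return { ok: false, status: 403, error: `command_key not whitelisted: ${commandKey}` };
  }
  const def = whitelist[commandKey] as CommandDef;

  // The request body is untrusted JSON: accept only an array of strings.
  if (!Array.isArray(args) || !args.every((a) => typeof a === "string")) {
    return { ok: false, status: 400, error: "args must be an array of strings" };
  }
  const rest: string[] = [...args];

  // cwd_pattern: the first arg becomes the working directory, not an argv entry.
  let cwd: string | undefined;
  if (def.cwd_pattern) {
    cwd = rest.shift();
    if (cwd === undefined) {
      return { ok: false, status: 400, error: `${commandKey} needs args[0] as cwd` };
    }
  }

  // Step 2: fixed whitelist argv first, caller args appended. Intended for
  // spawn(bin, argv, { cwd, shell: false }), so nothing is shell-interpolated.
  const [bin, ...fixed] = def.cmd;
  return { ok: true, plan: { bin, argv: [...fixed, ...rest], cwd } };
}

type SseEvent = "stdout" | "stderr" | "exit" | "error";

// Steps 3 to 5: one SSE message per chunk or terminal event. JSON.stringify
// escapes newlines, so the payload always fits on a single data: line.
function sseFrame(event: SseEvent, payload: Record<string, unknown>): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

Keeping lookup and argv construction free of I/O means the Fastify route only has to map `ok: false` to a status code and forward `child.stdout`/`child.stderr` chunks through `sseFrame`.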
## Flows engine

YAML definitions live in `ops-agent/flows.example/*.yml`:

```yaml
name: update_scrum4me_web
description: Pull main, build, restart container, verify
steps:
  - name: Pull latest main
    command_key: git_pull
    args: ["/srv/scrum4me/repos/Scrum4Me", "main"]
    precondition: git_status_clean

  - name: Build container
    command_key: docker_compose_build
    args: ["scrum4me-web"]

  - name: Restart
    command_key: docker_compose_up
    args: ["-d", "scrum4me-web"]

  - name: Smoke test
    command_key: curl_status
    args: ["https://scrum4me.jp-visser.nl"]
    expect_exit_code: 0
```

Runner (`ops-agent/src/lib/flow-runner.ts`):

- Sequential and fail-fast: the first failing step stops the run
- Per step: check preconditions, spawn, capture stdout/stderr, store the result as a `FlowStep`
- Dry run: instead of calling `spawn`, log `[...def.cmd, ...args]`
- Real run: stream via SSE to the dashboard route `/api/flows/run`

## Real-time updates in the UI

Page data is not pushed to the browser over WebSockets or Server-Sent Events. Pages are server-rendered on every request (`export const dynamic = 'force-dynamic'`), and a client-side `useEffect` starts a `setInterval` that calls `router.refresh()` every 5000 ms.

Flow execution is the one exception: the client opens an `EventSource` on `/api/flows/run/[id]`, which relays the agent's SSE stream.

## Configuration

Required in `.env`:

```bash
DATABASE_URL=postgresql://USER:PASS@postgres:5432/ops_dashboard
OPS_AGENT_URL=http://172.18.0.1:3099
OPS_AGENT_SECRET=
SEED_USER_EMAIL=admin@example.com
SEED_USER_PASSWORD=
```

Optional:

```bash
SYSTEMD_UNITS=scrum4me-web,ops-agent          # comma-separated
REPO_PATHS=/srv/scrum4me/repos/Scrum4Me,…      # comma-separated absolute paths
```

On startup the app checks that all required environment variables are set and fails fast with a clear error if any are missing.
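The fail-fast startup check can be sketched as one small pure function that reports every missing variable at once. This is an illustration under assumptions, not the actual startup code: `lib/env.ts`, `loadConfig` and `AppConfig` are hypothetical names, blank values are treated as missing, and list entries are trimmed with empty entries dropped. Only the variable names come from this spec.

```typescript
// Sketch only: lib/env.ts, loadConfig and AppConfig are hypothetical names.
// Variable names come from the spec's .env listing; treating blank values as
// missing and trimming list entries are assumptions.

const REQUIRED_VARS = [
  "DATABASE_URL",
  "OPS_AGENT_URL",
  "OPS_AGENT_SECRET",
  "SEED_USER_EMAIL",
  "SEED_USER_PASSWORD",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type Env = Record<string, string | undefined>;

interface AppConfig {
  databaseUrl: string;
  opsAgentUrl: string;
  opsAgentSecret: string;
  seedUserEmail: string;
  seedUserPassword: string;
  systemdUnits: string[]; // from SYSTEMD_UNITS, empty if unset
  repoPaths: string[]; // from REPO_PATHS, empty if unset
}

// "a, b,," -> ["a", "b"]: split on commas, trim, drop empty entries.
function splitList(value: string | undefined): string[] {
  if (value === undefined) return [];
  return value
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

function loadConfig(env: Env): AppConfig {
  // Collect every missing or blank variable so one error names them all.
  const missing = REQUIRED_VARS.filter((name) => (env[name] ?? "").trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  // Safe after the check above: every required variable is a non-blank string.
  const get = (name: RequiredVar): string => env[name] as string;

  return {
    databaseUrl: get("DATABASE_URL"),
    opsAgentUrl: get("OPS_AGENT_URL"),
    opsAgentSecret: get("OPS_AGENT_SECRET"),
    seedUserEmail: get("SEED_USER_EMAIL"),
    seedUserPassword: get("SEED_USER_PASSWORD"),
    systemdUnits: splitList(env["SYSTEMD_UNITS"]),
    repoPaths: splitList(env["REPO_PATHS"]),
  };
}
```

In the app, `loadConfig(process.env)` would run once during startup, so a misconfigured container exits immediately with a message naming every missing variable instead of failing on the first request that needs one.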
## Security properties

| Property | Implementation |
|---|---|
| Password hashing | bcrypt, cost factor 12 |
| Session cookie | HttpOnly, SameSite=strict, Secure in prod, 24 h TTL |
| CSRF | Double-submit cookie pattern, validated in `proxy.ts` for POST requests |
| CSP | Strict, set via response headers; no inline scripts except Next.js internals with a nonce |
| Agent auth | HMAC via Bearer token (`OPS_AGENT_SECRET`); symmetric shared secret |
| Command injection | `spawn(bin, args, { shell: false })`; never any shell interpolation |
| Whitelist | `commands.yml` is the single source of truth for what may be run |
| Sudo | `sudoers.d/ops-agent` with absolute paths and explicit service names, no wildcards |
| Audit | Every `/agent/v1/exec` call is logged to journalctl with an `{audit:true, …}` marker |
| Rate limiting | Login: 5/min per IP; agent: no rate limit per secret (single-user trust) |
| Bind address | Agent binds to `0.0.0.0:3099`; UFW only allows `172.18.0.0/16` |

## Non-functional properties

| Property | Specification |
|---|---|
| No multi-tenancy | One user row in the DB; the app only checks whether a valid session record exists, with no `WHERE user_id = ?` filtering (single tenant) |
| No retry/queue | Failed flows stay failed; the user has to start them again manually |
| No migration automation | `prisma migrate deploy` is **not** part of the boot sequence; run it explicitly on every deploy |
| No graceful shutdown | On container SIGTERM, in-flight requests are lost; there is no draining |
| Logging | Container stdout/stderr via `docker logs`; agent via `journalctl -u ops-agent`; no log aggregator |

## Open points

- **A real Caddyfile grammar** (IDEA-061): the nginx grammar is used as a fallback for now
- **Multi-user / RBAC**: out of scope, possibly later
- **Agent rate limiting**: needed if the dashboard ever becomes multi-user
- **Real-time alerts**: currently pull-based; push to Slack/Tailscale-only is not implemented yet