Three new markdown files under /docs:

- `handleiding.md`, for the day-to-day user: first login, modules, common tasks (editing Caddy, merging a sprint via a flow), what explicitly cannot be done from the UI, log locations for incidents, security advice.
- `specs/functional.md`, what the app does: scope per module with acceptance criteria, flow state machine (pending/running/success/failed/cancelled/timeout), hard limits (1 active flow, logs truncated at 64KB, 24h session), explicit out-of-scope list.
- `specs/technical.md`, how it works: 3-process architecture (dashboard container + agent on host + Postgres), stack table with versions and rationale, data model (User/Session/FlowRun/FlowStep), auth flow with CSRF, agent protocol over SSE, security properties per layer.

Lengths were chosen pragmatically: no completeness fetish, but enough to orient someone new to the codebase within 30 minutes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Technical specification: Ops Dashboard

## Architecture at a glance

```
┌────────────────┐  HTTPS   ┌──────┐  HTTP   ┌─────────────────┐
│ Browser (you)  ├─────────►│Caddy ├────────►│ ops-dashboard   │
│                │          │ :443 │         │ Next.js 16 :3000│
└────────────────┘          └──────┘         └────┬────────┬───┘
                                                  │        │
                                        HMAC HTTP │        │ TCP/SQL
                                            :3099 │        │
                                  ┌───────────────▼┐       │
                                  │ ops-agent      │       │
                                  │ Fastify on host│       │
                                  │ spawn/exec     │       │
                                  └───┬────────────┘       │
                                      │                    │
                              ┌───────┴───────┐    ┌───────▼────────┐
                              │ Whitelisted   │    │ Postgres 17    │
                              │ host commands │    │ db=ops_dashb.. │
                              │ docker/git/etc│    └────────────────┘
                              └───────────────┘
```

Three processes, one host:

1. **ops-dashboard**: Next.js app in Docker on the compose bridge network, exposed via Caddy
2. **ops-agent**: Node/Fastify service running directly on the host (not in a container), with sudoers and docker.sock access
3. **postgres**: Docker container, the same one Scrum4Me already uses; ops-dashboard has its own database `ops_dashboard`

## Stack

| Layer | Technology | Version | Rationale |
|---|---|---|---|
| App framework | Next.js | 16.2 (App Router) | RSC server-side fetching matches our "render with agent data" pattern |
| UI library | React | 19 | Bundled with Next 16 |
| Styling | Tailwind CSS | 4 | Utility-first; no custom design system |
| UI primitives | `@base-ui/react` | 1.4 | Headless components, no Radix lock-in |
| Code highlighting | shiki | 1.29 | Server-side highlighting in the Caddyfile view |
| Database ORM | Prisma | 7.8 (via `@prisma/adapter-pg`) | Same as Scrum4Me; one skill set to maintain both |
| Auth (password) | bcryptjs | 3 | No native bindings needed |
| Session | Custom in `lib/session.ts` | n/a | Simple: token in the cookie, hash in the DB |
| Agent | Fastify | 5 | Lightweight, native SSE streaming |
| Agent whitelist | js-yaml | 4 | Read-only config file |

## Deployment topology

| Component | Location | Managed by |
|---|---|---|
| ops-dashboard | Docker container `scrum4me-ops-dashboard`, image `ops-dashboard:latest` | `docker compose` in `/srv/scrum4me/compose/docker-compose.yml` |
| ops-agent | systemd unit `ops-agent.service`, host binary `/opt/ops-agent/dist/index.js` | systemd, installed via `deploy/ops-agent/setup.sh` |
| Caddyfile route | Block in `/srv/scrum4me/caddy/Caddyfile` | Manual; restart the Caddy container after adding it |
| Database | Postgres container `scrum4me-postgres`, database `ops_dashboard` | Reuses the existing container |
| Backups | `/srv/scrum4me/backups/*.sql.gz` | Cron, or manually via the UI |

Caddy routes `ops.jp-visser.nl` → service name `ops-dashboard:3000` on the compose bridge network.

## Data model

```
User
├── id          cuid (string PK)
├── email       unique
├── pwd_hash    bcrypt $2b$12$...
└── created_at

Session
├── id          cuid (PK)
├── user_id     → User
├── token_hash  sha256 hex (the cookie value is stored hashed)
└── expires_at  24h after creation

FlowRun
├── id          cuid (PK)
├── user_id     → User
├── flow_name   string (e.g. "update_scrum4me_web")
├── status      enum: pending|running|success|failed|cancelled
├── started_at
├── finished_at nullable
└── (1:N) FlowStep

FlowStep
├── id          cuid (PK)
├── flow_run_id → FlowRun (cascade delete)
├── step_index  int
├── name        string (as in the YAML flow definition)
├── exit_code   int nullable
├── stdout      text (max 64KB, truncated)
├── stderr      text (max 64KB, truncated)
├── started_at
└── finished_at nullable
```

Migrations live in `prisma/migrations/`. The seed script `prisma/seed.ts` creates the first admin from `SEED_USER_*`.

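
The 64KB cap on `stdout` and `stderr` could be enforced just before a `FlowStep` row is written. A minimal sketch, assuming the cap is counted in UTF-8 bytes and that the start of the output is kept; the spec does not say which end is cut, and `truncateLog` and `MAX_LOG_BYTES` are hypothetical names:

```typescript
// Hypothetical helper: caps captured output at 64 KiB of UTF-8 before storage.
// Assumption: the start of the output is kept and the tail is dropped.
const MAX_LOG_BYTES = 64 * 1024;

function truncateLog(text: string, maxBytes: number = MAX_LOG_BYTES): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;

  // Step back so the cut never lands inside a multi-byte UTF-8 sequence.
  // Continuation bytes have the bit pattern 10xxxxxx.
  let end = maxBytes;
  while (end > 0 && (bytes[end]! & 0xc0) === 0x80) end--;

  return new TextDecoder().decode(bytes.subarray(0, end));
}
```

Cutting on a character boundary keeps the stored text valid UTF-8, which matters when the column is a Postgres `text` field.
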
## Auth flow

```
1. Browser GET /login
   ← Set-Cookie: csrf_token=<uuid>; SameSite=Strict   (not HttpOnly, so client JS can read it)
   ← HTML form

2. Browser POST /api/auth/login
   Headers:
     Cookie: csrf_token=<uuid>; ops_session=...
     x-csrf-token: <uuid>     ← double-submit CSRF check
   Body: { email, password }

3. Server:
   a. proxy.ts CSRF check (cookie == header)
   b. /api/auth/login route:
      - rate limit per IP (5/min)
      - prisma.user.findUnique({ where: { email } })
      - bcrypt.compare(password, user.pwd_hash)
   c. On success:
      - generateSessionToken (32 bytes hex)
      - prisma.session.create({ data: { token_hash: sha256(token), expires_at: now+24h } })
      - Set-Cookie: ops_session=<token>; HttpOnly; SameSite=Strict; Secure (in prod)

4. Browser GET /<any-protected-path>
   Server: proxy.ts → no ops_session cookie → redirect to /login
   Otherwise: getCurrentUser() reads the cookie, hashes it, prisma.session.findUnique({ where: { token_hash } })
```
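
Step 3c can be sketched with Node's built-in `crypto` module. `generateSessionToken` is named in the flow; `hashSessionToken` and `sessionExpiry` are hypothetical helper names, and the actual code in `lib/session.ts` may differ:

```typescript
import { createHash, randomBytes } from "node:crypto";

// 32 random bytes, hex-encoded: the raw value that goes into the ops_session cookie.
function generateSessionToken(): string {
  return randomBytes(32).toString("hex");
}

// SHA-256 hex digest: the value stored in Session.token_hash.
function hashSessionToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

// Expiry 24 hours after creation, matching Session.expires_at.
function sessionExpiry(now: Date = new Date()): Date {
  return new Date(now.getTime() + 24 * 60 * 60 * 1000);
}
```

Because only the hash is stored, a leaked database row cannot be replayed as a cookie; on each request the server hashes the incoming cookie value and looks up the result.
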

CSRF protection uses the double-submit cookie pattern. CSP, X-Frame-Options, X-Content-Type-Options and Referrer-Policy are set as response headers in `proxy.ts`.

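
The double-submit check itself is small. A minimal sketch, assuming the raw `Cookie` and `x-csrf-token` header values are passed in as strings; `parseCookies` and `isCsrfValid` are hypothetical names, not taken from `proxy.ts`:

```typescript
// Hypothetical cookie parser: splits a Cookie header into name/value pairs.
function parseCookies(header: string | undefined): Map<string, string> {
  const cookies = new Map<string, string>();
  if (!header) return cookies;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // ignore malformed fragments
    cookies.set(part.slice(0, eq).trim(), part.slice(eq + 1).trim());
  }
  return cookies;
}

// The request passes only when the csrf_token cookie and the x-csrf-token
// header are both present and identical.
function isCsrfValid(
  cookieHeader: string | undefined,
  csrfHeader: string | undefined,
): boolean {
  const cookieToken = parseCookies(cookieHeader).get("csrf_token");
  if (!cookieToken || !csrfHeader) return false;
  return cookieToken === csrfHeader;
}
```

The pattern relies on the fact that a cross-site page can cause the cookie to be sent but cannot read it, so it cannot copy the value into the header.
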
## Agent protocol

The dashboard talks to the agent through `lib/agent-client.ts`:

```
POST http://172.18.0.1:3099/agent/v1/exec
Headers:
  Authorization: Bearer <OPS_AGENT_SECRET>
  Content-Type: application/json
Body:
  { command_key: "docker_ps", args?: string[], stdin?: string }
```

The response is an SSE stream:

```
event: stdout
data: {"data": "<chunk>"}

event: stderr
data: {"data": "<chunk>"}

event: exit
data: {"code": 0}
```

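
On the consuming side, the frames shown above can be decoded with a few lines of string handling. A minimal sketch, assuming the whole stream text is already in one buffer; `AgentEvent` and `parseAgentEvents` are hypothetical names, and trimming field values is a simplification of the SSE field rules:

```typescript
// One decoded frame of the agent's SSE stream.
interface AgentEvent {
  event: string; // "stdout", "stderr", "exit" or "error"
  data: unknown; // parsed JSON payload
}

// Hypothetical parser: splits a complete SSE text buffer into frames.
// Frames are separated by a blank line; each carries an `event:` and a `data:` line.
function parseAgentEvents(buffer: string): AgentEvent[] {
  const events: AgentEvent[] = [];
  for (const frame of buffer.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no event field is present
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length === 0) continue; // skip empty or comment-only frames
    events.push({ event, data: JSON.parse(dataLines.join("\n")) });
  }
  return events;
}
```

A streaming client would additionally have to hold back the last, possibly incomplete frame until the next chunk arrives.
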
Agent server-side flow per call:

1. `req.body.command_key` → lookup in `/etc/ops-agent/commands.yml`
2. On a hit: spawn `def.cmd[0]` with `def.cmd.slice(1)` followed by `args` (no shell, no interpolation)
3. Stream stdout/stderr chunks to the SSE response
4. On the child's `close` event: write `event: exit`, end the response
5. On the child's `error` event: write `event: error`, end the response
6. On `reply.raw` `close` (client disconnect): `child.kill()`
7. Audit log to the systemd journal (read with `journalctl`): `{audit:true, command_key, args, exit_code, duration_ms}`

Example `commands.yml`:

```yaml
docker_ps:
  cmd: ["docker", "ps", "--format", "json"]
  description: "List running containers"

git_status:
  cmd: ["git", "status", "--short", "--branch"]
  cwd_pattern: true   # args[0] = cwd, rest = command args
  description: "Git status in a repo"

systemctl_restart_caddy:
  cmd: ["sudo", "/usr/bin/systemctl", "restart", "caddy"]
  description: "Restart caddy service"
```

A `command_key` that is not in the whitelist → 403 Forbidden.

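
The lookup and argv construction can be sketched as a pure function that runs before anything is spawned. This assumes the parsed `commands.yml` is available as a plain object; `CommandDef`, `ResolvedCommand` and `resolveCommand` are hypothetical names, and rejecting a `cwd_pattern` call without arguments is an assumption of this sketch:

```typescript
// Shape of one entry in commands.yml (fields as shown in the example).
interface CommandDef {
  cmd: string[];
  description: string;
  cwd_pattern?: boolean;
}

interface ResolvedCommand {
  bin: string;
  argv: string[];
  cwd?: string;
}

// Hypothetical resolver: returns null for unknown keys (the caller answers 403).
// Client args are only ever appended as separate argv entries, never
// concatenated into a shell string.
function resolveCommand(
  whitelist: Record<string, CommandDef>,
  commandKey: string,
  args: string[] = [],
): ResolvedCommand | null {
  // Own-property check so keys like "constructor" cannot match the prototype.
  if (!Object.prototype.hasOwnProperty.call(whitelist, commandKey)) return null;
  const def = whitelist[commandKey];
  const [bin, ...fixed] = def.cmd;
  if (def.cwd_pattern) {
    // cwd_pattern: args[0] is the working directory, the rest are command args.
    if (args.length === 0) {
      throw new Error("cwd_pattern commands need args[0] as working directory");
    }
    const [cwd, ...rest] = args;
    return { bin, argv: [...fixed, ...rest], cwd };
  }
  return { bin, argv: [...fixed, ...args] };
}
```

The result maps directly onto `spawn(bin, argv, { cwd, shell: false })`, so a client can influence arguments but never the binary.
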
## Flows engine

YAML definitions live in `ops-agent/flows.example/*.yml`:

```yaml
name: update_scrum4me_web
description: Pull main, build, restart container, verify
steps:
  - name: Pull latest main
    command_key: git_pull
    args: ["/srv/scrum4me/repos/Scrum4Me", "main"]
    precondition: git_status_clean
  - name: Build container
    command_key: docker_compose_build
    args: ["scrum4me-web"]
  - name: Restart
    command_key: docker_compose_up
    args: ["-d", "scrum4me-web"]
  - name: Smoke test
    command_key: curl_status
    args: ["https://scrum4me.jp-visser.nl"]
    expect_exit_code: 0
```

Runner (`ops-agent/src/lib/flow-runner.ts`):

- Sequential, fail-fast
- Per step: check preconditions, spawn, capture stdout/stderr, store in FlowStep
- In a dry run: replace `spawn` with a log line of `def.cmd` followed by `args`
- In a real run: stream via SSE to the dashboard's `/api/flows/run` route

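
The sequential, fail-fast behaviour with a dry-run switch can be sketched as follows. This is not the code in `flow-runner.ts`: the executor is injected and kept synchronous for brevity (a real runner would await the spawned process), preconditions are left out, and treating a missing `expect_exit_code` as 0 is an assumption. `FlowStepDef`, `StepResult`, `Executor` and `runFlow` are hypothetical names:

```typescript
interface FlowStepDef {
  name: string;
  command_key: string;
  args?: string[];
  expect_exit_code?: number;
}

interface StepResult {
  name: string;
  exit_code: number | null; // null when the step was only logged (dry run)
}

// Executor injected by the caller; in the agent this would wrap spawn().
type Executor = (commandKey: string, args: string[]) => number;

// Hypothetical runner: executes steps in order and stops at the first failure.
function runFlow(
  steps: FlowStepDef[],
  exec: Executor,
  dryRun = false,
): { status: "success" | "failed"; results: StepResult[] } {
  const results: StepResult[] = [];
  for (const step of steps) {
    const args = step.args ?? [];
    if (dryRun) {
      // Dry run: log what would be executed instead of spawning it.
      console.log(`[dry-run] ${step.command_key} ${args.join(" ")}`);
      results.push({ name: step.name, exit_code: null });
      continue;
    }
    const exitCode = exec(step.command_key, args);
    results.push({ name: step.name, exit_code: exitCode });
    // Assumption: a step without expect_exit_code must exit with 0.
    if (exitCode !== (step.expect_exit_code ?? 0)) {
      return { status: "failed", results }; // fail-fast: later steps never run
    }
  }
  return { status: "success", results };
}
```

Injecting the executor keeps the ordering logic testable without touching the host.
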
## Realtime in the UI

Regular pages do not use WebSockets or Server-Sent Events on the dashboard side. They are server-rendered on every request (`export const dynamic = 'force-dynamic'`), and a client-side `useEffect` with a 5000 ms `setInterval` calls `router.refresh()` to auto-refresh.

Flow execution is the exception: the client opens an `EventSource` on `/api/flows/run/[id]`, which relays the agent's SSE stream.

## Configuration

Required in `.env`:

```bash
DATABASE_URL=postgresql://USER:PASS@postgres:5432/ops_dashboard
OPS_AGENT_URL=http://172.18.0.1:3099
OPS_AGENT_SECRET=<hex-32-bytes>
SEED_USER_EMAIL=admin@example.com
SEED_USER_PASSWORD=<strong-password>
```

Optional:

```bash
SYSTEMD_UNITS=scrum4me-web,ops-agent # comma-separated
REPO_PATHS=/srv/scrum4me/repos/Scrum4Me,… # comma-separated absolute paths
```

At startup the app validates that the required env vars are set and fails fast with a clear error.

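
A fail-fast check of this kind could look like the sketch below. The variable names come from the lists in this section; `REQUIRED_ENV`, `AppConfig`, `parseList` and `loadConfig` are hypothetical, and the error wording is illustrative only:

```typescript
// Variable names taken from the required list in this section.
const REQUIRED_ENV = [
  "DATABASE_URL",
  "OPS_AGENT_URL",
  "OPS_AGENT_SECRET",
  "SEED_USER_EMAIL",
  "SEED_USER_PASSWORD",
] as const;

interface AppConfig {
  required: Record<(typeof REQUIRED_ENV)[number], string>;
  systemdUnits: string[];
  repoPaths: string[];
}

// Splits a comma-separated value and drops empty entries.
function parseList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Hypothetical loader: throws one error that names every missing variable.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_ENV.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const required = {} as AppConfig["required"];
  for (const name of REQUIRED_ENV) required[name] = env[name] as string;
  return {
    required,
    systemdUnits: parseList(env["SYSTEMD_UNITS"]),
    repoPaths: parseList(env["REPO_PATHS"]),
  };
}
```

In the app this would be called once at boot with `process.env`; reporting all missing names at once avoids a fix-one-restart-repeat loop.
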
## Security properties

| Property | Implementation |
|---|---|
| Password hashing | bcrypt, 12 rounds |
| Session cookie | HttpOnly, SameSite=Strict, Secure in prod, 24h TTL |
| CSRF | Double-submit cookie pattern, validated in `proxy.ts` for POSTs |
| CSP | Strict, set in response headers: no inline scripts except Next.js internals with a nonce |
| Agent auth | Symmetric shared secret (`OPS_AGENT_SECRET`) sent as a Bearer token (labelled HMAC in the architecture diagram) |
| Command injection | `spawn(bin, args, {shell: false})`: never any shell interpolation |
| Whitelist | `commands.yml` is the single source of truth for what can be run |
| Sudo | `sudoers.d/ops-agent` with absolute paths and service names, no wildcards |
| Audit | Every `/agent/v1/exec` call logs to the systemd journal with an `{audit:true, …}` marker |
| Rate limit | Login 5/min/IP; the agent has no rate limit per secret (single-user trust) |
| Bind | Agent binds to `0.0.0.0:3099`; UFW only allows `172.18.0.0/16` |

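
The login limit of 5 attempts per minute per IP could be implemented in several ways; the spec does not name the algorithm. A minimal in-memory sketch using a fixed window, with the clock passed in for testability (`createRateLimiter` is a hypothetical name):

```typescript
// Hypothetical fixed-window limiter: at most `limit` attempts per key per window.
// State lives in process memory, so it resets on restart and is not shared
// between instances.
function createRateLimiter(limit = 5, windowMs = 60 * 1000) {
  const windows = new Map<string, { start: number; count: number }>();

  // Returns true when the attempt is allowed, false when it should be rejected.
  return function allow(key: string, now: number = Date.now()): boolean {
    const current = windows.get(key);
    if (!current || now - current.start >= windowMs) {
      // First attempt for this key, or the previous window has expired.
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= limit) return false;
    current.count += 1;
    return true;
  };
}
```

For a single-user tool this is likely sufficient; note that the map is never pruned, so a long-running process facing many distinct IPs would need eviction.
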
## Non-functional properties

| Property | Specification |
|---|---|
| No multi-tenancy | One user row in the DB; the app only verifies that a valid session record exists; no `WHERE user_id = ?` filter (single-tenant) |
| No retry/queue | Failed flows stay failed; the user has to click again manually |
| No migration automation | `prisma migrate deploy` is **not** part of the boot flow; run it explicitly on every deploy |
| No graceful shutdown | Container SIGTERM → in-flight requests are lost; no draining |
| Logging | Container stdout/stderr via `docker logs`; agent via `journalctl -u ops-agent`; no aggregator |

## Open items

- **Real Caddyfile grammar** (IDEA-061): currently an nginx fallback
- **Multi-user / RBAC**: out of scope, possibly later
- **Rate limit on the agent**: needed for a multi-user future
- **Real-time alerts**: currently pull-based; push to Slack/Tailscale-only is not there yet