Commit graph

12 commits

Author SHA1 Message Date
Janpeter Visser
2b03ee02e0 feat(server-backup): one-shot install script for ops-agent wiring
Adds deploy/server-backup/install-flows.sh — een idempotent installer die
de ops-agent-zijde van de server-backup feature aan elkaar plakt:

  1. wrappers/*.sh                 → /srv/backups/scripts/wrappers/
  2. flows.example/server_backup_* → /etc/ops-agent/flows/
  3. commands.yml.example commands → /etc/ops-agent/commands.yml (append, met backup)
  4. NOPASSWD-regels voor wrappers → /etc/sudoers.d/ops-agent (visudo-validated)
  5. systemctl restart ops-agent
  6. systemctl enable --now server-backup.timer

Wat het bewust *niet* doet (staat in scriptheader): restic env/password
aanmaken, repos initialiseren, base-scripts of systemd-units plaatsen —
die secrets-stappen blijven handwerk per README "Snelle installatie".

Re-run safe:
- cmp-check per file in stappen 1-2 (skip als identiek)
- grep-check op command-name in stap 3 (skip als al aanwezig)
- visudo-validatie in stap 4 voorkomt lockout bij syntax-fout
- backups van mutaties: commands.yml.bak.<ts> en sudoers.d/ops-agent.bak.<ts>

Regex-fix t.o.v. eerste handmatige run vandaag: command-block-extractie
gebruikt nu [a-z0-9_]+ ipv [a-z_]+, zodat namen met digits (restic_*_b2)
als losse blocks gezien worden. Het oude pattern miste ze maar sleepte
ze toevallig mee in het vorige block — eindresultaat correct, output
misleidend. Nieuwe versie faalt expliciet als een command echt ontbreekt.

README aangevuld met sectie "Ops-agent wiring (na stap 1-7)" die naar
het script verwijst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 21:04:43 +02:00
Janpeter Visser
20de584759 fix(server-backup): host-paths + script bugs uit eerste install
Kleine correcties bovenop ab87c0f, gevonden tijdens de eerste install
op scrum4me-srv (zie docs/runbooks/server-backup.md addendum):

- restic-backup.env.example: NAS-pad → /mnt/nas/backups/restic/scrum4me-srv,
  Forgejo-container → scrum4me-forgejo (waren placeholders die niet matchten
  met de actuele server-state).
- server-backup.service: ReadWritePaths uitgebreid met /mnt/nas/backups —
  ProtectSystem=strict blokkeerde anders schrijven naar de NAS-repo.
  RequiresMountsFor=/mnt/nas/backups toegevoegd om cifs-automount te triggeren
  bij timer-fire. Documentation=-URL gecorrigeerd naar /srv/scrum4me/.
- server-backup.sh: --skip-db verwijderd uit forgejo dump (Forgejo 11.x heeft
  die flag niet meer; DB komt nu mee in de zip, redundant met de aparte
  forgejo_db_dump-fase maar onschuldig).
- server-backup.sh: subshell-bug in determine_exit_code gefixt — werd
  aangeroepen via $(...), dus OVERALL_STATUS lekte niet naar de parent
  en write_status_json schreef altijd "unknown".
- restore-test.sh: --include filter toegevoegd op de assertion-paden — een
  full restore (~476 GiB logical) liep direct vol op /tmp (7.6 GB tmpfs)
  met 3.3M ENOSPC-errors. Nu 59 MiB in 10s.
- runbook: paden /srv/ops/repos/... → /srv/scrum4me/ops-dashboard/...,
  <forgejo>-placeholders → scrum4me-forgejo, concrete cifs-prefixpath
  fstab-regel in Deel A3, en een gevuld addendum met alle bevindingen
  van de eerste install (B2-bucket-naam ScrumForMeSrvBackup, sudo -E quirk,
  storage-cap incident, dedup-cijfers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:34:21 +02:00
Madhura68
ab87c0fada feat(server-backup): restic dual-repo backup (NAS + B2) with dashboard UI
Adds a server-wide backup capability beyond the existing ops_dashboard
pg_dump flow:

- Daily systemd timer (03:30) runs pg_dumpall + Forgejo dump, then restic
  to a local NAS repo and an offsite Backblaze B2 repo with Object Lock.
  Phase-based script with single-instance flock, structured statusfile,
  systemd hardening, and live-datadir excludes (Postgres / Forgejo) so
  the dumps stay authoritative.
- Ops-agent gets nine new read-only/trigger commands (snapshots, stats,
  status, logs, plus two triggers) backed by sudoers-whitelisted wrapper
  scripts that source /etc/restic-backup.env so the agent never sees the
  restic password or B2 keys.
- Two new flows (server_backup_full, server_backup_restore_test) drive
  the dashboard's "Backup now" and "Restore test" buttons.
- /settings/backups gains a Server backup section with overall + per-phase
  status, NAS / B2 snapshot tables, restore-size / raw-data / dedup-ratio
  stats, and the last restore-test result. The existing pg_dump section
  is preserved unchanged.
- Runbook docs/runbooks/server-backup.md follows the tailscale-setup
  pattern (plan + addendum) and covers B2 Object Lock + scoped keys,
  Forgejo subplan with isolated restore-test stack, the off-server
  maintenance flow for B2 prune, and the integrity-check schedule.

Code-only change — installation on scrum4me-srv follows the runbook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 13:03:00 +02:00
Scrum4Me Agent
2b11b999c0 fix(caddy): gebruik Docker servicenaam ipv host-IP in Caddyfile
Vervang `reverse_proxy 172.18.0.1:3001` door `reverse_proxy ops-dashboard:3000`
zodat de reverse-proxy stabiel werkt via Docker service-name resolution.
Voeg comments toe als pre-conditie: Caddy moet op hetzelfde Docker-netwerk zitten.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 22:16:04 +02:00
Scrum4Me Agent
e5423de319 fix(deploy): update build context naar /srv/ops/repos/ops-dashboard
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 22:15:09 +02:00
Janpeter Visser
252e535f23 fix(deploy): install dev deps voor TypeScript-build, prune erna
`npm ci --omit=dev` voor `npx tsc` faalde omdat TypeScript in
devDependencies zit. npx probeerde de typo-squatter `tsc@2.0.4` te
installeren. Nu: volledige install → tsc → prune --omit=dev voor
slanke runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 21:42:24 +02:00
Scrum4Me Agent
caeb5f3306 feat(ops): self-update script, systemd units, README install guide, recovery runbook
- deploy/ops-dashboard-updater/update.sh: git pull → docker build → force-recreate → smoke-test
- deploy/ops-dashboard-updater/install.sh: installs script + systemd units to host
- ops-dashboard-updater.service / .timer: oneshot + daily 03:00 scheduled trigger
- README.md: Installation and Configuration sections (env files, ops-agent, updater)
- docs/runbooks/recovery.md: agent-crash, DB corruption/restore, container failure, cert expiry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 20:10:21 +02:00
Scrum4Me Agent
4dd0490afc feat(backup): add ops-db backup commands, flow, and systemd timer
Adds pg_dump_ops_db, list_ops_backups, and cleanup_ops_backups to the
agent command whitelist. Includes a backup_ops_db flow YAML (dump +
30-day retention), and a systemd service/timer for daily automated
backups at 02:00.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 20:07:14 +02:00
Scrum4Me Agent
12172eec95 feat(deploy): add sudoers config + setup.sh integration for systemctl_restart
/etc/sudoers.d/ops-agent grants NOPASSWD to ops-agent for the exact
systemctl restart invocations whitelisted in commands.yml.
setup.sh installs and validates it via visudo -c.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 17:53:09 +02:00
Scrum4Me Agent
92d450609c feat(auth): shared-secret auth web-app → ops-agent
- ops-agent/src/auth.ts: constant-time compare via timingSafeEqual to prevent timing attacks; store secret as Buffer
- ops-agent/src/index.ts + ops-agent.service: bind on 127.0.0.1:3099 (was 4242, per plan)
- app/api/agent/[...path]/route.ts: Next.js proxy route that verifies ops_session cookie then forwards requests to agent with Authorization: Bearer <secret>
- .env.example + deploy/ops-dashboard.env.example: add OPS_AGENT_SECRET and OPS_AGENT_URL
- README.md: rotation procedure for the shared secret

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 17:22:37 +02:00
Scrum4Me Agent
4bccbf28f3 feat: ops-agent Fastify service met SSE, whitelist en systemd-unit
- ops-agent/: Node.js Fastify+TypeScript service
  - GET /agent/v1/health
  - POST /agent/v1/exec → SSE stream (stdout/stderr/exit events)
  - Whitelist geladen uit /etc/ops-agent/commands.yml bij opstart
  - Auth via Bearer shared secret (/etc/ops-agent/secret, mode 0640)
  - Vier standaard commando's: docker_ps, git_status, systemctl_status,
    caddy_show_config
- deploy/ops-agent/ops-agent.service: systemd-unit (User=ops-agent,
  Restart=on-failure, StandardOutput=journal)
- deploy/ops-agent/setup.sh: aanmaken system-user, build, deploy,
  systemctl enable --now ops-agent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 17:15:44 +02:00
Scrum4Me Agent
ad9cde6fb7 feat: Dockerfile, deploy configs en Caddy-block voor ops.jp-visser.nl
- Multi-stage Dockerfile (deps → builder → runner) met next standalone output
- .dockerignore zodat node_modules en .next niet mee worden gebundeld
- next.config.ts: output standalone ingeschakeld voor minimale image
- deploy/docker-compose.ops-dashboard.yml: service-fragment voor /srv/scrum4me/compose
- deploy/caddy/Caddyfile.ops-dashboard: reverse_proxy block voor Caddy
- deploy/ops-dashboard.env.example: env-template voor /srv/ops/ops-dashboard.env

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 17:12:37 +02:00