Adds a server-wide backup capability beyond the existing ops_dashboard pg_dump flow:

- Daily systemd timer (03:30) runs pg_dumpall + Forgejo dump, then restic to a local NAS repo and an offsite Backblaze B2 repo with Object Lock. Phase-based script with single-instance flock, structured statusfile, systemd hardening, and live-datadir excludes (Postgres / Forgejo) so the dumps stay authoritative.
- Ops-agent gets nine new read-only/trigger commands (snapshots, stats, status, logs, plus two triggers) backed by sudoers-whitelisted wrapper scripts that source /etc/restic-backup.env so the agent never sees the restic password or B2 keys.
- Two new flows (server_backup_full, server_backup_restore_test) drive the dashboard's "Backup now" and "Restore test" buttons.
- /settings/backups gains a Server backup section with overall + per-phase status, NAS / B2 snapshot tables, restore-size / raw-data / dedup-ratio stats, and the last restore-test result. The existing pg_dump section is preserved unchanged.
- Runbook docs/runbooks/server-backup.md follows the tailscale-setup pattern (plan + addendum) and covers B2 Object Lock + scoped keys, Forgejo subplan with isolated restore-test stack, the off-server maintenance flow for B2 prune, and the integrity-check schedule.

Code-only change; installation on scrum4me-srv follows the runbook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Server backup: deploy artifacts
Daily server-wide backup with restic to a NAS (local) and Backblaze B2 (offsite, Object Lock). Includes a structured statusfile that the ops dashboard can read.
The full description (prerequisites, B2 keys, Object Lock, Forgejo restore test, integrity-check schedule) is in docs/runbooks/server-backup.md.
Files

| File | Purpose | Location on host |
|---|---|---|
| server-backup.sh | main script (phase-based, flock, statusfile) | /srv/backups/scripts/server-backup.sh |
| restore-test.sh | restores the latest snapshot and checks critical files | /srv/backups/scripts/restore-test.sh |
| server-backup.service | systemd oneshot unit | /etc/systemd/system/server-backup.service |
| server-backup.timer | daily at 03:30 with 10 min jitter | /etc/systemd/system/server-backup.timer |
| restic-backup.env.example | env template (repos, B2 keys, Forgejo) | copy to /etc/restic-backup.env |
You also need to create the following (not in this repo, because it contains secrets):

- /etc/restic-backup.password: only the restic password (mode 0400 root:root).
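Before the first run it can help to confirm that both secret files have the intended modes (0600 for the env file, 0400 for the password file, both root:root). A minimal sketch, assuming GNU coreutils stat; check_secret_file is a hypothetical helper, not part of the scripts in this directory:

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_secret_file PATH EXPECTED_MODE [EXPECTED_OWNER]
# Hypothetical helper: succeeds only if PATH is a regular file with exactly
# EXPECTED_MODE (octal as printed by GNU stat %a, e.g. 400) and owned by
# EXPECTED_OWNER (default root:root). Prints the reason to stderr otherwise.
check_secret_file() {
  local path="$1" expected_mode="$2" expected_owner="${3:-root:root}"
  if [[ ! -f "$path" ]]; then
    echo "missing: $path" >&2
    return 1
  fi
  local actual
  actual="$(stat -c '%a %U:%G' "$path")"
  if [[ "$actual" != "$expected_mode $expected_owner" ]]; then
    echo "wrong mode/owner on $path: got '$actual', want '$expected_mode $expected_owner'" >&2
    return 1
  fi
}

# On the host, as root:
#   check_secret_file /etc/restic-backup.env 600
#   check_secret_file /etc/restic-backup.password 400
```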
Quick installation (see the runbook for full context)
```bash
# 1. Tools and directories
sudo apt update && sudo apt install -y restic jq
sudo mkdir -p /srv/backups/scripts /srv/backups/logs /srv/backups/status \
  /var/backups/databases
sudo chmod 0750 /srv/backups/logs /srv/backups/status

# 2. Install the scripts
sudo cp deploy/server-backup/server-backup.sh /srv/backups/scripts/
sudo cp deploy/server-backup/restore-test.sh /srv/backups/scripts/
sudo chmod 0750 /srv/backups/scripts/*.sh
sudo chown root:root /srv/backups/scripts/*.sh

# 3. Env file + password
sudo cp deploy/server-backup/restic-backup.env.example /etc/restic-backup.env
sudo chmod 0600 /etc/restic-backup.env
sudo chown root:root /etc/restic-backup.env
# Generate the password; ALSO store it in your password manager.
# umask 077 keeps the file private from the moment it is created.
sudo sh -c 'umask 077; openssl rand -hex 24 > /etc/restic-backup.password'
sudo chmod 0400 /etc/restic-backup.password

# 4. Fill in /etc/restic-backup.env (RESTIC_REPO_NAS, RESTIC_REPO_B2,
#    B2_ACCOUNT_ID, B2_ACCOUNT_KEY, FORGEJO_*). See runbook parts A+B.

# 5. Initialise the repos (see runbook part C for Object Lock + key capabilities)
sudo -E bash -c 'set -a; . /etc/restic-backup.env; set +a; \
  export RESTIC_PASSWORD_FILE=/etc/restic-backup.password; \
  restic -r "$RESTIC_REPO_NAS" init && \
  restic -r "$RESTIC_REPO_B2" init'

# 6. Systemd
sudo cp deploy/server-backup/server-backup.service /etc/systemd/system/
sudo cp deploy/server-backup/server-backup.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now server-backup.timer
systemctl list-timers | grep server-backup

# 7. First run by hand (follow it via journalctl).
#    --no-block returns immediately; without it, starting a oneshot
#    service waits until the whole run has finished.
sudo systemctl start --no-block server-backup.service
sudo journalctl -u server-backup.service -f
```
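The Files table lists a single-instance flock for server-backup.sh, which matters once a run can be started both by the timer and by hand. A minimal sketch of such a guard using util-linux flock; the lock path /run/server-backup.lock and the function name are assumptions, not taken from the actual script:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical lock path; the real server-backup.sh may use a different one.
LOCK_FILE="${LOCK_FILE:-/run/server-backup.lock}"

# Hypothetical helper: take an exclusive, non-blocking lock for this shell.
acquire_single_instance_lock() {
  # Open the lock file on file descriptor 9; the lock is held as long as
  # fd 9 stays open, i.e. until the script exits.
  exec 9>"$LOCK_FILE"
  # -n: give up immediately instead of waiting if another run holds the lock.
  if ! flock -n 9; then
    echo "server-backup: another run is already active, exiting" >&2
    return 1
  fi
}

# At the top of the script:
#   acquire_single_instance_lock || exit 1
```

Because the lock belongs to the open file descriptor, it is released automatically when the process exits, even after a crash, so no stale lock file needs cleaning up.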
Verify
```bash
# Statusfile
sudo jq . /srv/backups/status/last-run.json

# Snapshots
sudo -E bash -c 'set -a; . /etc/restic-backup.env; set +a; \
  export RESTIC_PASSWORD_FILE=/etc/restic-backup.password; \
  restic -r "$RESTIC_REPO_NAS" snapshots; \
  restic -r "$RESTIC_REPO_B2" snapshots'

# Restore test (NAS, non-destructive: restores to /tmp/restore-test)
sudo /srv/backups/scripts/restore-test.sh nas
sudo jq . /srv/backups/status/last-restore-test.json
```
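restore-test.sh restores the latest snapshot and then checks critical files. The check step could look roughly like this sketch; the function name and the convention of passing paths relative to the filesystem root are assumptions, and the real list of critical files lives in restore-test.sh. restic restore recreates the original absolute paths below the --target directory, which is why each path is joined onto the restore root.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_critical_files RESTORE_ROOT PATH...
# Hypothetical helper: each PATH is relative to the original filesystem root
# and must exist and be non-empty under RESTORE_ROOT. Every missing or empty
# path is reported; the function fails if at least one check failed.
check_critical_files() {
  local root="$1"; shift
  local failures=0 rel
  for rel in "$@"; do
    # -s: the path exists and has a size greater than zero.
    if [[ ! -s "$root/$rel" ]]; then
      echo "missing or empty: $rel" >&2
      failures=$((failures + 1))
    fi
  done
  [[ "$failures" -eq 0 ]]
}

# Example (placeholder paths; use the list from restore-test.sh):
#   check_critical_files /tmp/restore-test <path> <path> ...
```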
Statusfile schema
After every run (success or failure) the script writes /srv/backups/status/last-run.json atomically, via a temp file + mv. The ops dashboard reads this file via read_backup_status (see ops-agent/commands.yml.example).
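The temp + mv pattern can be sketched as follows; write_status_atomically is a hypothetical helper name, not necessarily what server-backup.sh uses:

```bash
#!/usr/bin/env bash
set -euo pipefail

# write_status_atomically TARGET
# Hypothetical helper: reads the JSON document from stdin and replaces TARGET.
write_status_atomically() {
  local target="$1" dir tmp
  dir="$(dirname "$target")"
  # Create the temp file in the same directory, so the mv below is a rename
  # within one filesystem. mktemp creates it with mode 0600.
  tmp="$(mktemp "$dir/.$(basename "$target").XXXXXX")"
  # Copy stdin into the temp file; on failure, clean up and keep the old file.
  if ! cat > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # rename(2) swaps the file in atomically: a reader sees either the old or
  # the new statusfile, never a half-written one.
  mv -f "$tmp" "$target"
}

# Usage:
#   printf '%s\n' "$json" | write_status_atomically /srv/backups/status/last-run.json
```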
```json
{
  "schema_version": 1,
  "overall_status": "success | partial_failure | failed",
  "started_at": "2026-05-15T03:30:00+02:00",
  "completed_at": "2026-05-15T03:48:21+02:00",
  "duration_seconds": 1101,
  "host": "scrum4me-srv",
  "phases": {
    "postgres_dump":   { "status": "success",  "exit_code": 0, "...": "..." },
    "forgejo_dump":    { "status": "skipped",  "exit_code": 99, "...": "..." },
    "forgejo_db_dump": { "status": "skipped",  "exit_code": 99 },
    "restic_nas":      { "status": "success",  "exit_code": 0, "snapshot_id": "abc123" },
    "restic_b2":       { "status": "degraded", "exit_code": 3, "error": "1 file unreadable" },
    "forget_nas":      { "status": "success",  "exit_code": 0 },
    "check_nas":       { "status": "success",  "exit_code": 0 },
    "check_b2":        { "status": "success",  "exit_code": 0 }
  }
}
```
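duration_seconds is the difference between completed_at and started_at. A sketch of how those values relate, assuming GNU date (which parses ISO 8601 timestamps with a UTC offset); iso_duration_seconds is a hypothetical helper, not necessarily how server-backup.sh computes the field:

```bash
#!/usr/bin/env bash
set -euo pipefail

# iso_duration_seconds START END
# Hypothetical helper: whole seconds between two ISO 8601 timestamps with a
# UTC offset, e.g. 2026-05-15T03:30:00+02:00. Requires GNU date -d.
iso_duration_seconds() {
  local start end
  start="$(date -d "$1" +%s)"
  end="$(date -d "$2" +%s)"
  echo $(( end - start ))
}

# A timestamp in the same format can be produced with GNU date:
#   date --iso-8601=seconds
```

With the example timestamps, iso_duration_seconds 2026-05-15T03:30:00+02:00 2026-05-15T03:48:21+02:00 prints 1101, matching duration_seconds in the example. Converting to epoch seconds first also keeps the result correct across a DST change, because both offsets are taken into account.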
Status per phase:

| status | meaning | counts as |
|---|---|---|
| success | exit 0 | success |
| skipped | exit 99: phase not applicable (e.g. Forgejo not installed) | success |
| degraded | exit 3: restic snapshot was created, but some files were unreadable | partial_failure |
| failed | any other non-zero exit | partial_failure or failed (see overall_status) |
| pending | phase did not run (script aborted before reaching this phase) | partial_failure |
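The exit-code column can be read as a small mapping. A sketch of that mapping as a shell function; the name phase_status_from_exit is hypothetical. Limiting degraded to the restic_* backup phases is an assumption, based on exit code 3 being restic's "snapshot created, but some source files could not be read" code. pending is not derived from an exit code: it is the initial value of every phase that has not run yet.

```bash
#!/usr/bin/env bash
set -euo pipefail

# phase_status_from_exit PHASE EXIT_CODE
# Hypothetical helper mapping a phase's exit code to its status.
phase_status_from_exit() {
  local phase="$1" code="$2"
  case "$code" in
    0)  echo success ;;
    99) echo skipped ;;
    3)
      # Assumption: only restic backup phases (restic_nas, restic_b2) can be
      # "degraded"; exit 3 from any other phase is an ordinary failure.
      if [[ "$phase" == restic_* ]]; then echo degraded; else echo failed; fi
      ;;
    *)  echo failed ;;
  esac
}
```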
overall_status rules:

- failed if postgres_dump fails (the DB dump is authoritative), or if both restic repos fail.
- partial_failure for any failed or degraded phase that is not critical (e.g. one restic repo down, or forgejo_dump failing while postgres_dump succeeds), and for any pending phase.
- success if no phase is failed, degraded or pending.
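A sketch of these overall_status rules as a shell function that takes name=status pairs. The function name is hypothetical, it needs bash 4+ for associative arrays, and counting pending phases as partial_failure follows the phase-status table rather than anything confirmed in server-backup.sh:

```bash
#!/usr/bin/env bash
set -euo pipefail

# overall_status_from_phases NAME=STATUS...
# Hypothetical helper, e.g.:
#   overall_status_from_phases postgres_dump=success restic_nas=success restic_b2=degraded
overall_status_from_phases() {
  declare -A st=()
  local kv name
  for kv in "$@"; do
    st["${kv%%=*}"]="${kv#*=}"
  done

  # The Postgres dump is authoritative: if it failed, the whole run failed.
  if [[ "${st[postgres_dump]:-}" == failed ]]; then
    echo failed; return
  fi
  # Both restic repos failed: no new snapshot exists anywhere.
  if [[ "${st[restic_nas]:-}" == failed && "${st[restic_b2]:-}" == failed ]]; then
    echo failed; return
  fi
  # Any other failed or degraded phase, or a phase that never ran.
  for name in "${!st[@]}"; do
    case "${st[$name]}" in
      failed|degraded|pending) echo partial_failure; return ;;
    esac
  done
  echo success
}
```

For the example statusfile (restic_b2 degraded, everything else success or skipped) this yields partial_failure.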
Ordering relative to the existing ops-db-backup.timer
The existing deploy/ops-agent/ops-db-backup.timer runs at 02:00 and only does a pg_dump of ops_dashboard into /srv/ops/backups/. The new server-backup.timer runs at 03:30 and includes that directory in its restic backup. Both timers stay in place side by side.
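Because server-backup.timer only picks up whatever ops-db-backup.timer left in /srv/ops/backups/ 90 minutes earlier, a pre-flight check on the age of the newest dump could catch a silently failing 02:00 job. A sketch assuming GNU find; the function name and the 3-hour threshold are hypothetical and not part of either timer:

```bash
#!/usr/bin/env bash
set -euo pipefail

# newest_file_age_seconds DIR
# Hypothetical helper: prints the age in seconds of the most recently
# modified regular file in DIR; fails if DIR is missing or has no files.
newest_file_age_seconds() {
  local dir="$1" newest
  # %T@ prints each file's mtime as seconds since the epoch (with a fraction).
  newest="$(find "$dir" -type f -printf '%T@\n' 2>/dev/null | sort -n | tail -n 1)" || return 1
  [[ -n "$newest" ]] || return 1
  # Drop the fractional part before integer arithmetic.
  echo $(( $(date +%s) - ${newest%.*} ))
}

# Hypothetical pre-flight: warn when the newest ops_dashboard dump is over 3 hours old.
#   if age="$(newest_file_age_seconds /srv/ops/backups)" && (( age > 10800 )); then
#     echo "warning: no fresh ops_dashboard dump in /srv/ops/backups" >&2
#   fi
```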