feat(server-backup): restic dual-repo backup (NAS + B2) with dashboard UI
Adds a server-wide backup capability beyond the existing ops_dashboard pg_dump flow:

- Daily systemd timer (03:30) runs pg_dumpall + Forgejo dump, then restic to a local NAS repo and an offsite Backblaze B2 repo with Object Lock. Phase-based script with single-instance flock, structured statusfile, systemd hardening, and live-datadir excludes (Postgres / Forgejo) so the dumps stay authoritative.
- Ops-agent gets nine new read-only/trigger commands (snapshots, stats, status, logs, plus two triggers) backed by sudoers-whitelisted wrapper scripts that source /etc/restic-backup.env so the agent never sees the restic password or B2 keys.
- Two new flows (server_backup_full, server_backup_restore_test) drive the dashboard's "Backup now" and "Restore test" buttons.
- /settings/backups gains a Server backup section with overall + per-phase status, NAS / B2 snapshot tables, restore-size / raw-data / dedup-ratio stats, and the last restore-test result. The existing pg_dump section is preserved unchanged.
- Runbook docs/runbooks/server-backup.md follows the tailscale-setup pattern (plan + addendum) and covers B2 Object Lock + scoped keys, Forgejo subplan with isolated restore-test stack, the off-server maintenance flow for B2 prune, and the integrity-check schedule.

Code-only change: installation on scrum4me-srv follows the runbook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 27cba872a8
commit ab87c0fada
23 changed files with 2625 additions and 170 deletions
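The commit message mentions a phase-based script guarded by a single-instance flock that writes a structured statusfile. The script itself is not shown on this page, so the following is only a minimal Python sketch of that pattern, not the committed implementation. The function names, the JSON field names (`overall`, `phases`), and the choice to stop at the first failed phase are all assumptions made for illustration.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive, non-blocking flock for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Raises BlockingIOError if another run already holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def write_status(status_path, status):
    """Atomically replace the statusfile so readers never see a partial write."""
    directory = os.path.dirname(status_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".status-")
    with os.fdopen(fd, "w") as handle:
        json.dump(status, handle)
    os.replace(tmp_path, status_path)


def run_phases(phases, status_path, lock_path):
    """Run (name, callable) phases in order and record each outcome.

    Assumption: the run stops at the first failed phase. The real script
    may instead continue with later phases.
    """
    status = {"overall": "running", "phases": {}}
    with single_instance(lock_path):
        for name, action in phases:
            try:
                action()
                status["phases"][name] = "ok"
            except Exception as exc:
                status["phases"][name] = f"failed: {exc}"
                status["overall"] = "failed"
            write_status(status_path, status)
            if status["overall"] == "failed":
                break
        if status["overall"] == "running":
            status["overall"] = "ok"
        write_status(status_path, status)
    return status
```

Writing the statusfile through a temporary file and `os.replace` means a reader such as a dashboard polling the file sees either the previous status or the new one, never a half-written document.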
21
ops-agent/flows.example/server_backup_full.yml
Normal file
@@ -0,0 +1,21 @@
# Trigger a full server-wide backup (pg_dumpall + restic to NAS + B2).
# Runs out-of-band via systemd; this flow just kicks it off and then tails
# today's log + reads the structured statusfile so the dashboard can render
# progress and final result.
#
# Copy to /etc/ops-agent/flows/server_backup_full.yml on the host.
# Triggered manually via /settings/backups → "Backup now" or by the daily
# server-backup.timer (which runs server-backup.service directly, skipping
# this flow).

name: Server backup (full)
description: Daily full server backup — pg_dumpall + restic to NAS + B2 (Object Lock)
steps:
  - command_key: trigger_server_backup
    on_failure: abort

  - command_key: tail_backup_log_today
    on_failure: continue

  - command_key: read_backup_status
    on_failure: continue
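The flow above pairs each `command_key` with an `on_failure` policy of `abort` or `continue`. The ops-agent's flow runner is not part of this diff, so the following is a hedged Python sketch of how such a policy could be interpreted. `run_flow`, the `run_command` callback, and the default policy of `abort` when `on_failure` is absent are assumptions; only the step keys and policies are taken from the file.

```python
# Mirrors the steps of server_backup_full.yml as plain Python data.
SERVER_BACKUP_FULL = {
    "name": "Server backup (full)",
    "steps": [
        {"command_key": "trigger_server_backup", "on_failure": "abort"},
        {"command_key": "tail_backup_log_today", "on_failure": "continue"},
        {"command_key": "read_backup_status", "on_failure": "continue"},
    ],
}


def run_flow(flow, run_command):
    """Run steps in order and return a list of (command_key, succeeded).

    run_command is a hypothetical callback that executes one whitelisted
    command and returns True on success. A failed step marked "abort"
    stops the flow; a failed step marked "continue" does not.
    """
    results = []
    for step in flow["steps"]:
        key = step["command_key"]
        succeeded = bool(run_command(key))
        results.append((key, succeeded))
        # Assumption: a step without on_failure behaves like "abort".
        if not succeeded and step.get("on_failure", "abort") == "abort":
            break
    return results
```

Under this reading, a failed trigger stops the flow immediately, while a failed log tail still lets `read_backup_status` run, so the dashboard can show the final state even when the log is unavailable.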
14
ops-agent/flows.example/server_backup_restore_test.yml
Normal file
@@ -0,0 +1,14 @@
# Run a non-destructive restore test against the NAS repo. Restores the latest
# snapshot to /tmp/restore-test/ and asserts that critical files came back
# intact. Used to verify backups periodically without touching the live stack.
#
# Copy to /etc/ops-agent/flows/server_backup_restore_test.yml on the host.

name: Server backup — restore test
description: Restore latest snapshot to /tmp/restore-test and assert critical files
steps:
  - command_key: trigger_restore_test
    on_failure: continue

  - command_key: read_backup_status
    on_failure: continue
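The restore test is described as restoring the latest snapshot and asserting that critical files came back intact. The actual assertion logic lives in a script outside this diff, so the following Python sketch shows only one simple way such a check could look. The function name and the example paths are hypothetical, and "intact" is approximated here as "exists and is non-empty", which is weaker than a checksum comparison.

```python
from pathlib import Path


def missing_critical_files(restore_root, relative_paths):
    """Return the relative paths that are absent or empty under restore_root.

    An empty list means every expected file was restored with some content.
    This does not verify checksums, only presence and non-zero size.
    """
    root = Path(restore_root)
    missing = []
    for rel in relative_paths:
        candidate = root / rel
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(rel)
    return missing
```

A caller could treat a non-empty result as a failed restore test, for example `missing_critical_files("/tmp/restore-test", ["dumps/pg_dumpall.sql"])`, where the dump path is purely illustrative.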