Ops-dashboard/README.md
Scrum4Me Agent caeb5f3306 feat(ops): self-update script, systemd units, README install guide, recovery runbook
- deploy/ops-dashboard-updater/update.sh: git pull → docker build → force-recreate → smoke-test
- deploy/ops-dashboard-updater/install.sh: installs script + systemd units to host
- ops-dashboard-updater.service / .timer: oneshot + daily 03:00 scheduled trigger
- README.md: Installation and Configuration sections (env files, ops-agent, updater)
- docs/runbooks/recovery.md: agent-crash, DB corruption/restore, container failure, cert expiry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 20:10:21 +02:00

# Ops Dashboard
Single-user ops dashboard for jp-visser.nl.
See `docs/runbooks/` for setup, deployment, and operational procedures.
## Installation
### Prerequisites
- Docker + Docker Compose (plugin) installed on the host
- A PostgreSQL service named `postgres` already running in the same Compose stack
- The repository cloned to `/srv/ops/repos/ops-dashboard`
- The shared Compose file at `/srv/scrum4me/compose/docker-compose.yml`, with the ops-dashboard fragment from `deploy/` added
### 1. Configure environment
```
cp deploy/ops-dashboard.env.example /srv/ops/ops-dashboard.env
# Edit /srv/ops/ops-dashboard.env — set DATABASE_URL, AUTH_SECRET, etc.
```
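
Before moving on, it can help to confirm that the edited file actually sets the values the web-app needs. The following is a minimal sketch, not part of the repository: `check_env_file` is a hypothetical helper, and the key list passed to it is an assumption that should be matched against `deploy/ops-dashboard.env.example`.

```shell
#!/bin/sh
# Sketch: report which required keys are missing or empty in an env file.
# check_env_file is a hypothetical helper, not shipped with the repository.
check_env_file() {
  cef_file=$1
  shift
  cef_missing=0
  for cef_key in "$@"; do
    # Require KEY=<non-empty value>; commented-out lines do not count.
    if ! grep -Eq "^${cef_key}=.+" "$cef_file"; then
      echo "missing or empty: ${cef_key}" >&2
      cef_missing=1
    fi
  done
  return "$cef_missing"
}
```

For example, `check_env_file /srv/ops/ops-dashboard.env DATABASE_URL AUTH_SECRET` exits non-zero and names each key that still needs a value.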
### 2. Install ops-agent
```
sudo deploy/ops-agent/setup.sh
```
This creates the `ops-agent` system user, installs `/opt/ops-agent`, generates
`/etc/ops-agent/secret`, and enables the systemd unit.
Copy the generated secret into the web-app env file:
```
sudo cat /etc/ops-agent/secret
# Paste the value as OPS_AGENT_SECRET= in /srv/ops/ops-dashboard.env
```
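
If you prefer not to paste the value by hand, the copy step could be scripted along these lines. This is a sketch under assumptions: `set_env_var` is a hypothetical helper, and it assumes the env file uses plain unquoted `KEY=value` lines.

```shell
#!/bin/sh
# Sketch: set KEY=value in an env file, replacing any existing assignment.
# set_env_var is a hypothetical helper, not shipped with the repository.
set_env_var() {
  sev_file=$1
  sev_key=$2
  sev_value=$3
  [ -f "$sev_file" ] || return 1
  sev_tmp=$(mktemp) || return 1
  # Keep every line except an existing assignment for the key
  # (grep exits 1 when nothing is left, which is fine here).
  grep -v "^${sev_key}=" "$sev_file" > "$sev_tmp" || true
  printf '%s=%s\n' "$sev_key" "$sev_value" >> "$sev_tmp"
  # Write back in place so the file keeps its owner and mode.
  cat "$sev_tmp" > "$sev_file"
  rm -f "$sev_tmp"
}
```

From a root shell, `set_env_var /srv/ops/ops-dashboard.env OPS_AGENT_SECRET "$(cat /etc/ops-agent/secret)"` would then replace the old value, if any, with the generated one.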
### 3. Build and start the dashboard
```
sudo docker compose -f /srv/scrum4me/compose/docker-compose.yml build ops-dashboard
sudo docker compose -f /srv/scrum4me/compose/docker-compose.yml up -d ops-dashboard
```
The dashboard is now reachable on `127.0.0.1:3001` (proxied by Caddy).
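
To confirm the container is answering on `127.0.0.1:3001` before relying on Caddy, a small polling loop is usually enough. The sketch below is illustrative only: `wait_for_http` is a hypothetical helper, the attempt count and delay are arbitrary defaults, and it assumes the root path returns a non-error status.

```shell
#!/bin/sh
# Sketch: poll a URL until it answers or the attempts run out.
# wait_for_http is a hypothetical helper, not shipped with the repository.
wait_for_http() {
  wfh_url=$1
  wfh_attempts=${2:-10}
  wfh_delay=${3:-2}
  wfh_n=1
  while [ "$wfh_n" -le "$wfh_attempts" ]; do
    # -f: fail on HTTP >= 400; -s: quiet; -o /dev/null: discard the body.
    if curl -fs -o /dev/null --max-time 5 "$wfh_url"; then
      echo "up after ${wfh_n} attempt(s): ${wfh_url}"
      return 0
    fi
    wfh_n=$((wfh_n + 1))
    sleep "$wfh_delay"
  done
  echo "no response from ${wfh_url} after ${wfh_attempts} attempts" >&2
  return 1
}
```

Calling `wait_for_http http://127.0.0.1:3001/` right after `up -d` gives the container a few seconds to start before the check is treated as failed.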
### 4. Install the self-update script
```
sudo deploy/ops-dashboard-updater/install.sh
```
To enable scheduled updates (daily at 03:00):
```
sudo systemctl enable --now ops-dashboard-updater.timer
```
To trigger a manual update via SSH:
```
sudo systemctl start ops-dashboard-updater.service
# or:
sudo /opt/ops-dashboard-updater/update.sh
```
> **Never** trigger updates through the dashboard UI — the script restarts the
> container that serves the UI.
## Configuration
| File | Purpose |
|---|---|
| `/srv/ops/ops-dashboard.env` | Web-app environment (DATABASE_URL, AUTH_SECRET, OPS_AGENT_SECRET, …) |
| `/etc/ops-agent/secret` | Shared HMAC secret between web-app and ops-agent |
| `/etc/ops-agent/commands.yml` | Whitelist of commands the ops-agent may run |
| `/etc/ops-agent/flows/` | Flow YAML files (backup, caddy reload, etc.) |
| `/srv/scrum4me/compose/docker-compose.yml` | Main Compose file (add ops-dashboard fragment from `deploy/`) |
## Ops-agent auth
The web-app communicates with the ops-agent via a shared secret stored in
`/etc/ops-agent/secret` (mode 0640, owner `root:ops-agent`).
- The ops-agent reads the secret at startup via `OPS_AGENT_SECRET_PATH`.
- Every request from the web-app carries `Authorization: Bearer <secret>`.
- The agent validates using a constant-time comparison to prevent timing attacks.
- The web-app reads the secret value from the `OPS_AGENT_SECRET` environment variable.
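
When debugging the auth path by hand, it can be useful to build the same header the web-app sends. The sketch below assumes the secret file holds a single line; `bearer_header` is a hypothetical helper and not part of the agent or the web-app.

```shell
#!/bin/sh
# Sketch: build the bearer header from a secret file.
# bearer_header is a hypothetical helper, not shipped with the repository.
bearer_header() {
  bh_path=$1
  # Drop the trailing newline that tee leaves in the file.
  bh_secret=$(tr -d '\r\n' < "$bh_path") || return 1
  if [ -z "$bh_secret" ]; then
    echo "empty secret: ${bh_path}" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$bh_secret"
}
```

A manual request could then look like `curl -fsS -H "$(bearer_header /etc/ops-agent/secret)" "$OPS_AGENT_URL"`, where `OPS_AGENT_URL` is a placeholder: this README does not state the agent's listen address or endpoints.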
### Secret rotation procedure
1. Generate a new secret on the server:
```
openssl rand -hex 32 | sudo tee /etc/ops-agent/secret
sudo chown root:ops-agent /etc/ops-agent/secret
sudo chmod 0640 /etc/ops-agent/secret
```
2. Update `OPS_AGENT_SECRET` in the web-app's environment file
(`/srv/ops/ops-dashboard.env`) with the new value.
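3. Restart the agent, then recreate the dashboard container so it picks up the new value (`docker compose restart` does not re-read environment changes):
```
sudo systemctl restart ops-agent
sudo docker compose -f /srv/scrum4me/compose/docker-compose.yml up -d --force-recreate ops-dashboard
```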
4. Verify the dashboard is operational and that `systemctl status ops-agent` shows
the service running without errors.
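
A mismatch between the two copies of the secret is the most likely failure after a rotation, so the verification step can include a direct comparison. The following is a sketch only: `secrets_match` is a hypothetical helper, and it assumes the env file stores the value unquoted on an `OPS_AGENT_SECRET=` line.

```shell
#!/bin/sh
# Sketch: confirm the env file carries the same secret as the agent's file.
# secrets_match is a hypothetical helper, not shipped with the repository.
secrets_match() {
  sm_secret_file=$1
  sm_env_file=$2
  sm_file_value=$(tr -d '\r\n' < "$sm_secret_file") || return 1
  # Use the last OPS_AGENT_SECRET= line and strip the key.
  sm_env_value=$(sed -n 's/^OPS_AGENT_SECRET=//p' "$sm_env_file" | tail -n 1)
  if [ -n "$sm_file_value" ] && [ "$sm_file_value" = "$sm_env_value" ]; then
    echo "secrets match"
  else
    echo "secrets differ or are empty" >&2
    return 1
  fi
}
```

Run as root, `secrets_match /etc/ops-agent/secret /srv/ops/ops-dashboard.env` should print `secrets match`. On a host with GNU coreutils, `sudo stat -c '%a %U:%G' /etc/ops-agent/secret` should additionally report `640 root:ops-agent`.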