2026-02-18 17:15:31 -07:00

# Bunker Ops — How-To Guide
Operational handbook for managing Changemaker Lite instances with Ansible.

---
## Table of Contents
1. [Prerequisites](#1-prerequisites)
2. [Initial Setup (Control Machine)](#2-initial-setup-control-machine)
3. [Adding a New Instance](#3-adding-a-new-instance)
4. [Deploying an Instance](#4-deploying-an-instance)
5. [Day-to-Day Operations](#5-day-to-day-operations)
6. [Secret Management](#6-secret-management)
7. [Monitoring & Fleet Observability](#7-monitoring--fleet-observability)
8. [Troubleshooting](#8-troubleshooting)
9. [Variable Reference](#9-variable-reference)
---
## 1. Prerequisites
### Control Machine (your laptop / jump server)
- **Ansible 2.14+** — `pip install ansible` or `apt install ansible`
- **SSH access** — key-based auth to all target servers
- **OpenSSL** — for secret generation (`openssl rand`)
### Target Servers (each Changemaker instance)
- **Ubuntu 22.04 or 24.04** (Debian-based)
- **2+ GB RAM** (4 GB recommended; swap is auto-created on low-memory hosts)
- **20+ GB disk** (50 GB recommended for media features)
- **SSH access** for a `deploy` user with passwordless sudo
- **Outbound internet** (pulls Docker images, Git repo)
- Ports 80, 443, and SSH accessible
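
The memory requirement can be checked on a target before the first deploy. The sketch below is not part of the repository: `mem_total_mb` is a hypothetical helper that reads the standard Linux `/proc/meminfo` format and prints total RAM in MiB. A nominal 2 GB VM usually reports slightly less than 2048 MiB, so compare against a threshold a little below that.

```bash
# Hypothetical preflight helper (not part of bunker-ops).
# Prints total RAM in whole MiB, read from a meminfo-format file
# (defaults to /proc/meminfo on Linux).
mem_total_mb() {
  local meminfo="${1:-/proc/meminfo}"
  # MemTotal is reported in kB; integer-divide to get MiB.
  awk '/^MemTotal:/ { print int($2 / 1024) }' "$meminfo"
}
```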
---
## 2. Initial Setup (Control Machine)
### 2.1 Clone the repository
```bash
git clone <repo-url> changemaker.lite
cd changemaker.lite/bunker-ops
```
### 2.2 Create a vault password
This single password encrypts all per-instance secrets. Store it securely (password manager, not Git).
```bash
# Generate a strong vault password
openssl rand -base64 32 > .vault_pass
chmod 600 .vault_pass
```
The `.vault_pass` file is in `.gitignore` and must never be committed.
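
Because a missing, empty, or world-readable password file is easy to overlook, a small check can run before any playbook. This is a minimal sketch, not something shipped with the repository; `check_vault_pass` is a hypothetical helper name.

```bash
# Hypothetical pre-run check (not part of bunker-ops).
# Succeeds only if the vault password file exists, is non-empty,
# and is readable/writable by its owner alone (mode 600).
check_vault_pass() {
  local file="${1:-.vault_pass}"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # Compare the permission string from ls; cut keeps just "-rw-------".
  local perms
  perms=$(ls -l "$file" | cut -c1-10)
  [ "$perms" = "-rw-------" ] || {
    echo "bad permissions on $file: $perms" >&2
    return 1
  }
}
```

For example, `check_vault_pass && ansible-playbook playbooks/deploy.yml --syntax-check` stops early if the file is not in the expected state.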
### 2.3 Verify Ansible can run
```bash
ansible --version
ansible-playbook playbooks/deploy.yml --syntax-check
```
### 2.4 Prepare SSH access
Ensure your SSH key can reach target servers:
```bash
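# Test connectivity (docker --version only succeeds once Docker is installed;
# on a fresh server the first deploy installs it)
ssh deploy@10.0.1.10 "hostname && docker --version"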
```
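If you use a non-default SSH key, set it per host in the inventory (or set `private_key_file` under `[defaults]` in `ansible.cfg`):
```yaml
# Per-host or per-group inventory variable
ansible_ssh_private_key_file: ~/.ssh/bunker_ops_ed25519
```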
---
## 3. Adding a New Instance
### 3.1 Quick method (recommended)
The `add-instance.sh` script scaffolds everything:
```bash
./scripts/add-instance.sh edmonton-prod betteredmonton.org 10.0.1.10
# With fleet observability (Tier 2):
./scripts/add-instance.sh edmonton-prod betteredmonton.org 10.0.1.10 --tier 2
```
This creates:
- `inventory/host_vars/edmonton-prod/main.yml` — instance configuration
- `inventory/host_vars/edmonton-prod/vault.yml` — 19+ generated secrets (encrypted)
### 3.2 Add to inventory
Edit `inventory/hosts.yml` and add the host:
```yaml
all:
  children:
    changemaker_instances:
      hosts:
        edmonton-prod:
          ansible_host: 10.0.1.10
          ansible_user: deploy
          cml_domain: betteredmonton.org
```
### 3.3 Customize configuration
Edit `inventory/host_vars/edmonton-prod/main.yml`:
```yaml
cml_domain: betteredmonton.org
cml_node_env: production
# Enable features
cml_enable_media: "true"
cml_listmonk_sync_enabled: "true"
cml_email_test_mode: "false"
cml_monitoring_enabled: true
# Production SMTP
cml_smtp_host: smtp.protonmail.ch
cml_smtp_port: 587
cml_smtp_user: "noreply@betteredmonton.org"
# Pangolin tunnel
cml_pangolin_api_url: "https://api.bnkserve.org/v1"
cml_pangolin_org_id: "org_abc123"
```
### 3.4 Edit secrets (if needed)
```bash
# Decrypt, edit, and re-encrypt in one step
ansible-vault edit inventory/host_vars/edmonton-prod/vault.yml
# Or decrypt to disk, edit by hand, then re-encrypt
# (never commit the decrypted file)
ansible-vault decrypt inventory/host_vars/edmonton-prod/vault.yml
# ... edit ...
ansible-vault encrypt inventory/host_vars/edmonton-prod/vault.yml
```
### 3.5 Verify connectivity
```bash
ansible edmonton-prod -m ping
```
---
## 4. Deploying an Instance
### 4.1 Full initial deploy
Installs Docker, configures the OS, clones the repo, generates `.env`, starts all containers, runs migrations, and sets up backup cron:
```bash
ansible-playbook playbooks/deploy.yml --limit edmonton-prod
```
What happens (in order):
1. **common** role — apt update, Docker install, UFW firewall, fail2ban, swap
2. **changemaker** role — git clone, create dirs, generate `.env`, `docker compose up`, Prisma migrations, seed, health checks, backup cron
3. **monitoring** role (if enabled) — Prometheus config, `--profile monitoring up`
### 4.2 Deploy all instances
```bash
# One at a time (safe):
ansible-playbook playbooks/deploy.yml
# Show what would change (dry run):
ansible-playbook playbooks/deploy.yml --check --diff
```
### 4.3 Deploy with specific tags
```bash
# Only regenerate .env (no Docker restart):
ansible-playbook playbooks/deploy.yml --limit edmonton-prod --tags env
# Only clone + update code:
ansible-playbook playbooks/deploy.yml --limit edmonton-prod --tags clone
# Only run health checks:
ansible-playbook playbooks/deploy.yml --limit edmonton-prod --tags health
```
---
## 5. Day-to-Day Operations
### 5.1 Rolling upgrade (code + images)
Pulls latest Git commits, rebuilds images, runs migrations, restarts — in 25% batches:
```bash
# All instances:
ansible-playbook playbooks/upgrade.yml
# Single instance:
ansible-playbook playbooks/upgrade.yml --limit edmonton-prod
```
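
To estimate how many instances restart together, the arithmetic can be sketched as follows. This assumes `upgrade.yml` implements the batching with Ansible's `serial: "25%"` keyword (the playbook is not shown here) and that Ansible rounds a percentage down to whole hosts with a minimum of one. `batch_size` is a hypothetical helper for illustration only.

```bash
# Illustrative only: maps a percentage batch setting to a host count.
# batch_size TOTAL_HOSTS PERCENT -> hosts per batch (rounded down, minimum 1)
batch_size() {
  local total="$1" pct="$2"
  local size=$(( total * pct / 100 ))
  # Never let a batch shrink to zero hosts.
  if [ "$size" -lt 1 ]; then
    size=1
  fi
  echo "$size"
}
```

Under those assumptions, a fleet of 10 instances upgrades two hosts at a time, and a fleet of three or fewer upgrades one host at a time.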
### 5.2 Configuration change (no rebuild)
Regenerates `.env` and restarts the API. Use when changing feature flags, SMTP settings, CORS origins, etc.:
```bash
# Change a variable first:
# Edit inventory/host_vars/edmonton-prod/main.yml
# e.g., cml_enable_media: "true"
# Then apply:
ansible-playbook playbooks/configure.yml --limit edmonton-prod
```
### 5.3 Trigger backups
```bash
# All instances:
ansible-playbook playbooks/backup.yml
# Single instance:
ansible-playbook playbooks/backup.yml --limit edmonton-prod
```
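
The backup playbook and script are not reproduced in this guide. As a rough sketch of what age-based retention (`cml_backup_retention_days`, default `30`) amounts to, the hypothetical `prune_backups` helper below deletes archives older than a given number of days. The directory argument and the `*.tar.gz` naming are assumptions, not taken from the backup script.

```bash
# Hypothetical sketch of age-based pruning (not the actual backup script).
# prune_backups DIR [DAYS] -> prints and deletes archives older than DAYS
prune_backups() {
  local dir="$1" days="${2:-30}"
  # -mtime +N matches files last modified more than N full days ago.
  find "$dir" -maxdepth 1 -type f -name '*.tar.gz' -mtime +"$days" -print -delete
}
```

Called as `prune_backups /path/to/backups 30`, it lists each archive it removes and leaves newer ones in place.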
### 5.4 Enable/reconfigure monitoring
```bash
ansible-playbook playbooks/monitoring.yml --limit edmonton-prod
```
### 5.5 Run ad-hoc commands
```bash
# Check Docker status on all instances
# (chdir is a parameter of the command module, so it goes inside -a):
ansible changemaker_instances -m command \
  -a "chdir=/opt/changemaker-lite docker compose ps" --become
# View API logs on one instance:
ansible edmonton-prod -m command \
  -a "chdir=/opt/changemaker-lite docker compose logs api --tail 50" --become
# Restart a specific service:
ansible edmonton-prod -m command \
  -a "chdir=/opt/changemaker-lite docker compose restart api" --become
# Check disk space across fleet:
ansible changemaker_instances -m command -a "df -h /"
### 5.6 Rotate a secret
1. Generate a new value:
```bash
openssl rand -hex 32
```
2. Update the vault:
```bash
ansible-vault edit inventory/host_vars/edmonton-prod/vault.yml
# Change vault_cml_jwt_access_secret (or whichever secret)
```
3. Apply and restart:
```bash
ansible-playbook playbooks/configure.yml --limit edmonton-prod
```
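
Step 1 can be wrapped in a small helper that also validates the shape of the value before it goes into the vault. This is a sketch under stated assumptions: `gen_hex_secret` is a hypothetical name, and the `/dev/urandom` fallback is an addition for hosts without OpenSSL. It applies to the secrets described as 64-char hex in the variable reference.

```bash
# Hypothetical helper: generate a 64-character hex secret and check it.
gen_hex_secret() {
  local secret
  if command -v openssl >/dev/null 2>&1; then
    secret=$(openssl rand -hex 32) || return 1
  else
    # Fallback (assumption): read 32 random bytes and hex-encode them.
    secret=$(od -An -tx1 -N32 /dev/urandom | tr -d ' \n')
  fi
  # 32 random bytes encode to exactly 64 lowercase hex characters.
  case "$secret" in
    *[!0-9a-f]*) echo "unexpected characters in secret" >&2; return 1 ;;
  esac
  [ "${#secret}" -eq 64 ] || { echo "unexpected length" >&2; return 1; }
  printf '%s\n' "$secret"
}
```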
---
## 6. Secret Management
### Naming convention
| Prefix | Purpose | Example |
|--------|---------|---------|
| `cml_*` | Non-secret configuration | `cml_domain`, `cml_smtp_host` |
| `vault_cml_*` | Encrypted secrets | `vault_cml_v2_postgres_password` |
| `vault_bunker_*` | Bunker Ops shared secrets | `vault_bunker_ops_remote_write_token` |
### What gets encrypted
All 19+ secrets per instance:
- Database passwords (PostgreSQL, Redis, Listmonk DB, Gitea DB)
- JWT secrets (access + refresh) and encryption key
- Admin passwords (initial admin, NocoDB, n8n, Grafana, Gotify, Vaultwarden, Rocket.Chat, Gancio)
- API tokens (Listmonk API, Pangolin, Bunker Ops remote write)
- SMTP password
### Vault operations
```bash
# View encrypted file:
ansible-vault view inventory/host_vars/edmonton-prod/vault.yml
# Edit in-place (decrypts → opens $EDITOR → re-encrypts):
ansible-vault edit inventory/host_vars/edmonton-prod/vault.yml
# Re-key all vaults (change master password):
find inventory/host_vars -name vault.yml -exec ansible-vault rekey {} +
# Encrypt a new plaintext file:
ansible-vault encrypt inventory/host_vars/new-instance/vault.yml
```
### Vault password management
- The `.vault_pass` file is referenced in `ansible.cfg`
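- For CI/CD, write the password to a temporary file and point Ansible at it via the environment: `ANSIBLE_VAULT_PASSWORD_FILE=/path/to/file ansible-playbook ...`
- For teams, use `--vault-password-file` pointing to an executable script that fetches the password from a shared secrets manager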
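
When a vault password file is executable, Ansible runs it and reads the password from its standard output. The sketch below shows the core of such a script. The function name and the `BUNKER_VAULT_PASSWORD` variable are assumptions for illustration; in practice the value would be exported by a CI system or fetched by a secrets manager CLI. Saved as an executable script that ends with a call to `print_vault_password`, it can be passed with `--vault-password-file`.

```bash
# Hypothetical core of a vault password script (not part of bunker-ops).
# BUNKER_VAULT_PASSWORD is an assumed variable name, not an Ansible setting.
print_vault_password() {
  if [ -z "${BUNKER_VAULT_PASSWORD:-}" ]; then
    echo "BUNKER_VAULT_PASSWORD is not set" >&2
    return 1
  fi
  # Ansible reads the password from stdout.
  printf '%s\n' "$BUNKER_VAULT_PASSWORD"
}
```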
---
## 7. Monitoring & Fleet Observability
### Tier model
| Tier | What it means | How to set |
|------|--------------|-----------|
| **0: Standalone** | No Ansible management (manual `config.sh` install) | N/A |
| **1: Managed** | Ansible deploys/updates, local monitoring only | `bunker_ops_enabled: false` |
| **2: Fleet** | Ansible + metrics pushed to central VictoriaMetrics | `bunker_ops_enabled: true` |
### Enabling Tier 2 on an instance
1. Set in `host_vars/<hostname>/main.yml`:
```yaml
bunker_ops_enabled: true
bunker_ops_remote_write_url: "https://ops.bnkserve.org/api/v1/write"
cml_monitoring_enabled: true
```
2. Set the write token in `host_vars/<hostname>/vault.yml`:
```yaml
vault_bunker_ops_remote_write_token: "your-token-here"
```
3. Apply:
```bash
ansible-playbook playbooks/monitoring.yml --limit edmonton-prod
```
### What metrics are sent (Tier 2)
Only filtered, non-PII metrics leave the instance:
- `cm_*` — Application metrics (emails sent, canvass visits, queue sizes, login attempts)
- `node_*` — System metrics (CPU, memory, disk, network)
- `http_request*` — API latency and request counts
- `up` — Service availability

**Never sent:** Database content, user data, campaign text, participant records, cAdvisor container details.
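
The allowlist above can be expressed as a simple name match. The real filtering presumably happens in the Prometheus remote-write configuration, which is not shown in this guide; the hypothetical `is_forwarded_metric` function below only illustrates which metric names the four patterns accept.

```bash
# Illustration of the allowlist patterns (not the actual Prometheus config).
# Returns 0 if a metric name would be forwarded, 1 otherwise.
is_forwarded_metric() {
  case "$1" in
    cm_*|node_*|http_request*|up) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `node_cpu_seconds_total` matches `node_*` and is forwarded, while a cAdvisor metric such as `container_memory_usage_bytes` matches nothing and stays on the instance.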
### Backup metrics
When `BUNKER_OPS_ENABLED=true`, the backup script automatically pushes two metrics, which enable "backup staleness" alerts on the central dashboard:
- `cm_backup_last_success_timestamp` — Unix timestamp of last successful backup
- `cm_backup_size_bytes` — Size of the backup archive
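
As an illustration of what the two backup metrics carry, the sketch below formats them as Prometheus text exposition lines for a given archive. How the real backup script builds and pushes them is not shown in this guide; `format_backup_metrics` and its archive path argument are assumptions.

```bash
# Sketch only: formats the two backup metrics for one archive file.
# format_backup_metrics ARCHIVE -> two "name value" lines on stdout
format_backup_metrics() {
  local archive="$1"
  [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
  local size now
  # wc -c may pad its output with spaces on some systems; strip them.
  size=$(wc -c < "$archive" | tr -d ' ')
  now=$(date +%s)
  printf 'cm_backup_last_success_timestamp %s\n' "$now"
  printf 'cm_backup_size_bytes %s\n' "$size"
}
```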
---
## 8. Troubleshooting
### Ansible can't connect
```
UNREACHABLE! => {"msg": "Failed to connect to the host via ssh"}
```
- Verify SSH: `ssh deploy@<host> hostname`
- Check that `ansible_user` in `inventory/hosts.yml` matches the SSH user
- Ensure the user has passwordless sudo (run as root on the target): `echo 'deploy ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/deploy && chmod 440 /etc/sudoers.d/deploy`
### Vault password error
```
ERROR! Decryption failed on ...vault.yml
```
- Verify `.vault_pass` file exists and is correct
- Or pass explicitly: `ansible-playbook ... --vault-password-file /path/to/.vault_pass`
### Deploy fails at "Wait for PostgreSQL"
PostgreSQL did not become ready in time. Check its logs:
```bash
ansible <host> -m command \
  -a "chdir=/opt/changemaker-lite docker compose logs v2-postgres --tail 30" --become
```
Common causes:
- Disk full (`df -h`)
- Wrong `V2_POSTGRES_PASSWORD` (check vault.yml matches what's in the running DB)
- First deploy: PostgreSQL needs time to initialize
### Health check fails after deploy
API not responding on `/api/health`:
```bash
# Check if container is running:
ansible <host> -m command -a "chdir=/opt/changemaker-lite docker compose ps api" --become
# Check API logs:
ansible <host> -m command -a "chdir=/opt/changemaker-lite docker compose logs api --tail 50" --become
```
Common causes:
- Missing environment variable (check `.env` generation)
- Database migration failure (check Prisma output)
- Port conflict (another process on 4000)
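
On a first deploy the API can take a while to come up, so a single failed probe is not conclusive. A minimal retry loop, run on the instance itself, might look like the sketch below. `retry` is a hypothetical helper, and the URL in the usage note assumes the default `cml_api_port` of `4000`.

```bash
# Hypothetical retry helper: run a command until it succeeds or the
# attempt budget is exhausted.
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Stop as soon as the command succeeds.
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry 10 3 curl -fsS http://localhost:4000/api/health` probes the health endpoint up to ten times, three seconds apart.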
### .env has wrong values
Compare generated `.env` with expected:
```bash
# Show diff of what Ansible would change:
ansible-playbook playbooks/configure.yml --limit <host> --check --diff
```
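
When the diff is too noisy, it can help to check the generated file for specific keys directly on the instance. The sketch below uses a hypothetical helper, `missing_env_keys`, that prints every requested key that is absent or has an empty value. The `.env` location under `/opt/changemaker-lite` in the usage note is an assumption based on the paths used elsewhere in this guide.

```bash
# Hypothetical helper: report keys that are absent or empty in a .env file.
# missing_env_keys FILE KEY [KEY...] -> prints missing keys, fails if any
missing_env_keys() {
  local file="$1"
  shift
  local key status=0
  for key in "$@"; do
    # A key counts as present only if it has a non-empty value.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_env_keys /opt/changemaker-lite/.env V2_POSTGRES_PASSWORD` exits non-zero and prints the key name if the password never made it into the file.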
### Remote write not working (Tier 2)
```bash
# Check Prometheus config on instance:
ansible <host> -m command -a "cat /opt/changemaker-lite/configs/prometheus/prometheus.yml" --become
# Check Prometheus logs for remote write errors:
ansible <host> -m command \
  -a "chdir=/opt/changemaker-lite docker compose logs prometheus-changemaker --tail 30" --become
```
Common issues:
- `bunker_ops_enabled` not set to `true`
- Wrong `bunker_ops_remote_write_url`
- Invalid auth token
- Central VictoriaMetrics not reachable (firewall, DNS)
---
## 9. Variable Reference
### Configuration variables (`cml_*`)
Set these in `host_vars/<hostname>/main.yml` or `group_vars/`.
| Variable | Default | Description |
|----------|---------|-------------|
| `cml_domain` | `cmlite.org` | Instance domain (drives CORS, SMTP, URLs) |
| `cml_node_env` | `production` | Node.js environment |
| `cml_api_port` | `4000` | Express API port |
| `cml_admin_port` | `3000` | React admin port |
| `cml_media_api_port` | `4100` | Fastify media API port |
| `cml_postgres_port` | `5433` | PostgreSQL host port |
| `cml_enable_media` | `"false"` | Enable video library |
| `cml_enable_payments` | `"false"` | Enable Stripe payments |
| `cml_enable_chat` | `"false"` | Enable Rocket.Chat |
| `cml_listmonk_sync_enabled` | `"false"` | Enable newsletter sync |
| `cml_gancio_sync_enabled` | `"false"` | Enable event sync |
| `cml_email_test_mode` | `"true"` | Use MailHog (`true`) or SMTP (`false`) |
| `cml_monitoring_enabled` | `false` | Enable Prometheus/Grafana stack |
| `cml_smtp_host` | `mailhog-changemaker` | SMTP server hostname |
| `cml_smtp_port` | `1025` | SMTP server port |
| `cml_smtp_user` | `""` | SMTP username |
| `cml_mapbox_api_key` | `""` | Mapbox geocoding key |
| `cml_google_maps_api_key` | `""` | Google Maps geocoding key |
| `cml_pangolin_api_url` | `""` | Pangolin tunnel API |
| `cml_pangolin_org_id` | `""` | Pangolin organization |
| `cml_backup_retention_days` | `30` | Days to keep local backups |
| `cml_backup_cron_hour` | `3` | Backup cron hour (UTC) |
| `cml_backup_s3_enabled` | `false` | Upload backups to S3 |
| `bunker_ops_enabled` | `false` | Enable fleet observability |
| `bunker_ops_instance_label` | `{{ cml_domain }}` | Label in central metrics |
| `bunker_ops_remote_write_url` | `""` | VictoriaMetrics write endpoint |
### Secret variables (`vault_cml_*`)
Set these in `host_vars/<hostname>/vault.yml` (encrypted).
| Variable | Purpose |
|----------|---------|
| `vault_cml_v2_postgres_password` | PostgreSQL password |
| `vault_cml_redis_password` | Redis authentication |
| `vault_cml_jwt_access_secret` | JWT access token signing (64-char hex) |
| `vault_cml_jwt_refresh_secret` | JWT refresh token signing (64-char hex) |
| `vault_cml_encryption_key` | Database field encryption (64-char hex) |
| `vault_cml_initial_admin_email` | Initial admin email |
| `vault_cml_initial_admin_password` | Initial admin password (12+ chars, complexity) |
| `vault_cml_listmonk_db_password` | Listmonk PostgreSQL password |
| `vault_cml_listmonk_web_admin_password` | Listmonk web UI password |
| `vault_cml_listmonk_api_token` | Listmonk API token |
| `vault_cml_nocodb_admin_password` | NocoDB admin password |
| `vault_cml_gitea_db_passwd` | Gitea database password |
| `vault_cml_gitea_db_root_password` | Gitea DB root password |
| `vault_cml_n8n_encryption_key` | n8n encryption key |
| `vault_cml_n8n_user_password` | n8n admin password |
| `vault_cml_grafana_admin_password` | Grafana admin password |
| `vault_cml_gotify_admin_password` | Gotify admin password |
| `vault_cml_vaultwarden_admin_token` | Vaultwarden admin token (64-char hex) |
| `vault_cml_rocketchat_admin_password` | Rocket.Chat admin password |
| `vault_cml_gancio_admin_password` | Gancio admin password |
| `vault_cml_smtp_pass` | SMTP password |
| `vault_cml_pangolin_api_key` | Pangolin API key |
| `vault_cml_pangolin_newt_id` | Pangolin Newt container ID |
| `vault_cml_pangolin_newt_secret` | Pangolin Newt secret |
| `vault_cml_pangolin_site_id` | Pangolin site ID |
| `vault_cml_pangolin_endpoint` | Pangolin endpoint URL |
| `vault_bunker_ops_remote_write_token` | Central VictoriaMetrics write auth token |