# Bunker Ops — How-To Guide

Operational handbook for managing Changemaker Lite instances with Ansible.

---

## Table of Contents

1. [Prerequisites](#1-prerequisites)
2. [Initial Setup (Control Machine)](#2-initial-setup-control-machine)
3. [Adding a New Instance](#3-adding-a-new-instance)
4. [Deploying an Instance](#4-deploying-an-instance)
5. [Day-to-Day Operations](#5-day-to-day-operations)
6. [Secret Management](#6-secret-management)
7. [Monitoring & Fleet Observability](#7-monitoring--fleet-observability)
8. [Troubleshooting](#8-troubleshooting)
9. [Variable Reference](#9-variable-reference)

---

## 1. Prerequisites

### Control Machine (your laptop / jump server)

- **Ansible 2.14+** (the ansible-core version, as reported by `ansible --version`): `pip install ansible`, or `apt install ansible` if your distribution's package is recent enough
- **SSH access**: key-based auth to all target servers
- **OpenSSL**: for secret generation (`openssl rand`)

### Target Servers (each Changemaker instance)

- **Ubuntu 22.04 or 24.04** (Debian-based)
- **2+ GB RAM** (4 GB recommended; swap is auto-created on low-memory hosts)
- **20+ GB disk** (50 GB recommended for media features)
- **SSH access** for a `deploy` user with passwordless sudo
- **Outbound internet** (to pull Docker images and the Git repo)
- Ports 80, 443, and SSH reachable

---

## 2. Initial Setup (Control Machine)

### 2.1 Clone the repository

```bash
git clone <repo-url> changemaker.lite
cd changemaker.lite/bunker-ops
```

### 2.2 Create a vault password

This single password encrypts all per-instance secrets. Store it securely (in a password manager, not in Git). The `.vault_pass` file is listed in `.gitignore` and must never be committed.

```bash
# Generate a strong vault password
openssl rand -base64 32 > .vault_pass
chmod 600 .vault_pass
```

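The two commands above create `.vault_pass` with default permissions before `chmod` runs. Under the common `022` umask that means the file is briefly world-readable. Rerunning them also silently replaces an existing password, which locks you out of every existing vault unless the old password is stored elsewhere.

The sketch below is a safer variant. It assumes `openssl` is installed, as listed in the prerequisites. `make_vault_pass` is a hypothetical helper, not part of the repository:

```bash
# Hypothetical helper: create the vault password file once, owner-only from the start.
make_vault_pass() {
  local path="${1:-.vault_pass}"
  # Never overwrite: losing the old password locks you out of existing vaults.
  if [ -e "$path" ]; then
    echo "refusing to overwrite existing $path" >&2
    return 1
  fi
  # umask 077 in a subshell: the file is created as 0600, with no readable window.
  if ! (umask 077 && openssl rand -base64 32 > "$path"); then
    rm -f "$path"
    return 1
  fi
}
```

Run it once from `bunker-ops/`: `make_vault_pass .vault_pass`.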
### 2.3 Verify Ansible can run

```bash
ansible --version
ansible-playbook playbooks/deploy.yml --syntax-check
```

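The prerequisite is ansible-core 2.14 or newer. `ansible --version` reports this on its first line as `ansible [core X.Y.Z]`.

The sketch below is a minimal preflight check for that version. It assumes GNU `sort -V` is available. Both function names are hypothetical:

```bash
# version_ge A B succeeds when version A >= version B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract "2.14.3" from a first line such as "ansible [core 2.14.3]".
# Prints nothing if the line does not have that shape.
ansible_core_version() {
  sed -n '1s/.*core \([0-9][0-9.]*\).*/\1/p'
}
```

For example: `v=$(ansible --version | ansible_core_version); version_ge "$v" 2.14 || echo "ansible-core $v is older than 2.14"`. An empty or unparseable version counts as too old.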
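### 2.4 Prepare SSH access

Ensure your SSH key can reach the target servers and that the `deploy` user has passwordless sudo:

```bash
# Test connectivity and non-interactive sudo (Docker is installed later by the common role)
ssh deploy@10.0.1.10 "hostname && sudo -n true && echo sudo-ok"
```

If you use a non-default SSH key, set it per host in `inventory/hosts.yml` or `host_vars/<hostname>/main.yml`:

```yaml
ansible_ssh_private_key_file: ~/.ssh/bunker_ops_ed25519
```

To use one key for every host instead, set `private_key_file` under `[defaults]` in `ansible.cfg`.

---
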
## 3. Adding a New Instance

### 3.1 Quick method (recommended)

The `add-instance.sh` script scaffolds everything. Its arguments are the instance name, the domain, and the server's IP address:

```bash
./scripts/add-instance.sh edmonton-prod betteredmonton.org 10.0.1.10

# With fleet observability (Tier 2):
./scripts/add-instance.sh edmonton-prod betteredmonton.org 10.0.1.10 --tier 2
```

This creates:

- `inventory/host_vars/edmonton-prod/main.yml`: instance configuration
- `inventory/host_vars/edmonton-prod/vault.yml`: 19+ generated secrets (encrypted)

### 3.2 Add to inventory

Edit `inventory/hosts.yml` and add the host under the `changemaker_instances` group:

```yaml
all:
  children:
    changemaker_instances:
      hosts:
        edmonton-prod:
          ansible_host: 10.0.1.10
          ansible_user: deploy
          cml_domain: betteredmonton.org
```

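Ansible loads `host_vars/<name>/` only when `<name>` exactly matches the inventory hostname (the key under `hosts:` above). A typo in either place means the instance's configuration and secrets are simply not loaded.

The sketch below checks that both files exist. `check_host_vars` is a hypothetical helper:

```bash
# Hypothetical check: does inventory host $2 have both host_vars files under $1?
# The directory name must match the inventory hostname exactly.
check_host_vars() {
  local inv_dir="$1" host="$2" f missing=0
  for f in main.yml vault.yml; do
    if [ ! -f "$inv_dir/host_vars/$host/$f" ]; then
      echo "missing: $inv_dir/host_vars/$host/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as `check_host_vars inventory edmonton-prod`. `ansible-inventory --graph` also shows which group each host ended up in.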
### 3.3 Customize configuration

Edit `inventory/host_vars/edmonton-prod/main.yml`:

```yaml
cml_domain: betteredmonton.org
cml_node_env: production

# Enable features
cml_enable_media: "true"
cml_listmonk_sync_enabled: "true"
cml_email_test_mode: "false"
cml_monitoring_enabled: true

# Production SMTP
cml_smtp_host: smtp.protonmail.ch
cml_smtp_port: 587
cml_smtp_user: "noreply@betteredmonton.org"

# Pangolin tunnel
cml_pangolin_api_url: "https://api.bnkserve.org/v1"
cml_pangolin_org_id: "org_abc123"
```

Keep credentials out of this file. The SMTP password belongs in `vault.yml` as `vault_cml_smtp_pass`, and the Pangolin credentials as the `vault_cml_pangolin_*` variables.

### 3.4 Edit secrets (if needed)

```bash
# Open the decrypted file in $EDITOR; it is re-encrypted when you save and exit
ansible-vault edit inventory/host_vars/edmonton-prod/vault.yml

# Alternative: decrypt to plaintext, edit, then re-encrypt
# (never commit the file while it is decrypted)
ansible-vault decrypt inventory/host_vars/edmonton-prod/vault.yml
# ... edit ...
ansible-vault encrypt inventory/host_vars/edmonton-prod/vault.yml
```

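The decrypt/encrypt route leaves plaintext on disk until you re-encrypt, so a decrypted file can easily be committed by mistake. Vault-encrypted files begin with a `$ANSIBLE_VAULT` header line, which you can check before each commit.

The sketch below performs that check. `check_vaults_encrypted` is a hypothetical helper, and wiring it into a Git hook is left to you:

```bash
# Hypothetical guard: fail if any vault.yml under the directory is not Ansible
# Vault ciphertext. Encrypted files start with a header like "$ANSIBLE_VAULT;1.1;AES256".
check_vaults_encrypted() {
  local dir="${1:-inventory/host_vars}" f bad=0
  while IFS= read -r -d '' f; do
    if [ "$(head -c 14 "$f")" != '$ANSIBLE_VAULT' ]; then
      echo "NOT ENCRYPTED: $f" >&2
      bad=1
    fi
  done < <(find "$dir" -name vault.yml -print0)
  return "$bad"
}
```

`check_vaults_encrypted inventory/host_vars` prints each offending file and returns non-zero if any `vault.yml` is plaintext.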
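### 3.5 Verify connectivity

Ansible's `ping` module is not ICMP. It logs in over SSH and checks for a usable Python on the host. Adding `--become` also confirms that privilege escalation (sudo) works.

```bash
ansible edmonton-prod -m ping
ansible edmonton-prod -m ping --become
```

---
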
## 4. Deploying an Instance

### 4.1 Full initial deploy

A full deploy installs Docker, configures the OS, clones the repo, generates `.env`, starts all containers, runs migrations, and sets up the backup cron job:

```bash
ansible-playbook playbooks/deploy.yml --limit edmonton-prod
```

What happens (in order):

1. **common** role: apt update, Docker install, UFW firewall, fail2ban, swap
2. **changemaker** role: git clone, create dirs, generate `.env`, `docker compose up`, Prisma migrations, seed, health checks, backup cron
3. **monitoring** role (if `cml_monitoring_enabled` is `true`): Prometheus config, `docker compose --profile monitoring up`

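### 4.2 Deploy all instances

`--check` is only a preview. Tasks that run raw commands (for example through the `command` or `shell` modules) are skipped in check mode, so a clean dry run does not guarantee a clean deploy.

```bash
# One at a time (safe):
ansible-playbook playbooks/deploy.yml

# Show what would change (dry run):
ansible-playbook playbooks/deploy.yml --check --diff
```
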
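How many hosts a plain run touches at once depends on the playbook's `serial` setting and on Ansible's `forks`. You may want a strict one-instance-at-a-time rollout that stops at the first failure regardless of those settings. A thin wrapper is enough for that.

The sketch below is such a wrapper. `deploy_one_by_one` is hypothetical:

```bash
# Hypothetical wrapper: deploy the given hosts strictly one after another and
# stop at the first failure, independent of the playbook's serial/forks settings.
deploy_one_by_one() {
  local host
  for host in "$@"; do
    echo "==> deploying $host"
    if ! ansible-playbook playbooks/deploy.yml --limit "$host"; then
      echo "deploy failed on $host; stopping" >&2
      return 1
    fi
  done
}
```

Pass host names explicitly, for example `deploy_one_by_one edmonton-prod`. `ansible changemaker_instances --list-hosts` lists the candidates.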
### 4.3 Deploy with specific tags

```bash
# Only regenerate .env (no Docker restart):
ansible-playbook playbooks/deploy.yml --limit edmonton-prod --tags env

# Only clone + update code:
ansible-playbook playbooks/deploy.yml --limit edmonton-prod --tags clone

# Only run health checks:
ansible-playbook playbooks/deploy.yml --limit edmonton-prod --tags health
```

---

## 5. Day-to-Day Operations

### 5.1 Rolling upgrade (code + images)

An upgrade pulls the latest Git commits, rebuilds images, runs migrations, and restarts services. It works through the fleet in batches of 25% of the instances:

```bash
# All instances:
ansible-playbook playbooks/upgrade.yml

# Single instance:
ansible-playbook playbooks/upgrade.yml --limit edmonton-prod
```

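The 25% batching corresponds to Ansible's `serial` keyword. Assuming `upgrade.yml` sets `serial: "25%"`, you can preview how many instances are taken down together. For a percentage, Ansible truncates to a whole number of hosts, never below one, and takes hosts in chunks of that size. The last chunk holds the remainder.

The sketch below applies that rule using integer arithmetic. `batch_plan` is hypothetical:

```bash
# Sketch: preview batch sizes for a percentage `serial` value.
# size = floor(hosts * pct / 100), at least 1; the last batch takes the remainder.
# (Ansible computes the percentage in floating point; this integer version may
# differ in rare rounding edge cases.)
batch_plan() {
  local hosts="$1" pct="$2" size left plan=""
  size=$(( hosts * pct / 100 ))
  if (( size < 1 )); then
    size=1
  fi
  left=$hosts
  while (( left > 0 )); do
    if (( left < size )); then
      plan+="$left "
      left=0
    else
      plan+="$size "
      left=$(( left - size ))
    fi
  done
  echo "${plan% }"
}
```

With 10 instances, `batch_plan 10 25` prints `2 2 2 2 2`. With 3 instances, `batch_plan 3 25` prints `1 1 1`, so a small fleet effectively upgrades one host at a time.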
### 5.2 Configuration change (no rebuild)

This regenerates `.env` and restarts the API. Use it when changing feature flags, SMTP settings, CORS origins, and similar settings:

```bash
# First, change the variable in inventory/host_vars/edmonton-prod/main.yml,
# e.g. cml_enable_media: "true"

# Then apply:
ansible-playbook playbooks/configure.yml --limit edmonton-prod
```

### 5.3 Trigger backups

```bash
# All instances:
ansible-playbook playbooks/backup.yml

# Single instance:
ansible-playbook playbooks/backup.yml --limit edmonton-prod
```

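Triggering a backup only tells you the playbook ran. To confirm that a backup actually succeeded recently, compare its timestamp with the current time. `cml_backup_cron_hour` suggests a daily schedule, so a threshold slightly above 24 hours is a reasonable default.

The sketch below is written under that assumption. `backup_is_stale` is hypothetical. It accepts any Unix timestamp, such as the `cm_backup_last_success_timestamp` metric described in section 7 or an archive's modification time:

```bash
# Hypothetical freshness check: succeeds (exit 0) when the last successful
# backup is older than max_age_hours. Timestamps are Unix epoch seconds.
backup_is_stale() {
  local last="$1" now="$2" max_age_hours="${3:-26}"
  (( now - last > max_age_hours * 3600 ))
}
```

For example, on an instance: `backup_is_stale "$(stat -c %Y /path/to/latest-backup.tar.gz)" "$(date +%s)" && echo "backup is stale"`. The archive path is a placeholder, since this guide does not document where backups are written.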
### 5.4 Enable/reconfigure monitoring

```bash
ansible-playbook playbooks/monitoring.yml --limit edmonton-prod
```

### 5.5 Run ad-hoc commands

`docker compose` must run in the project directory (`/opt/changemaker-lite`) to find the compose file. `chdir=` is an argument of the `command` module, so it goes inside `-a`. Passing it with `-e` only defines an unused extra variable.

```bash
# Check Docker status on all instances:
ansible changemaker_instances -m command \
  -a "chdir=/opt/changemaker-lite docker compose ps" --become

# View API logs on one instance:
ansible edmonton-prod -m command \
  -a "chdir=/opt/changemaker-lite docker compose logs api --tail 50" --become

# Restart a specific service:
ansible edmonton-prod -m command \
  -a "chdir=/opt/changemaker-lite docker compose restart api" --become

# Check disk space across the fleet:
ansible changemaker_instances -m command -a "df -h /"
```

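`df -h` output is meant for people. For a scripted check it is easier to parse the POSIX format from `df -P`, where the fifth column is the used capacity (for example `87%`) and the sixth is the mount point.

The sketch below flags filesystems at or above a threshold, reading `df -P` output on stdin. `disk_over_threshold` is hypothetical:

```bash
# Hypothetical check: read `df -P` output on stdin, print "<mount> <use%>" for
# each filesystem at or above the threshold, and exit 1 if any were found.
disk_over_threshold() {
  local threshold="${1:-85}"
  awk -v t="$threshold" '
    NR > 1 {
      use = $5
      sub(/%$/, "", use)          # "87%" -> "87"
      if (use + 0 >= t + 0) {
        print $6 " " $5
        hit = 1
      }
    }
    END { exit (hit ? 1 : 0) }
  '
}
```

On an instance, run `df -P / | disk_over_threshold 85 || echo "disk almost full"`. The function is meant for one host's output, so it reports mount points rather than host names.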
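### 5.6 Rotate a secret

Rotating `vault_cml_jwt_access_secret` or `vault_cml_jwt_refresh_secret` invalidates tokens signed with the old value, so users will likely have to sign in again.

Database passwords need more care. An already-initialized database typically keeps the password it was created with, so changing only the vault value leaves `.env` and the database out of sync (see "Deploy fails at 'Wait for PostgreSQL'" under Troubleshooting).

1. Generate a new value:

   ```bash
   openssl rand -hex 32
   ```

2. Update the vault:

   ```bash
   ansible-vault edit inventory/host_vars/edmonton-prod/vault.yml
   # Change vault_cml_jwt_access_secret (or whichever secret)
   ```

3. Apply and restart:

   ```bash
   ansible-playbook playbooks/configure.yml --limit edmonton-prod
   ```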
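Step 1 prints a bare value. Before pasting it into the vault, it helps to produce the whole YAML line and confirm that it matches the 64-character hex format. The variable reference lists that format for the JWT secrets, the encryption key, and the Vaultwarden token.

The sketch below does both. The function names are hypothetical:

```bash
# Succeeds when $1 is exactly 64 lowercase hex characters (32 random bytes).
is_hex64() {
  [[ "$1" =~ ^[0-9a-f]{64}$ ]]
}

# Print a ready-to-paste vault.yml line with a fresh 64-char hex value, e.g.:
#   new_hex_secret_line vault_cml_jwt_access_secret
new_hex_secret_line() {
  local name="$1" value
  value=$(openssl rand -hex 32) || return 1
  is_hex64 "$value" || return 1
  printf '%s: "%s"\n' "$name" "$value"
}
```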

---

## 6. Secret Management

### Naming convention

| Prefix | Purpose | Example |
|--------|---------|---------|
| `cml_*` | Non-secret configuration | `cml_domain`, `cml_smtp_host` |
| `vault_cml_*` | Encrypted secrets | `vault_cml_v2_postgres_password` |
| `vault_bunker_*` | Bunker Ops shared secrets | `vault_bunker_ops_remote_write_token` |

### What gets encrypted

Each instance's `vault.yml` holds 19+ secrets:

- Database passwords (PostgreSQL, Redis, Listmonk DB, Gitea DB)
- JWT secrets (access + refresh) and encryption key
- Admin passwords (initial admin, NocoDB, n8n, Grafana, Gotify, Vaultwarden, Rocket.Chat, Gancio)
- API tokens (Listmonk API, Pangolin, Bunker Ops remote write)
- SMTP password

### Vault operations

```bash
# View an encrypted file without changing it:
ansible-vault view inventory/host_vars/edmonton-prod/vault.yml

# Edit in place (decrypts → opens $EDITOR → re-encrypts):
ansible-vault edit inventory/host_vars/edmonton-prod/vault.yml

# Re-key all vaults (change the master password):
find inventory/host_vars -name vault.yml -exec ansible-vault rekey {} +

# Encrypt a new plaintext file:
ansible-vault encrypt inventory/host_vars/new-instance/vault.yml
```

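Ansible does not read the vault password itself from an environment variable. It reads a password file, named either by `--vault-password-file` or by the `ANSIBLE_VAULT_PASSWORD_FILE` environment variable. If that file is executable, Ansible runs it and uses its standard output as the password.

The sketch below writes such a client script for CI. It assumes the CI system exposes the secret as `BUNKER_VAULT_PASS`, a hypothetical variable name. `install_vault_client` is likewise hypothetical:

```bash
# Hypothetical helper: write an executable vault password client script.
# Ansible executes an executable vault password file and takes the password
# from its stdout; this one prints the CI secret BUNKER_VAULT_PASS (assumed name).
install_vault_client() {
  local path="$1"
  cat > "$path" <<'EOF'
#!/usr/bin/env bash
if [ -z "${BUNKER_VAULT_PASS:-}" ]; then
  echo "BUNKER_VAULT_PASS is not set" >&2
  exit 1
fi
printf '%s\n' "$BUNKER_VAULT_PASS"
EOF
  chmod 700 "$path"
}
```

In the CI job, run `install_vault_client ./vault-pass.sh`, then `ANSIBLE_VAULT_PASSWORD_FILE=./vault-pass.sh ansible-playbook playbooks/deploy.yml`.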
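### Vault password management

- The `.vault_pass` file is referenced in `ansible.cfg` (via the `vault_password_file` setting), so commands run from `bunker-ops/` pick it up automatically
- For CI/CD, point `ANSIBLE_VAULT_PASSWORD_FILE` (or `--vault-password-file`) at a file or executable script that supplies the password. Ansible has no environment variable that takes the password itself
- For teams, use `--vault-password-file` pointing to a shared secrets manager script

---
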
## 7. Monitoring & Fleet Observability

### Tier model

| Tier | What it means | How to set |
|------|---------------|------------|
| **0: Standalone** | No Ansible management (manual `config.sh` install) | N/A |
| **1: Managed** | Ansible deploys/updates, local monitoring only | `bunker_ops_enabled: false` |
| **2: Fleet** | Ansible + metrics pushed to central VictoriaMetrics | `bunker_ops_enabled: true` |

### Enabling Tier 2 on an instance

1. Set in `host_vars/<hostname>/main.yml`:

   ```yaml
   bunker_ops_enabled: true
   bunker_ops_remote_write_url: "https://ops.bnkserve.org/api/v1/write"
   cml_monitoring_enabled: true
   ```

2. Set the write token in `host_vars/<hostname>/vault.yml` (open it with `ansible-vault edit`):

   ```yaml
   vault_bunker_ops_remote_write_token: "your-token-here"
   ```

3. Apply:

   ```bash
   ansible-playbook playbooks/monitoring.yml --limit edmonton-prod
   ```
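Only the metric families listed under *What metrics are sent* leave the instance. In Prometheus, such an allowlist is usually written as a `keep` rule in the `write_relabel_configs` of a `remote_write` block. The rule matches `__name__` against a regex that Prometheus anchors at both ends. Whether the monitoring role uses exactly this mechanism is an assumption.

The sketch below only reproduces the matching, so you can check whether a given metric name would be forwarded. `metric_is_forwarded` and the regex are hypothetical, derived from the documented families:

```bash
# Assumed allowlist built from the documented families; Prometheus relabel
# regexes are fully anchored, hence the explicit ^...$ here.
TIER2_KEEP_REGEX='^(cm_.*|node_.*|http_request.*|up)$'

# Succeeds when the metric name would pass the allowlist.
metric_is_forwarded() {
  [[ "$1" =~ $TIER2_KEEP_REGEX ]]
}
```

For example, `metric_is_forwarded node_cpu_seconds_total` succeeds, while cAdvisor's `container_cpu_usage_seconds_total` is rejected. This is consistent with cAdvisor container details never being sent.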
### What metrics are sent (Tier 2)

Only filtered, non-PII metrics leave the instance:

- `cm_*`: application metrics (emails sent, canvass visits, queue sizes, login attempts)
- `node_*`: system metrics (CPU, memory, disk, network)
- `http_request*`: API latency and request counts
- `up`: service availability

**Never sent:** database content, user data, campaign text, participant records, cAdvisor container details.

### Backup metrics

When `BUNKER_OPS_ENABLED=true` (Tier 2), the backup script automatically pushes:

- `cm_backup_last_success_timestamp`: Unix timestamp of the last successful backup
- `cm_backup_size_bytes`: size of the backup archive

These metrics enable backup-staleness alerts on the central dashboard.

---

## 8. Troubleshooting

### Ansible can't connect

```
UNREACHABLE! => {"msg": "Failed to connect to the host via ssh"}
```

- Verify SSH: `ssh deploy@<host> hostname`
- Check that `ansible_user` in `hosts.yml` matches the SSH user
- Ensure the user has passwordless sudo. On the server, as a user who can already sudo, run `echo 'deploy ALL=(ALL) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/deploy && sudo chmod 440 /etc/sudoers.d/deploy`. Then validate the file with `sudo visudo -cf /etc/sudoers.d/deploy`

### Vault password error

```
ERROR! Decryption failed on ...vault.yml
```

- Verify that the `.vault_pass` file exists and contains the correct password
- Or pass it explicitly: `ansible-playbook ... --vault-password-file /path/to/.vault_pass`

### Deploy fails at "Wait for PostgreSQL"

PostgreSQL did not become ready in time. Check its logs:

```bash
ansible <host> -m command \
  -a "chdir=/opt/changemaker-lite docker compose logs v2-postgres --tail 30" --become
```

Common causes:

- Disk full (`df -h`)
- Wrong `V2_POSTGRES_PASSWORD`: the value from `vault_cml_v2_postgres_password` in `vault.yml` must match the password the running database was initialized with
- First deploy: PostgreSQL needs time to initialize

### Health check fails after deploy

The API is not responding on `/api/health`:

```bash
# Check whether the container is running:
ansible <host> -m command -a "chdir=/opt/changemaker-lite docker compose ps api" --become

# Check API logs:
ansible <host> -m command -a "chdir=/opt/changemaker-lite docker compose logs api --tail 50" --become
```
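Right after `docker compose up`, the API may need a little while before `/api/health` answers, so a single failed probe is not conclusive.

The sketch below is a retry loop using `curl`. It assumes the API is published on the host at `cml_api_port` (4000 by default). `wait_for_health` is hypothetical:

```bash
# Hypothetical retry loop: succeed once the URL answers without an HTTP error
# (curl -f treats status >= 400 as failure); give up after `attempts` tries.
# Example, on the instance:
#   wait_for_health http://localhost:4000/api/health 30 2
wait_for_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i
  for (( i = 1; i <= attempts; i++ )); do
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
  done
  echo "no healthy response from $url after $attempts attempts" >&2
  return 1
}
```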

Common causes:

- Missing environment variable (check `.env` generation)
- Database migration failure (check the Prisma output)
- Port conflict (another process already listening on port 4000, the default `cml_api_port`)

### .env has wrong values

Preview what Ansible would change in `.env` without modifying the host. The diff may contain secret values, so avoid running it where the output is logged or shared:

```bash
# Show diff of what Ansible would change:
ansible-playbook playbooks/configure.yml --limit <host> --check --diff
```

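`--check --diff` shows values. When the question is only whether a variable is missing altogether, comparing key names avoids printing secrets.

The sketch below compares the key names of two env files. It assumes you have a reference file to compare against, such as a `.env.example` template if the repository ships one (that file name is an assumption). Both functions are hypothetical:

```bash
# List the variable names defined in an env file (KEY=value lines only;
# comments, blank lines and values are ignored), sorted and de-duplicated.
env_keys() {
  sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | sort -u
}

# Print keys defined in $1 (expected) that are missing from $2 (actual).
env_missing_keys() {
  comm -23 <(env_keys "$1") <(env_keys "$2")
}
```

For example, run `env_missing_keys .env.example .env` from `/opt/changemaker-lite` on the instance. Only key names are printed, never values.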
### Remote write not working (Tier 2)

```bash
# Check the Prometheus config on the instance:
ansible <host> -m command -a "cat /opt/changemaker-lite/configs/prometheus/prometheus.yml" --become

# Check Prometheus logs for remote write errors:
ansible <host> -m command \
  -a "chdir=/opt/changemaker-lite docker compose logs prometheus-changemaker --tail 30" --become
```

Common issues:

- `bunker_ops_enabled` not set to `true`
- Wrong `bunker_ops_remote_write_url`
- Invalid auth token (`vault_bunker_ops_remote_write_token`)
- Central VictoriaMetrics not reachable (firewall, DNS)

---

## 9. Variable Reference

### Configuration variables (`cml_*`, `bunker_ops_*`)

Set these in `host_vars/<hostname>/main.yml` or `group_vars/`. Keep the value types shown in the Default column. Feature flags such as `cml_enable_media` are quoted strings (`"true"`/`"false"`), while `cml_monitoring_enabled`, `cml_backup_s3_enabled`, and `bunker_ops_enabled` are YAML booleans.

| Variable | Default | Description |
|----------|---------|-------------|
| `cml_domain` | `cmlite.org` | Instance domain (drives CORS, SMTP, URLs) |
| `cml_node_env` | `production` | Node.js environment |
| `cml_api_port` | `4000` | Express API port |
| `cml_admin_port` | `3000` | React admin port |
| `cml_media_api_port` | `4100` | Fastify media API port |
| `cml_postgres_port` | `5433` | PostgreSQL host port |
| `cml_enable_media` | `"false"` | Enable video library |
| `cml_enable_payments` | `"false"` | Enable Stripe payments |
| `cml_enable_chat` | `"false"` | Enable Rocket.Chat |
| `cml_listmonk_sync_enabled` | `"false"` | Enable newsletter sync |
| `cml_gancio_sync_enabled` | `"false"` | Enable event sync |
| `cml_email_test_mode` | `"true"` | Use MailHog (`true`) or SMTP (`false`) |
| `cml_monitoring_enabled` | `false` | Enable Prometheus/Grafana stack |
| `cml_smtp_host` | `mailhog-changemaker` | SMTP server hostname |
| `cml_smtp_port` | `1025` | SMTP server port |
| `cml_smtp_user` | `""` | SMTP username |
| `cml_mapbox_api_key` | `""` | Mapbox geocoding key |
| `cml_google_maps_api_key` | `""` | Google Maps geocoding key |
| `cml_pangolin_api_url` | `""` | Pangolin tunnel API URL |
| `cml_pangolin_org_id` | `""` | Pangolin organization ID |
| `cml_backup_retention_days` | `30` | Days to keep local backups |
| `cml_backup_cron_hour` | `3` | Backup cron hour (UTC) |
| `cml_backup_s3_enabled` | `false` | Upload backups to S3 |
| `bunker_ops_enabled` | `false` | Enable fleet observability |
| `bunker_ops_instance_label` | `{{ cml_domain }}` | Label in central metrics |
| `bunker_ops_remote_write_url` | `""` | VictoriaMetrics write endpoint |

### Secret variables (`vault_cml_*`, `vault_bunker_ops_*`)

Set these in `host_vars/<hostname>/vault.yml` (encrypted).

| Variable | Purpose |
|----------|---------|
| `vault_cml_v2_postgres_password` | PostgreSQL password |
| `vault_cml_redis_password` | Redis authentication |
| `vault_cml_jwt_access_secret` | JWT access token signing (64-char hex) |
| `vault_cml_jwt_refresh_secret` | JWT refresh token signing (64-char hex) |
| `vault_cml_encryption_key` | Database field encryption (64-char hex) |
| `vault_cml_initial_admin_email` | Initial admin email |
| `vault_cml_initial_admin_password` | Initial admin password (12+ chars, complexity) |
| `vault_cml_listmonk_db_password` | Listmonk PostgreSQL password |
| `vault_cml_listmonk_web_admin_password` | Listmonk web UI password |
| `vault_cml_listmonk_api_token` | Listmonk API token |
| `vault_cml_nocodb_admin_password` | NocoDB admin password |
| `vault_cml_gitea_db_passwd` | Gitea database password |
| `vault_cml_gitea_db_root_password` | Gitea DB root password |
| `vault_cml_n8n_encryption_key` | n8n encryption key |
| `vault_cml_n8n_user_password` | n8n admin password |
| `vault_cml_grafana_admin_password` | Grafana admin password |
| `vault_cml_gotify_admin_password` | Gotify admin password |
| `vault_cml_vaultwarden_admin_token` | Vaultwarden admin token (64-char hex) |
| `vault_cml_rocketchat_admin_password` | Rocket.Chat admin password |
| `vault_cml_gancio_admin_password` | Gancio admin password |
| `vault_cml_smtp_pass` | SMTP password |
| `vault_cml_pangolin_api_key` | Pangolin API key |
| `vault_cml_pangolin_newt_id` | Pangolin Newt container ID |
| `vault_cml_pangolin_newt_secret` | Pangolin Newt secret |
| `vault_cml_pangolin_site_id` | Pangolin site ID |
| `vault_cml_pangolin_endpoint` | Pangolin endpoint URL |
| `vault_bunker_ops_remote_write_token` | Central VictoriaMetrics write auth token |