
# Backup & Restore Procedures
## Overview
The `scripts/backup.sh` script provides automated backups of:
- V2 PostgreSQL database (pg_dump)
- Listmonk PostgreSQL database (pg_dump)
- Uploads directory (tar.gz)
- Backup manifest (SHA256 checksums)
**Optional S3 upload** for offsite storage.
---
## Quick Start
### Manual Backup
```bash
# Basic backup (local only)
./scripts/backup.sh
# With S3 upload
./scripts/backup.sh --s3
# Custom retention (60 days)
./scripts/backup.sh --retention 60
```
**Output**: `backups/changemaker-v2-backup-YYYYMMDD_HHMMSS.tar.gz`
---
### Automated Backups (Cron)
```bash
# Edit crontab
crontab -e
# Daily backup at 2 AM + S3 upload
0 2 * * * /home/user/changemaker.lite/scripts/backup.sh --s3 >> /var/log/changemaker-backup.log 2>&1
# Weekly backup on Sundays at 3 AM
0 3 * * 0 /home/user/changemaker.lite/scripts/backup.sh --s3 --retention 90 >> /var/log/changemaker-backup.log 2>&1
```
---
## Backup Script Walkthrough
### Configuration
**Location**: `scripts/backup.sh`
**Variables**:
```bash
BACKUP_DIR="${BACKUP_DIR:-./backups}" # Backup output directory
RETENTION_DAYS="${RETENTION_DAYS:-30}" # Delete backups older than N days
TIMESTAMP="$(date +%Y%m%d_%H%M%S)" # Backup timestamp
```
**Environment**: Loads `.env` automatically (safe parsing handles quotes/special chars).
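
To illustrate what "safe parsing" can mean here, the following is a minimal sketch of a `.env` loader that never evaluates the file as shell code. The function name `load_env_file` and its exact rules are assumptions for illustration, not the code in `scripts/backup.sh`; inline comments after a value and `export KEY=...` lines are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not taken from scripts/backup.sh): load KEY=VALUE pairs
# from a dotenv-style file without eval or `source`, so values are never executed.
load_env_file() {
  local file="$1" line key value
  local skip_re='^[[:space:]]*(#|$)'                            # comments and blank lines
  local pair_re='^[[:space:]]*([A-Za-z_][A-Za-z0-9_]*)=(.*)$'   # KEY=VALUE
  local dq_re='^"(.*)"$'                                        # "double quoted"
  local sq_re="^'(.*)'\$"                                       # 'single quoted'
  [ -f "$file" ] || return 0              # a missing file is not an error here
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%$'\r'}"                  # tolerate CRLF line endings
    [[ "$line" =~ $skip_re ]] && continue
    [[ "$line" =~ $pair_re ]] || continue
    key="${BASH_REMATCH[1]}"
    value="${BASH_REMATCH[2]}"
    # Strip one pair of matching surrounding quotes, if present
    if [[ "$value" =~ $dq_re || "$value" =~ $sq_re ]]; then
      value="${BASH_REMATCH[1]}"
    fi
    export "$key=$value"                  # assigned literally, never expanded
  done < "$file"
}
```

Because each value is assigned literally, characters such as `$`, `#`, and spaces inside a password survive unchanged.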
---
### Backup Steps
#### 1. V2 PostgreSQL Dump
```bash
docker exec changemaker-v2-postgres \
pg_dump -U changemaker -d changemaker_v2 --no-owner --no-acl \
| gzip > v2-postgres.sql.gz
```
**Options**:
- `--no-owner`: Skip ownership commands (easier restore)
- `--no-acl`: Skip permissions (easier restore)
- `| gzip`: Compress the dump (plain-SQL dumps typically shrink by around 70-80%; the actual ratio depends on the data)
**Size estimate**: 100MB-2GB (depends on data volume).
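
One caveat with a `pg_dump | gzip` pipeline: by default bash reports only the exit status of the last command, so a failed `pg_dump` can still leave a small, valid-looking `.gz` file behind. The sketch below shows one way to guard against that with `pipefail`. The wrapper name `dump_to_gzip` is hypothetical, and whether `scripts/backup.sh` already does something similar is not stated here.

```bash
#!/usr/bin/env bash
set -o pipefail   # a pipeline now fails if any stage fails, not only the last one

# Hypothetical wrapper: run a dump command, compress its output, and keep the
# result only when the command succeeds and the gzip stream is readable.
dump_to_gzip() {
  local out="$1"; shift
  if "$@" | gzip > "$out.tmp" && gzip -t "$out.tmp"; then
    mv "$out.tmp" "$out"
  else
    rm -f "$out.tmp"
    echo "dump failed: $*" >&2
    return 1
  fi
}

# Usage with the command from this section (needs the running container):
# dump_to_gzip v2-postgres.sql.gz \
#   docker exec changemaker-v2-postgres \
#   pg_dump -U changemaker -d changemaker_v2 --no-owner --no-acl
```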
---
#### 2. Listmonk PostgreSQL Dump
```bash
docker exec listmonk-db \
pg_dump -U listmonk -d listmonk --no-owner --no-acl \
| gzip > listmonk-postgres.sql.gz
```
**Optional**: Skipped if the Listmonk container is not running.
**Size estimate**: 10MB-500MB (depends on subscriber count + campaigns).
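
The skip condition for Listmonk could be implemented along these lines. This is a sketch only: the helper names `container_running` and `listmonk_dump_action` are assumptions, and the real script may detect the container differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a container with exactly this name is running.
# `docker ps` lists running containers only; -x makes grep match the whole line.
container_running() {
  docker ps --format '{{.Names}}' | grep -Fxq -- "$1"
}

# Decide what to do for the Listmonk step; prints "dump" or "skip".
listmonk_dump_action() {
  if container_running listmonk-db; then
    echo "dump"   # run the pg_dump pipeline for listmonk-db here
  else
    echo "skip"   # leave listmonk-postgres.sql.gz out of this backup
  fi
}
```

Matching the full name avoids treating a similarly named container (for example a leftover `listmonk-db-old`) as the live database.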
---
#### 3. Uploads Archive
```bash
tar -czf uploads.tar.gz -C assets/ uploads/
```
**Includes**:
- Campaign email attachments
- Response wall images
- Listmonk campaign uploads
**Size estimate**: 100MB-10GB (depends on file uploads).
---
#### 4. Backup Manifest
**Format**: JSON with file list + SHA256 checksums.
```json
{
"timestamp": "20260213_140530",
"backup_name": "changemaker-v2-backup-20260213_140530",
"files": [
{
"file": "v2-postgres.sql.gz",
"size_bytes": 123456789,
"sha256": "abc123..."
},
{
"file": "listmonk-postgres.sql.gz",
"size_bytes": 987654,
"sha256": "def456..."
},
{
"file": "uploads.tar.gz",
"size_bytes": 555666777,
"sha256": "ghi789..."
}
],
"v2_database": "changemaker_v2",
"listmonk_database": "listmonk",
"retention_days": 30
}
```
**Purpose**: Verify backup integrity + metadata.
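
As a rough illustration of how such a manifest can be produced, the sketch below writes the `files` array for every `.gz` file in a backup directory. The function name `write_manifest` is hypothetical, the database and retention fields from the example manifest are left out for brevity, and it assumes GNU coreutils plus file names without quotes or backslashes.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: write manifest.json with name, size, and SHA256 for every
# *.gz file in a backup directory. Assumes GNU coreutils (sha256sum, stat -c).
write_manifest() {
  local dir="$1" name="$2" timestamp="$3" first=1 f
  {
    printf '{\n  "timestamp": "%s",\n  "backup_name": "%s",\n  "files": [' \
      "$timestamp" "$name"
    for f in "$dir"/*.gz; do
      [ -f "$f" ] || continue               # skip the literal pattern if nothing matches
      [ "$first" -eq 1 ] || printf ','      # comma between entries, none after the last
      first=0
      printf '\n    {"file": "%s", "size_bytes": %s, "sha256": "%s"}' \
        "$(basename "$f")" "$(stat -c %s "$f")" "$(sha256sum "$f" | cut -d' ' -f1)"
    done
    printf '\n  ]\n}\n'
  } > "$dir/manifest.json"
}
```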
---
### Final Archive
**Creates single tar.gz**:
```bash
tar -czf changemaker-v2-backup-20260213_140530.tar.gz \
changemaker-v2-backup-20260213_140530/
```
**Removes temp directory** after archiving.
---
### Optional S3 Upload
**Requires**:
- AWS CLI installed (`apt install awscli`)
- Credentials configured (`aws configure`)
- `S3_BUCKET` env var set
**Command**:
```bash
aws s3 cp changemaker-v2-backup-20260213_140530.tar.gz \
s3://${S3_BUCKET}/${S3_PREFIX}/
```
**S3 prefix**: `${S3_PREFIX:-changemaker-backups}` (customizable).
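
The prerequisites listed in this section can be checked before the upload is attempted, so a missing bucket name or CLI produces a clear message instead of a failed `aws` call. A minimal sketch, assuming a hypothetical function name `upload_backup`:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the upload step with its prerequisites checked first.
upload_backup() {
  local archive="$1"
  local prefix="${S3_PREFIX:-changemaker-backups}"   # same default as documented above
  if [ -z "${S3_BUCKET:-}" ]; then
    echo "S3_BUCKET is not set, skipping upload" >&2
    return 1
  fi
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found, skipping upload" >&2
    return 1
  fi
  aws s3 cp "$archive" "s3://${S3_BUCKET}/${prefix}/"
}
```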
---
### Retention Cleanup
**Deletes backups older than `RETENTION_DAYS`** (shown here with the default of 30):
```bash
find backups/ -name "changemaker-v2-backup-*.tar.gz" -mtime +30 -delete
```
**Local only** (S3 has its own lifecycle policies).
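
A parameterized version of the cleanup might look like the sketch below. The function name `prune_backups` is an assumption; it validates the retention value and prints each file it removes so the cron log records what was deleted.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: prune backup archives older than a given number of days.
prune_backups() {
  local dir="${1:-${BACKUP_DIR:-./backups}}"
  local days="${2:-${RETENTION_DAYS:-30}}"
  case "$days" in
    ''|*[!0-9]*)
      echo "retention must be a whole number of days: $days" >&2
      return 1 ;;
  esac
  # -maxdepth 1 keeps find out of subdirectories; -print logs each deleted file
  find "$dir" -maxdepth 1 -type f -name 'changemaker-v2-backup-*.tar.gz' \
    -mtime +"$days" -print -delete
}
```

Note that `-mtime +30` matches files whose age, counted in whole 24-hour periods, is greater than 30, so an archive is removed once it is at least 31 days old.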
---
## Restore Procedures
### Full Restore (New Server)
#### 1. Extract Backup
```bash
# Download from S3 (if needed)
aws s3 cp s3://my-bucket/changemaker-backups/changemaker-v2-backup-20260213_140530.tar.gz ./
# Extract archive
tar -xzf changemaker-v2-backup-20260213_140530.tar.gz
cd changemaker-v2-backup-20260213_140530/
```
---
#### 2. Restore V2 Database
```bash
# Start PostgreSQL container
docker compose up -d v2-postgres
# Check container status (repeat until it reports healthy)
docker compose ps v2-postgres
# Restore dump
gunzip -c v2-postgres.sql.gz | \
docker exec -i changemaker-v2-postgres \
psql -U changemaker -d changemaker_v2
# Verify
docker compose exec v2-postgres \
psql -U changemaker -d changemaker_v2 -c "\dt"
```
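
When this restore is scripted, the manual status check can be replaced by a loop that waits until PostgreSQL accepts connections. A minimal sketch, assuming a hypothetical helper name `wait_until`; `pg_isready` ships with PostgreSQL and is assumed to be available inside the container.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) is reached.
wait_until() {
  local timeout="$1"; shift
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage (needs the running container):
# wait_until 60 docker exec changemaker-v2-postgres pg_isready -U changemaker \
#   || { echo "PostgreSQL did not become ready" >&2; exit 1; }
```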
---
#### 3. Restore Listmonk Database
```bash
# Start Listmonk DB
docker compose up -d listmonk-db
# Restore dump
gunzip -c listmonk-postgres.sql.gz | \
docker exec -i listmonk-db \
psql -U listmonk -d listmonk
# Verify
docker compose exec listmonk-db \
psql -U listmonk -d listmonk -c "SELECT COUNT(*) FROM subscribers"
```
---
#### 4. Restore Uploads
```bash
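# Extract uploads into the project's assets/ directory (both paths are relative to the current directory; adjust them if you are still inside the extracted backup directory)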
tar -xzf uploads.tar.gz -C ./assets/
# Verify
ls -lh assets/uploads/
```
---
#### 5. Start Services
```bash
# Start all services
docker compose up -d
# Run migrations (if needed)
docker compose exec api npx prisma migrate deploy
# Check health
docker compose ps
curl http://localhost:4000/api/health
```
---
### Partial Restore (Specific Data)
#### Restore Single Table
```bash
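# pg_restore cannot read this backup: it is a plain-SQL dump, not a custom-format archive.
# Restore the full dump into a scratch database (the name is arbitrary), then copy one table across.
docker exec changemaker-v2-postgres \
  psql -U changemaker -d postgres -c "CREATE DATABASE changemaker_v2_scratch"
gunzip -c v2-postgres.sql.gz | \
  docker exec -i changemaker-v2-postgres \
  psql -U changemaker -d changemaker_v2_scratch
# Copy the table's rows (the campaigns table must already exist in the target;
# rows that conflict with existing primary keys will fail to load)
docker exec changemaker-v2-postgres \
  pg_dump -U changemaker -d changemaker_v2_scratch --table=campaigns --data-only | \
  docker exec -i changemaker-v2-postgres \
  psql -U changemaker -d changemaker_v2
# Remove the scratch database
docker exec changemaker-v2-postgres \
  psql -U changemaker -d postgres -c "DROP DATABASE changemaker_v2_scratch"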
```
---
#### Restore Specific Files
```bash
# List files in upload archive
tar -tzf uploads.tar.gz
# Extract specific file
tar -xzf uploads.tar.gz uploads/campaigns/logo.png
# Copy to container
docker cp uploads/campaigns/logo.png \
changemaker-v2-api:/app/uploads/campaigns/
```
---
## Backup Verification
### Integrity Check
```bash
# Verify checksums from manifest
cd changemaker-v2-backup-20260213_140530/
# Check v2-postgres.sql.gz (the standard checksum format has two spaces between hash and file name)
echo "abc123...  v2-postgres.sql.gz" | sha256sum -c
# Check all files
jq -r '.files[] | "\(.sha256)  \(.file)"' manifest.json | sha256sum -c
```
**Expected output**: `OK` for each file.
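
Checksums confirm that the files are unchanged since the manifest was written. As a complementary check, `gzip -t` confirms that each compressed stream can be read end to end. The sketch below uses a hypothetical helper name, `check_gzip_files`, and is not part of `scripts/backup.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: test every .gz file in a backup directory and report
# any whose compressed stream cannot be read. Returns non-zero if one fails.
check_gzip_files() {
  local dir="$1" f failed=0
  for f in "$dir"/*.gz; do
    [ -f "$f" ] || continue
    if gzip -t "$f" 2>/dev/null; then
      echo "OK      $(basename "$f")"
    else
      echo "CORRUPT $(basename "$f")"
      failed=1
    fi
  done
  return "$failed"
}
```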
---
### Test Restore (Dry Run)
**Best practice**: Periodically test restores.
```bash
# Make sure PostgreSQL is running
docker compose up -d v2-postgres
# Create test DB (connect through the default postgres database)
docker compose exec v2-postgres \
  psql -U changemaker -d postgres -c "CREATE DATABASE changemaker_v2_test"
# Restore to test DB
gunzip -c v2-postgres.sql.gz | \
  docker exec -i changemaker-v2-postgres \
  psql -U changemaker -d changemaker_v2_test
# Verify data
docker compose exec v2-postgres \
  psql -U changemaker -d changemaker_v2_test -c "SELECT COUNT(*) FROM users"
# Drop test DB
docker compose exec v2-postgres \
  psql -U changemaker -d postgres -c "DROP DATABASE changemaker_v2_test"
```
---
## S3 Configuration
### Setup AWS CLI
```bash
# Install
sudo apt install awscli
# Configure credentials
aws configure
# AWS Access Key ID: <your-key>
# AWS Secret Access Key: <your-secret>
# Default region: us-east-1
# Default output format: json
```
---
### Create S3 Bucket
```bash
# Create bucket
aws s3 mb s3://changemaker-backups
# Set lifecycle policy (auto-delete old backups)
cat > lifecycle.json <<EOF
{
"Rules": [
{
"Id": "DeleteOldBackups",
"Status": "Enabled",
"Prefix": "changemaker-backups/",
"Expiration": {
"Days": 90
}
}
]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
--bucket changemaker-backups \
--lifecycle-configuration file://lifecycle.json
```
---
### Environment Variables
```bash
# Add to .env
S3_BUCKET=changemaker-backups
S3_PREFIX=changemaker-backups
AWS_ACCESS_KEY_ID=<your-key>
AWS_SECRET_ACCESS_KEY=<your-secret>
AWS_DEFAULT_REGION=us-east-1
```
---
## Retention Policies
### Recommended Strategy
- **Daily backups**: Keep 7 days
- **Weekly backups**: Keep 4 weeks
- **Monthly backups**: Keep 12 months
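**Implementation** (via cron). Each tier writes to its own `BACKUP_DIR`: assuming the cleanup step operates on `BACKUP_DIR` and matches every `changemaker-v2-backup-*.tar.gz` in it, tiers sharing one directory would all be pruned by the shortest retention:
```bash
# Daily (keep 7 days)
0 2 * * * BACKUP_DIR=/path/to/backups/daily /path/to/backup.sh --retention 7
# Weekly (Sundays, keep 28 days)
0 3 * * 0 BACKUP_DIR=/path/to/backups/weekly /path/to/backup.sh --retention 28 --s3
# Monthly (1st of month, keep 365 days)
0 4 1 * * BACKUP_DIR=/path/to/backups/monthly /path/to/backup.sh --retention 365 --s3
```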
---
### S3 Lifecycle
**Glacier transition** (archive old backups):
```json
{
"Rules": [
{
"Id": "ArchiveOldBackups",
"Status": "Enabled",
"Filter": { "Prefix": "changemaker-backups/" },
"Transitions": [
{
"Days": 30,
"StorageClass": "GLACIER"
}
],
"Expiration": {
"Days": 365
}
}
]
}
```
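**Apply** (save the JSON above as `lifecycle.json`; this call replaces any lifecycle configuration already on the bucket, so include every rule you want to keep):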
```bash
aws s3api put-bucket-lifecycle-configuration \
--bucket changemaker-backups \
--lifecycle-configuration file://lifecycle.json
```
---
## Disaster Recovery
### Complete Server Loss
**Scenario**: Server crashes, all data lost.
**Recovery Steps**:
1. **Provision new server** (same OS, Docker installed)
2. **Clone repository**:
```bash
git clone <repo> changemaker.lite
cd changemaker.lite
git checkout v2
```
3. **Restore .env file** (from secure backup location)
4. **Download latest backup** from S3:
```bash
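aws s3 ls s3://changemaker-backups/changemaker-backups/ | sort | tail -n 1   # newest archive (names are timestamped)
aws s3 cp s3://changemaker-backups/changemaker-backups/changemaker-v2-backup-YYYYMMDD_HHMMSS.tar.gz ./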
```
5. **Extract + restore** (see Full Restore above)
6. **Start services**:
```bash
docker compose up -d
```
7. **Verify**:
```bash
docker compose ps
curl http://localhost:4000/api/health
```
**RTO (Recovery Time Objective)**: 30-60 minutes
**RPO (Recovery Point Objective)**: Last backup (e.g., 24h for daily backups)
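
Since the achievable recovery point is bounded by the age of the newest backup, it can be useful to measure that age directly. The sketch below prints it in whole hours; the function name `newest_backup_age_hours` is hypothetical and GNU `stat` is assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the age, in whole hours, of the newest backup
# archive in a directory. Assumes GNU stat (%Y = modification time in epoch seconds).
newest_backup_age_hours() {
  local dir="${1:-./backups}" newest=0 mtime f
  for f in "$dir"/changemaker-v2-backup-*.tar.gz; do
    [ -f "$f" ] || continue
    mtime="$(stat -c %Y "$f")"
    if [ "$mtime" -gt "$newest" ]; then
      newest="$mtime"
    fi
  done
  if [ "$newest" -eq 0 ]; then
    echo "no backups found in $dir" >&2
    return 1
  fi
  echo $(( ($(date +%s) - newest) / 3600 ))
}

# Usage: warn when the newest backup is older than a 24-hour RPO target
# age="$(newest_backup_age_hours ./backups)" && [ "$age" -le 24 ] \
#   || echo "RPO target exceeded or no backup found" >&2
```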
---
### Database Corruption
**Scenario**: PostgreSQL data corruption detected.
**Recovery**:
```bash
# Stop services
docker compose stop api admin
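# Drop corrupted database (connect to the default postgres database, not the one being dropped)
docker compose exec v2-postgres \
  psql -U changemaker -d postgres -c "DROP DATABASE changemaker_v2"
# Recreate database
docker compose exec v2-postgres \
  psql -U changemaker -d postgres -c "CREATE DATABASE changemaker_v2"
# Restore from backup (extract the archive first; substitute the real backup directory name)
gunzip -c changemaker-v2-backup-YYYYMMDD_HHMMSS/v2-postgres.sql.gz | \
  docker exec -i changemaker-v2-postgres \
  psql -U changemaker -d changemaker_v2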
# Restart services
docker compose up -d api admin
```
---
## Monitoring Backup Success
### Log Files
**Cron output**:
```bash
# Follow the backup log as it is written (Ctrl+C to stop)
tail -f /var/log/changemaker-backup.log
# Check for errors
grep -i error /var/log/changemaker-backup.log
```
---
### Prometheus Metrics (Custom)
**Add to `api/src/utils/metrics.ts`**:
```typescript
// Assumes prom-client is already imported in this file, e.g. `import client from 'prom-client';`
export const lastBackupTimestamp = new client.Gauge({
name: 'cm_last_backup_timestamp',
help: 'Unix timestamp of last successful backup',
});
export const backupSizeBytes = new client.Gauge({
name: 'cm_backup_size_bytes',
help: 'Size of last backup in bytes',
});
```
**Alert rule**:
```yaml
- alert: BackupTooOld
expr: time() - cm_last_backup_timestamp > 86400 * 2 # 2 days
for: 1h
labels:
severity: warning
annotations:
summary: "Backup older than 2 days"
```
---
## Troubleshooting
### pg_dump: permission denied
**Symptoms**: Backup fails with "permission denied for database"
**Cause**: PostgreSQL user lacks dump privileges.
**Solution**:
```bash
# Grant privileges (run as a superuser or the database owner; change -U if the changemaker role cannot grant)
docker compose exec v2-postgres \
psql -U changemaker -c "GRANT ALL ON DATABASE changemaker_v2 TO changemaker"
# Retry backup
./scripts/backup.sh
```
---
### S3 upload fails: InvalidAccessKeyId
**Symptoms**: AWS CLI authentication error
**Solution**:
```bash
# Verify credentials
aws sts get-caller-identity
# Reconfigure
aws configure
# Test S3 access
aws s3 ls s3://changemaker-backups/
```
---
### Restore fails: relation already exists
**Symptoms**: `psql: ERROR: relation "users" already exists`
**Cause**: Restoring to non-empty database.
**Solution**:
```bash
# Drop and recreate database
docker compose exec -T v2-postgres \
  psql -U changemaker -d postgres <<SQL
DROP DATABASE changemaker_v2;
CREATE DATABASE changemaker_v2;
SQL
# Retry restore
gunzip -c v2-postgres.sql.gz | \
docker exec -i changemaker-v2-postgres \
psql -U changemaker -d changemaker_v2
```
---
## Best Practices
### Security
- [ ] Encrypt backups at rest (S3 encryption enabled)
- [ ] Restrict .env file access (`chmod 600 .env`)
- [ ] Store S3 credentials securely (not in .env committed to Git)
- [ ] Test restore procedures monthly
- [ ] Document recovery procedures (this guide!)
### Automation
- [ ] Schedule daily backups via cron
- [ ] Monitor backup success (log files + metrics)
- [ ] Alert on backup failures
- [ ] Rotate local backups (retention policy)
- [ ] Offsite storage (S3 or alternative)
### Documentation
- [ ] Document .env restoration procedure
- [ ] Keep list of critical files to backup
- [ ] Document service dependencies
- [ ] Test disaster recovery plan annually
---
## Related Documentation
- **[Docker Compose](docker-compose.md)** — Service orchestration
- **[Environment Variables](environment-variables.md)** — .env restoration
- **[Monitoring Stack](monitoring-stack.md)** — Backup monitoring metrics