# Horizontal Scaling Strategies

## Overview

Changemaker Lite V2 can scale horizontally to handle increased traffic and data volume. This guide covers strategies for scaling each component.

**When to Scale:**

- API response time >500ms (P95)
- CPU usage >70% sustained
- Memory usage >80% sustained
- Database connection pool exhausted
- Job queue backing up (>100 jobs waiting)

---

## Database Scaling

### Read Replicas

**PostgreSQL streaming replication** for read-heavy workloads.

**Setup** (docker-compose.yml):

```yaml
v2-postgres-replica:
  image: postgres:16-alpine
  container_name: changemaker-v2-postgres-replica
  environment:
    POSTGRES_USER: replicator
    POSTGRES_PASSWORD: ${REPLICA_PASSWORD}
  command: |
    postgres
    -c wal_level=replica
    -c hot_standby=on
    -c max_wal_senders=3
    -c hot_standby_feedback=on
  volumes:
    - v2-postgres-replica-data:/var/lib/postgresql/data
```

Note: before the replica can stream, its data directory must be seeded from the primary (e.g. with `pg_basebackup`) and contain a `standby.signal` file; an empty volume will simply initialize a fresh, independent database.

**Primary config** (postgresql.conf):

```ini
wal_level = replica
max_wal_senders = 3
wal_keep_size = 64MB
```

**Replication user**:

```sql
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'replica-password';
```

**Prisma read replica** (planned feature):

```typescript
// Future: Prisma read replicas
const prisma = new PrismaClient({
  datasources: {
    db: {
      url: process.env.DATABASE_URL,        // Primary (writes)
      replicaUrl: process.env.REPLICA_URL,  // Replica (reads)
    },
  },
});
```

---

### Connection Pooling

**PgBouncer** for connection pooling.
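The mechanism is easy to picture with a toy model: a fixed set of server connections is lent out for one transaction at a time, and extra clients queue. The sketch below is illustrative only (the `TinyPool` class and its numbers are hypothetical, not PgBouncer's implementation):

```typescript
// Toy sketch of transaction-mode pooling: many concurrent "clients"
// share a small, fixed set of server connections.
class TinyPool {
  private freeConns: number[];
  private waiters: Array<(conn: number) => void> = [];
  inUse = 0;
  peakInUse = 0;

  constructor(size: number) {
    this.freeConns = Array.from({ length: size }, (_, i) => i);
  }

  private acquire(): Promise<number> {
    const conn = this.freeConns.pop();
    if (conn !== undefined) return Promise.resolve(conn);
    // No free connection: the client waits, like PgBouncer queueing.
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  private release(conn: number): void {
    const next = this.waiters.shift();
    if (next) next(conn); // hand the connection straight to a waiter
    else this.freeConns.push(conn);
  }

  // POOL_MODE=transaction: hold a server connection only for one
  // unit of work, then return it for reuse.
  async withConnection<T>(work: (conn: number) => Promise<T>): Promise<T> {
    const conn = await this.acquire();
    this.inUse += 1;
    this.peakInUse = Math.max(this.peakInUse, this.inUse);
    try {
      return await work(conn);
    } finally {
      this.inUse -= 1;
      this.release(conn);
    }
  }
}

async function demo(): Promise<void> {
  const pool = new TinyPool(20); // like DEFAULT_POOL_SIZE: 20
  let done = 0;
  await Promise.all(
    Array.from({ length: 1000 }, () => // like MAX_CLIENT_CONN: 1000
      pool.withConnection(async () => {
        done += 1;
      })
    )
  );
  console.log(`transactions=${done} peak_server_connections=${pool.peakInUse}`);
}

demo();
```

A burst of 1000 clients completes while at most 20 connections are ever in use at once, which is the same multiplexing PgBouncer provides at the network level.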
**docker-compose.yml**:

```yaml
pgbouncer:
  image: pgbouncer/pgbouncer:latest
  container_name: pgbouncer-changemaker
  environment:
    DATABASES_HOST: changemaker-v2-postgres
    DATABASES_PORT: 5432
    DATABASES_USER: changemaker
    DATABASES_PASSWORD: ${V2_POSTGRES_PASSWORD}
    DATABASES_DBNAME: changemaker_v2
    POOL_MODE: transaction
    MAX_CLIENT_CONN: 1000
    DEFAULT_POOL_SIZE: 20
  ports:
    - "6432:6432"
```

**Update DATABASE_URL**:

```bash
# Before (direct)
DATABASE_URL=postgresql://changemaker:pass@changemaker-v2-postgres:5432/changemaker_v2

# After (pooled)
DATABASE_URL=postgresql://changemaker:pass@pgbouncer:6432/changemaker_v2
```

**Benefits**:

- Handles 1000+ client connections with only 20 PostgreSQL connections
- Reduces connection overhead
- Prevents "too many connections" errors

---

## API Scaling

### Multiple API Containers

**docker-compose.yml**:

```yaml
api:
  # ... existing config
  deploy:
    replicas: 3  # Run 3 API containers
```

Note: `deploy.replicas` (and `--scale`) requires removing any fixed `container_name` from the service so Compose can name each replica.

**Or manual scaling**:

```bash
docker compose up -d --scale api=3
```

**Load balancer** (Nginx upstream):

```nginx
upstream api_backend {
    least_conn;  # Load balancing algorithm
    server changemaker-v2-api-1:4000;
    server changemaker-v2-api-2:4000;
    server changemaker-v2-api-3:4000;
}

server {
    location /api/ {
        proxy_pass http://api_backend;
    }
}
```

**Session affinity** (sticky sessions):

```nginx
upstream api_backend {
    ip_hash;  # Route same IP to same backend
    server changemaker-v2-api-1:4000;
    server changemaker-v2-api-2:4000;
}
```

---

### Vertical Scaling (Resource Limits)

**Increase container resources**:

```yaml
api:
  deploy:
    resources:
      limits:
        cpus: '4'   # 4 CPU cores
        memory: 4G  # 4GB RAM
      reservations:
        cpus: '1'
        memory: 1G
```

**Node.js memory limit**:

```yaml
api:
  environment:
    - NODE_OPTIONS=--max-old-space-size=3072  # 3GB heap
```

---

## Redis Scaling

### Redis Cluster (Sharding)

**For >100GB datasets** or high throughput.
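Cluster mode shards the keyspace across 16384 hash slots: each key maps to slot `CRC16(key) mod 16384`, and a hash tag in braces (e.g. `{user:42}`) forces related keys onto the same slot so multi-key operations still work. A sketch of the slot calculation (CRC16 in the XMODEM variant Redis uses; ASCII keys assumed, since real clients hash raw bytes):

```typescript
// CRC16/XMODEM: polynomial 0x1021, initial value 0, MSB-first.
function crc16(data: string): number {
  let crc = 0;
  for (let i = 0; i < data.length; i++) {
    crc ^= (data.charCodeAt(i) & 0xff) << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// Hash tag rule: if the key contains "{...}" with a non-empty body,
// only the part inside the first braces pair is hashed.
function hashSlot(key: string): number {
  const open = key.indexOf("{");
  if (open !== -1) {
    const close = key.indexOf("}", open + 1);
    if (close !== -1 && close > open + 1) {
      key = key.substring(open + 1, close);
    }
  }
  return crc16(key) % 16384;
}

// Keys sharing a hash tag land in the same slot (same shard):
console.log(hashSlot("{user:42}:followers"), hashSlot("{user:42}:following"));
```

Clients and `redis-cli` do this mapping automatically; the point of the sketch is to show why `{tag}` grouping matters when related keys must live on one shard.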
**docker-compose.yml** (6-node cluster):

```yaml
redis-cluster-1:
  image: redis:7-alpine
  command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf

# ... repeat for redis-cluster-2 through redis-cluster-6
```

**Create cluster**:

```bash
docker compose exec redis-cluster-1 redis-cli --cluster create \
  redis-cluster-1:6379 \
  redis-cluster-2:6379 \
  redis-cluster-3:6379 \
  redis-cluster-4:6379 \
  redis-cluster-5:6379 \
  redis-cluster-6:6379 \
  --cluster-replicas 1
```

---

### Redis Sentinel (High Availability)

**Automatic failover** for Redis.

**docker-compose.yml**:

```yaml
redis-sentinel-1:
  image: redis:7-alpine
  command: redis-sentinel /etc/redis/sentinel.conf
  volumes:
    - ./configs/redis/sentinel.conf:/etc/redis/sentinel.conf

# ... repeat for sentinel-2, sentinel-3
```

**sentinel.conf**:

```ini
sentinel monitor mymaster redis-primary 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 10000
```

---

## Media API Scaling

### Separate Media Containers

**docker-compose.yml**:

```yaml
media-api:
  deploy:
    replicas: 2  # Run 2 media API containers
```

**Nginx load balancer**:

```nginx
upstream media_backend {
    server changemaker-media-api-1:4100;
    server changemaker-media-api-2:4100;
}

location /api/media/ {
    proxy_pass http://media_backend;
}
```

**Shared volume** (read-only):

```yaml
media-api:
  volumes:
    - ${MEDIA_ROOT}:/media:ro  # All replicas read the same library
```

---

### CDN for Static Media

**Cloudflare CDN** (or similar):

**Setup**:

1. Enable Cloudflare proxy (orange cloud)
2. Configure cache rules:
   - Cache `/media/library/*.mp4` for 30 days
   - Bypass cache for `/api/media/` (dynamic)

**Benefits**:

- Offload video bandwidth
- Global edge caching
- DDoS protection

---

## Frontend Scaling

### CDN for Static Assets

**Vite production build** → static files → CDN.
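Vite emits content-hashed filenames (e.g. `assets/index-Bx12abcd.js`), which is what makes aggressive edge caching safe: hashed files can be cached immutably because a new deploy produces new URLs, while `index.html` must be revalidated on every load. A sketch of the header policy (the filename pattern assumes Vite's default naming; `cacheControlFor` is a hypothetical helper):

```typescript
// Choose a Cache-Control header per asset path (illustrative only).
function cacheControlFor(path: string): string {
  // Vite default output: assets/<name>-<hash>.<ext>
  const hashed = /\/assets\/[^/]+-[A-Za-z0-9_-]{8,}\.[a-z0-9]+$/.test(path);
  if (hashed) {
    // Content-addressed: the URL changes on every deploy, so cache forever.
    return "public, max-age=31536000, immutable";
  }
  // index.html (and anything un-hashed) must be revalidated so users
  // pick up new deploys immediately.
  return "no-cache";
}

console.log(cacheControlFor("/assets/index-Bx12abcd.js"));
console.log(cacheControlFor("/index.html"));
```

The same split applies whether the headers are set by S3 object metadata, CloudFront behaviors, or Nginx `location` blocks.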
**Build**:

```bash
cd admin && npm run build
```

**Upload to CDN** (S3 + CloudFront):

```bash
aws s3 sync dist/ s3://changemaker-static/ --delete
aws cloudfront create-invalidation --distribution-id XYZ --paths "/*"
```

**Benefits**:

- Global edge caching
- Reduced origin load
- Faster page loads

---

### Nginx Caching

**Proxy cache for API responses**:

```nginx
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m
                 max_size=1g inactive=60m;

location /api/campaigns {
    proxy_cache api_cache;
    proxy_cache_valid 200 10m;
    proxy_cache_key "$scheme$request_method$host$request_uri";
    proxy_pass http://changemaker-v2-api:4000;
}
```

**Cacheable endpoints**:

- `/api/campaigns` (public listing, 10 minutes)
- `/api/representatives` (lookup cache, 1 hour)
- `/api/locations/public` (map data, 5 minutes)

**Never cache**:

- POST/PUT/DELETE requests
- Authenticated endpoints
- Real-time data (canvass sessions)

---

## Job Queue Scaling

### Multiple BullMQ Workers

**API container scaling** also scales workers (each container runs a worker).

**Alternative**: Dedicated worker containers.

**docker-compose.yml**:

```yaml
email-worker:
  build:
    context: ./api
  container_name: email-worker  # remove when running more than one replica
  command: node dist/workers/email-worker.js
  environment:
    - REDIS_URL=${REDIS_URL}
    - SMTP_HOST=${SMTP_HOST}
    # ... other env vars
  depends_on:
    - redis
```

**Worker script** (api/src/workers/email-worker.ts):

```typescript
import IORedis from 'ioredis';
import { Worker } from 'bullmq';

// BullMQ creates workers with `new Worker(...)`;
// `queue.process()` is the old Bull v3 API.
const connection = new IORedis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null, // required by BullMQ
});

const worker = new Worker('email', async (job) => {
  // Process email job
}, { connection, concurrency: 10 }); // queue name must match the producer

console.log('Email worker started');
```

**Scale workers**:

```bash
docker compose up -d --scale email-worker=5
```

---

## Monitoring Under Load

### Load Testing

**k6 script** (load-test.js):

```javascript
import http from 'k6/http';
import { check } from 'k6';

export let options = {
  stages: [
    { duration: '1m', target: 50 },   // Ramp to 50 users
    { duration: '3m', target: 50 },   // Stay at 50 users
    { duration: '1m', target: 100 },  // Ramp to 100 users
    { duration: '3m', target: 100 },  // Stay at 100 users
    { duration: '1m', target: 0 },    // Ramp down
  ],
};

export default function () {
  let res = http.get('http://api.cmlite.org/api/campaigns');
  check(res, {
    'status 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
}
```

**Run test**:

```bash
k6 run load-test.js
```

---

### Prometheus Metrics

**Monitor scaling indicators**:

- `rate(http_requests_total[5m])` — Request rate
- `histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))` — P95 latency
- `container_cpu_usage_seconds_total` — CPU usage per container
- `container_memory_usage_bytes` — Memory usage per container

**Prometheus alert rule**:

```yaml
- alert: HighAPILatency
  expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "P95 latency >500ms, consider scaling"
```

---

## Troubleshooting

### High CPU Usage

**Diagnosis**:

```bash
# Top processes
docker stats

# API CPU usage
docker stats changemaker-v2-api

# Profile Node.js
docker compose exec api node --prof dist/server.js
```

**Solutions**:

- Scale API containers (3-5 replicas)
- Increase CPU limit (2-4 cores)
- Optimize slow queries (add indexes)
- Enable caching (Nginx proxy cache)

---

### Memory Leaks

**Diagnosis**:
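A pattern worth looking for first, since it is the most common Node.js culprit: event listeners registered per request and never removed. A toy reproduction (illustrative only, not code from this project):

```typescript
import { EventEmitter } from "node:events";

const bus = new EventEmitter();
bus.setMaxListeners(0); // 0 = unlimited; silences the leak warning

function handleRequest(): void {
  // BUG: a new listener is added on every request and never removed,
  // so each closure (and anything it captures) stays reachable forever.
  bus.on("config-reload", () => {
    /* re-read per-request settings */
  });
}

for (let i = 0; i < 10_000; i++) handleRequest();

// 10,000 listeners now pinned in memory, visible in a heap snapshot
// as retained closures hanging off the emitter.
console.log(bus.listenerCount("config-reload"));

// Fix: use bus.once(...) or call bus.off(...) when the request ends.
```

In a heap snapshot, this shows up as an ever-growing array of closures retained by a single `EventEmitter`.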
```bash
# Memory usage over time
docker stats --no-stream changemaker-v2-api

# Heap snapshot (Node.js)
docker compose exec api node --inspect dist/server.js
# Chrome DevTools → Memory → Take snapshot
```

**Solutions**:

- Restart containers daily (cron job)
- Increase memory limit (4-8GB)
- Fix code leaks (event listeners, circular refs)

---

### Database Connection Exhaustion

**Symptoms**: `Error: too many connections for role "changemaker"`

**Diagnosis**:

```bash
# Check connection count
docker compose exec v2-postgres psql -U changemaker -c \
  "SELECT COUNT(*) FROM pg_stat_activity WHERE usename='changemaker'"

# Check max connections
docker compose exec v2-postgres psql -U changemaker -c \
  "SHOW max_connections"
```

**Solutions**:

- Add PgBouncer (connection pooling)
- Increase `max_connections` (PostgreSQL config)
- Fix connection leaks (always close Prisma clients)

---

## Cost Optimization

### Resource Allocation

**Right-sizing** (don't over-provision):

- Start with 1 CPU, 1GB RAM per container
- Monitor actual usage (Prometheus)
- Scale based on metrics (not guesses)

**Example** (production workload):

- API: 2 CPUs, 2GB RAM (3 replicas)
- PostgreSQL: 2 CPUs, 4GB RAM
- Redis: 1 CPU, 512MB RAM
- Media API: 2 CPUs, 2GB RAM (2 replicas)

---

### Scaling with Docker Swarm

**Docker Swarm mode** (alternative to Compose):

```bash
# Initialize swarm
docker swarm init

# Deploy stack
docker stack deploy -c docker-compose.yml changemaker

# Scale the API service
docker service scale changemaker_api=3

# Update with zero downtime
docker service update --image api:v2.1 changemaker_api
```

**Deploy configuration** (replicas and rolling updates). Note that Swarm has no built-in autoscaler: `deploy` fixes the replica count, and scaling is done manually with `docker service scale` or by external tooling reacting to metrics.

```yaml
api:
  deploy:
    replicas: 3
    update_config:
      parallelism: 1
      delay: 10s
    restart_policy:
      condition: on-failure
```

---

## Related Documentation

- **[Docker Compose](docker-compose.md)** — Container orchestration
- **[Monitoring Stack](monitoring-stack.md)** — Performance metrics
- **[Nginx Configuration](nginx.md)** — Load balancing
- **[Backup & Restore](backup-restore.md)** — Data protection at scale