
Horizontal Scaling Strategies

Overview

Changemaker Lite V2 can scale horizontally to handle increased traffic and data volume. This guide covers strategies for scaling each component.

When to Scale:

  • API response time >500ms (P95)
  • CPU usage >70% sustained
  • Memory usage >80% sustained
  • Database connection pool exhausted
  • Job queue backing up (>100 jobs waiting)
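
The thresholds above can be turned into a single health check. A minimal sketch in TypeScript (the Metrics shape and field names are illustrative, not part of the codebase):

```typescript
// Sketch: return the scale-up triggers that currently fire,
// mirroring the thresholds listed above.
interface Metrics {
  p95LatencyMs: number;
  cpuPercent: number;
  memPercent: number;
  dbPoolExhausted: boolean;
  queuedJobs: number;
}

function scaleReasons(m: Metrics): string[] {
  const reasons: string[] = [];
  if (m.p95LatencyMs > 500) reasons.push('P95 latency >500ms');
  if (m.cpuPercent > 70) reasons.push('CPU >70% sustained');
  if (m.memPercent > 80) reasons.push('memory >80% sustained');
  if (m.dbPoolExhausted) reasons.push('DB connection pool exhausted');
  if (m.queuedJobs > 100) reasons.push('job queue backlog >100');
  return reasons;
}
```

An empty result means no trigger fired; anything else is a signal to scale the matching component.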

Database Scaling

Read Replicas

PostgreSQL streaming replication for read-heavy workloads.

Setup (docker-compose.yml):

v2-postgres-replica:
  image: postgres:16-alpine
  container_name: changemaker-v2-postgres-replica
  environment:
    POSTGRES_USER: replicator
    POSTGRES_PASSWORD: ${REPLICA_PASSWORD}
  command: |
    postgres -c wal_level=replica
             -c hot_standby=on
             -c max_wal_senders=3
             -c hot_standby_feedback=on
  volumes:
    - v2-postgres-replica-data:/var/lib/postgresql/data

Primary config (postgresql.conf):

wal_level = replica
max_wal_senders = 3
wal_keep_size = 64MB

Replication user:

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'replica-password';

Prisma read replica (planned feature):

// Future: Prisma read replicas
const prisma = new PrismaClient({
  datasources: {
    db: {
      url: process.env.DATABASE_URL,           // Primary (writes)
      replicaUrl: process.env.REPLICA_URL,     // Replica (reads)
    },
  },
});
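
Until replica support lands, an interim pattern is to hold two client instances and route by operation type. A sketch of the routing decision only (the Client objects and the operation list are illustrative, not project code):

```typescript
// Sketch: send read operations to a replica client, everything else to the primary.
type Client = { name: string };

const primary: Client = { name: 'primary' };  // connects via DATABASE_URL
const replica: Client = { name: 'replica' };  // connects via REPLICA_URL

// Operations that are safe to serve from a (possibly lagging) replica.
const READ_OPS = new Set(['findUnique', 'findFirst', 'findMany', 'count', 'aggregate']);

function clientFor(operation: string): Client {
  // Writes and transactions must always hit the primary.
  return READ_OPS.has(operation) ? replica : primary;
}
```

Reads routed to the replica must tolerate replication lag; anything the user expects to read back immediately after a write should also go to the primary.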

Connection Pooling

PgBouncer for connection pooling.

docker-compose.yml:

pgbouncer:
  image: pgbouncer/pgbouncer:latest
  container_name: pgbouncer-changemaker
  environment:
    DATABASES_HOST: changemaker-v2-postgres
    DATABASES_PORT: 5432
    DATABASES_USER: changemaker
    DATABASES_PASSWORD: ${V2_POSTGRES_PASSWORD}
    DATABASES_DBNAME: changemaker_v2
    POOL_MODE: transaction
    MAX_CLIENT_CONN: 1000
    DEFAULT_POOL_SIZE: 20
  ports:
    - "6432:6432"

Update DATABASE_URL:

# Before (direct)
DATABASE_URL=postgresql://changemaker:pass@changemaker-v2-postgres:5432/changemaker_v2

# After (pooled; pgbouncer=true makes Prisma avoid prepared statements, required in transaction mode)
DATABASE_URL=postgresql://changemaker:pass@pgbouncer:6432/changemaker_v2?pgbouncer=true

Benefits:

  • Handles 1000+ client connections with only 20 PostgreSQL connections
  • Reduces connection overhead
  • Prevents "too many connections" errors

API Scaling

Multiple API Containers

docker-compose.yml:

api:
  # ... existing config
  deploy:
    replicas: 3  # Run 3 API containers

Or manual scaling:

docker compose up -d --scale api=3

Load balancer (Nginx upstream):

upstream api_backend {
    least_conn;  # Load balancing algorithm
    server changemaker-v2-api-1:4000;
    server changemaker-v2-api-2:4000;
    server changemaker-v2-api-3:4000;
}

server {
    location /api/ {
        proxy_pass http://api_backend;
    }
}
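
least_conn routes each request to the upstream server with the fewest active connections. A minimal sketch of that selection logic (backend names are illustrative):

```typescript
// Sketch: least-connections selection, as Nginx's least_conn does per request.
interface Backend {
  host: string;
  active: number; // current in-flight connections
}

function leastConn(backends: Backend[]): Backend {
  // Assumes a non-empty backend list; ties keep the earlier server.
  return backends.reduce((best, b) => (b.active < best.active ? b : best));
}
```

Nginx maintains the active-connection counts itself; this only illustrates the per-request choice.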

Session affinity (sticky sessions):

upstream api_backend {
    ip_hash;  # Route same IP to same backend
    server changemaker-v2-api-1:4000;
    server changemaker-v2-api-2:4000;
}

Vertical Scaling (Resource Limits)

Increase container resources:

api:
  deploy:
    resources:
      limits:
        cpus: '4'      # 4 CPU cores
        memory: 4G     # 4GB RAM
      reservations:
        cpus: '1'
        memory: 1G

Node.js memory limit:

api:
  environment:
    - NODE_OPTIONS=--max-old-space-size=3072  # 3GB heap
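
To confirm the new heap ceiling inside a running container, Node's v8 module reports it directly:

```typescript
import { getHeapStatistics } from 'node:v8';

// heap_size_limit reflects --max-old-space-size (plus a small runtime overhead).
const limitMb = getHeapStatistics().heap_size_limit / 1024 / 1024;
console.log(`Heap limit: ${limitMb.toFixed(0)} MB`);
```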

Redis Scaling

Redis Cluster (Sharding)

For >100GB datasets or high throughput.

docker-compose.yml (6-node cluster):

redis-cluster-1:
  image: redis:7-alpine
  command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf

# ... repeat for redis-cluster-2 through redis-cluster-6

Create cluster:

docker compose exec redis-cluster-1 redis-cli --cluster create \
  redis-cluster-1:6379 \
  redis-cluster-2:6379 \
  redis-cluster-3:6379 \
  redis-cluster-4:6379 \
  redis-cluster-5:6379 \
  redis-cluster-6:6379 \
  --cluster-replicas 1
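
Keys map onto the cluster's 16384 hash slots via CRC16, per the Redis Cluster specification. A TypeScript reimplementation for illustration (ASCII keys assumed; Redis computes this itself, you never need to):

```typescript
// Redis Cluster assigns each key to one of 16384 slots: CRC16(key) mod 16384.
// CRC16-CCITT (XMODEM variant: poly 0x1021, init 0), as in the cluster spec.
function crc16(data: string): number {
  let crc = 0;
  for (let i = 0; i < data.length; i++) {
    crc ^= data.charCodeAt(i) << 8; // ASCII keys assumed for this sketch
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

function keySlot(key: string): number {
  // Hash tags: if the key contains {...} with a non-empty body,
  // only that substring is hashed, so related keys share a slot.
  const open = key.indexOf('{');
  if (open !== -1) {
    const close = key.indexOf('}', open + 1);
    if (close > open + 1) key = key.substring(open + 1, close);
  }
  return crc16(key) % 16384;
}
```

Hash tags matter in cluster mode: multi-key operations only work when all keys land in the same slot.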

Redis Sentinel (High Availability)

Automatic failover for Redis.

docker-compose.yml:

redis-sentinel-1:
  image: redis:7-alpine
  command: redis-sentinel /etc/redis/sentinel.conf
  volumes:
    - ./configs/redis/sentinel.conf:/etc/redis/sentinel.conf

# ... repeat for sentinel-2, sentinel-3

sentinel.conf:

sentinel monitor mymaster redis-primary 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 10000
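
Why quorum 2? The quorum only controls how many Sentinels must agree the primary is down; authorizing the actual failover always requires a majority of Sentinels. With three Sentinels, that majority is two:

```typescript
// Failover authorization needs a strict majority of Sentinels,
// regardless of the configured quorum value.
function majority(sentinels: number): number {
  return Math.floor(sentinels / 2) + 1;
}
```

This is also why an odd number of Sentinels (3, 5, ...) is recommended: an even count raises cost without raising fault tolerance.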

Media API Scaling

Separate Media Containers

docker-compose.yml:

media-api:
  deploy:
    replicas: 2  # Run 2 media API containers

Nginx load balancer:

upstream media_backend {
    server changemaker-media-api-1:4100;
    server changemaker-media-api-2:4100;
}

location /api/media/ {
    proxy_pass http://media_backend;
}

Shared volume (read-only):

media-api:
  volumes:
    - ${MEDIA_ROOT}:/media:ro  # All replicas read same library

CDN for Static Media

Cloudflare CDN (or similar):

Setup:

  1. Enable Cloudflare proxy (orange cloud)
  2. Configure cache rules:
    • Cache /media/library/*.mp4 for 30 days
    • Bypass cache for /api/media/ (dynamic)

Benefits:

  • Offload video bandwidth
  • Global edge caching
  • DDoS protection

Frontend Scaling

CDN for Static Assets

Vite production build → static files → CDN.

Build:

cd admin && npm run build

Upload to CDN (S3 + CloudFront):

aws s3 sync dist/ s3://changemaker-static/ --delete
aws cloudfront create-invalidation --distribution-id XYZ --paths "/*"

Benefits:

  • Global edge caching
  • Reduced origin load
  • Faster page loads

Nginx Caching

Proxy cache for API responses:

proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m max_size=1g inactive=60m;

location /api/campaigns {
    proxy_cache api_cache;
    proxy_cache_valid 200 10m;
    proxy_cache_key "$scheme$request_method$host$request_uri";
    proxy_pass http://changemaker-v2-api:4000;
}

Cacheable endpoints:

  • /api/campaigns (public listing, 10 minutes)
  • /api/representatives (lookup cache, 1 hour)
  • /api/locations/public (map data, 5 minutes)

Never cache:

  • POST/PUT/DELETE requests
  • Authenticated endpoints
  • Real-time data (canvass sessions)
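
The caching rules above reduce to a small predicate. A sketch (the path prefixes mirror the lists above; the function itself is not part of the codebase, since Nginx enforces this via config):

```typescript
// Sketch: should a response be served from the proxy cache?
const CACHEABLE_PREFIXES = [
  '/api/campaigns',
  '/api/representatives',
  '/api/locations/public',
];

function isCacheable(method: string, path: string, authenticated: boolean): boolean {
  if (method !== 'GET') return false; // never cache mutations
  if (authenticated) return false;    // never cache per-user responses
  return CACHEABLE_PREFIXES.some((p) => path.startsWith(p));
}
```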

Job Queue Scaling

Multiple BullMQ Workers

Scaling the API containers also scales the workers, since each API container runs an in-process worker.

Alternative: dedicated worker containers.

docker-compose.yml:

email-worker:
  build:
    context: ./api
  container_name: email-worker
  command: node dist/workers/email-worker.js
  environment:
    - REDIS_URL=${REDIS_URL}
    - SMTP_HOST=${SMTP_HOST}
    # ... other env vars
  depends_on:
    - redis

Worker script (api/src/workers/email-worker.ts):

import { Worker } from 'bullmq';
import IORedis from 'ioredis';

// BullMQ has no queue.process(); a Worker consumes the queue by name.
// Workers require maxRetriesPerRequest: null on their Redis connection.
const connection = new IORedis(process.env.REDIS_URL!, { maxRetriesPerRequest: null });

// Queue name must match the one used by email-queue.service
const worker = new Worker('email', async (job) => {
  // Process email job
}, { connection, concurrency: 10 });

console.log('Email worker started');

Scale workers:

docker compose up -d --scale email-worker=5

Monitoring Under Load

Load Testing

k6 script (load-test.js):

import http from 'k6/http';
import { check } from 'k6';

export let options = {
  stages: [
    { duration: '1m', target: 50 },   // Ramp to 50 users
    { duration: '3m', target: 50 },   // Stay at 50 users
    { duration: '1m', target: 100 },  // Ramp to 100 users
    { duration: '3m', target: 100 },  // Stay at 100 users
    { duration: '1m', target: 0 },    // Ramp down
  ],
};

export default function () {
  let res = http.get('http://api.cmlite.org/api/campaigns');
  check(res, {
    'status 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
}

Run test:

k6 run load-test.js

Prometheus Metrics

Monitor scaling indicators:

  • rate(http_requests_total[5m]) — Request rate
  • histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) — P95 latency
  • container_cpu_usage_seconds_total — CPU usage per container
  • container_memory_usage_bytes — Memory usage per container

Grafana alert:

- alert: HighAPILatency
  expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "P95 latency >500ms, consider scaling"

Troubleshooting

High CPU Usage

Diagnosis:

# Top processes
docker stats

# API CPU usage
docker stats changemaker-v2-api

# Profile Node.js
docker compose exec api node --prof dist/server.js

Solutions:

  • Scale API containers (3-5 replicas)
  • Increase CPU limit (2-4 cores)
  • Optimize slow queries (add indexes)
  • Enable caching (Nginx proxy cache)

Memory Leaks

Diagnosis:

# Memory usage over time
docker stats --no-stream changemaker-v2-api

# Heap snapshot (Node.js)
docker compose exec api node --inspect dist/server.js
# Chrome DevTools → Memory → Take snapshot

Solutions:

  • Restart containers daily (cron job)
  • Increase memory limit (4-8GB)
  • Fix code leaks (event listeners, circular refs)

Database Connection Exhaustion

Symptoms: Error: too many connections for role "changemaker"

Diagnosis:

# Check connection count
docker compose exec v2-postgres psql -U changemaker -d changemaker_v2 -c \
  "SELECT COUNT(*) FROM pg_stat_activity WHERE usename='changemaker'"

# Check max connections
docker compose exec v2-postgres psql -U changemaker -d changemaker_v2 -c \
  "SHOW max_connections"

Solutions:

  • Add PgBouncer (connection pooling)
  • Increase max_connections (PostgreSQL config)
  • Fix connection leaks (always close Prisma clients)
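
Connection leaks usually come from constructing a new client per request instead of sharing one. A minimal singleton sketch (FakeClient stands in for PrismaClient, which manages its own pool, so the snippet is self-contained):

```typescript
// Sketch: one client, and therefore one connection pool, per process.
class FakeClient {
  static instances = 0;
  constructor() {
    FakeClient.instances++; // track creations to show the singleton holds
  }
}

let client: FakeClient | undefined;

function getClient(): FakeClient {
  // Lazily create exactly one client instead of one per request/handler.
  if (!client) client = new FakeClient();
  return client;
}
```

With a shared client there is nothing to close per request; call $disconnect only on process shutdown.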

Cost Optimization

Resource Allocation

Right-sizing (don't over-provision):

  • Start with 1 CPU, 1GB RAM per container
  • Monitor actual usage (Prometheus)
  • Scale based on metrics (not guesses)

Example (production workload):

  • API: 2 CPUs, 2GB RAM (3 replicas)
  • PostgreSQL: 2 CPUs, 4GB RAM
  • Redis: 1 CPU, 512MB RAM
  • Media API: 2 CPUs, 2GB RAM (2 replicas)

Scaling with Docker Swarm

Docker Swarm mode (an alternative to Compose) adds declarative replicas and rolling updates. Note that Swarm has no built-in autoscaling; scaling is a manual docker service scale command.

# Initialize swarm
docker swarm init

# Deploy stack
docker stack deploy -c docker-compose.yml changemaker

# Scale API service
docker service scale changemaker_api=3

# Update with zero downtime
docker service update --image api:v2.1 changemaker_api

Deploy policy (rolling updates and restart behavior):

api:
  deploy:
    replicas: 3
    update_config:
      parallelism: 1
      delay: 10s
    restart_policy:
      condition: on-failure