Horizontal Scaling Strategies
Overview
Changemaker Lite V2 can scale horizontally to handle increased traffic and data volume. This guide covers strategies for scaling each component.
When to Scale:
- API response time >500ms (P95)
- CPU usage >70% sustained
- Memory usage >80% sustained
- Database connection pool exhausted
- Job queue backing up (>100 jobs waiting)
Database Scaling
Read Replicas
PostgreSQL streaming replication for read-heavy workloads.
Setup (docker-compose.yml):
v2-postgres-replica:
  image: postgres:16-alpine
  container_name: changemaker-v2-postgres-replica
  environment:
    POSTGRES_USER: replicator
    POSTGRES_PASSWORD: ${REPLICA_PASSWORD}
  command: |
    postgres -c wal_level=replica
    -c hot_standby=on
    -c max_wal_senders=3
    -c hot_standby_feedback=on
  volumes:
    - v2-postgres-replica-data:/var/lib/postgresql/data
Primary config (postgresql.conf):
wal_level = replica
max_wal_senders = 3
wal_keep_size = 64MB
Replication user:
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'replica-password';
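The compose service above only starts an empty second PostgreSQL instance; its data directory still has to be seeded from the primary before streaming begins, and pg_hba.conf on the primary must allow replication connections from the replica. A minimal sketch, assuming the service names used elsewhere in this guide (v2-postgres for the primary, v2-postgres-replica for the standby):
# Seed the replica from the primary (one-time, while the replica volume is empty)
docker compose run --rm -e PGPASSWORD=${REPLICA_PASSWORD} v2-postgres-replica \
  pg_basebackup -h changemaker-v2-postgres -U replicator \
  -D /var/lib/postgresql/data -R -X stream
# Confirm the primary is streaming to the replica
docker compose exec v2-postgres psql -U changemaker -c \
  "SELECT client_addr, state FROM pg_stat_replication"
# Confirm the replica is in standby (recovery) mode
docker compose exec v2-postgres-replica psql -U changemaker -d changemaker_v2 -c \
  "SELECT pg_is_in_recovery()"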
Prisma read replica (planned feature):
// Future: Prisma read replicas
const prisma = new PrismaClient({
  datasources: {
    db: {
      url: process.env.DATABASE_URL, // Primary (writes)
      replicaUrl: process.env.REPLICA_URL, // Replica (reads)
    },
  },
});
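Until that lands, similar routing is already possible with the @prisma/extension-read-replicas package; a minimal sketch, assuming REPLICA_URL points at the standby created above:
import { PrismaClient } from '@prisma/client';
import { readReplicas } from '@prisma/extension-read-replicas';

// Read queries are routed to the replica; writes (and prisma.$primary()) use DATABASE_URL
const prisma = new PrismaClient().$extends(
  readReplicas({ url: process.env.REPLICA_URL! })
);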
Connection Pooling
PgBouncer for connection pooling.
docker-compose.yml:
pgbouncer:
  image: pgbouncer/pgbouncer:latest
  container_name: pgbouncer-changemaker
  environment:
    DATABASES_HOST: changemaker-v2-postgres
    DATABASES_PORT: 5432
    DATABASES_USER: changemaker
    DATABASES_PASSWORD: ${V2_POSTGRES_PASSWORD}
    DATABASES_DBNAME: changemaker_v2
    POOL_MODE: transaction
    MAX_CLIENT_CONN: 1000
    DEFAULT_POOL_SIZE: 20
  ports:
    - "6432:6432"
Update DATABASE_URL:
# Before (direct)
DATABASE_URL=postgresql://changemaker:pass@changemaker-v2-postgres:5432/changemaker_v2
# After (pooled)
DATABASE_URL=postgresql://changemaker:pass@pgbouncer:6432/changemaker_v2
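With transaction pooling, Prisma must be told it is behind PgBouncer so it stops relying on prepared statements, and migrations should keep a direct connection. A sketch (DIRECT_DATABASE_URL is an illustrative name, not a variable the stack defines):
# Application traffic through PgBouncer; pgbouncer=true disables prepared statements
DATABASE_URL=postgresql://changemaker:pass@pgbouncer:6432/changemaker_v2?pgbouncer=true
# Direct connection for prisma migrate (hypothetical variable name)
DIRECT_DATABASE_URL=postgresql://changemaker:pass@changemaker-v2-postgres:5432/changemaker_v2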
Benefits:
- Handles 1000+ client connections with only 20 PostgreSQL connections
- Reduces connection overhead
- Prevents "too many connections" errors
API Scaling
Multiple API Containers
docker-compose.yml:
api:
  # ... existing config
  deploy:
    replicas: 3 # Run 3 API containers
Or manual scaling:
docker compose up -d --scale api=3
Load balancer (Nginx upstream):
upstream api_backend {
    least_conn; # Load balancing algorithm
    server changemaker-v2-api-1:4000;
    server changemaker-v2-api-2:4000;
    server changemaker-v2-api-3:4000;
}

server {
    location /api/ {
        proxy_pass http://api_backend;
    }
}
Session affinity (sticky sessions):
upstream api_backend {
    ip_hash; # Route same IP to same backend
    server changemaker-v2-api-1:4000;
    server changemaker-v2-api-2:4000;
}
Vertical Scaling (Resource Limits)
Increase container resources:
api:
  deploy:
    resources:
      limits:
        cpus: '4' # 4 CPU cores
        memory: 4G # 4GB RAM
      reservations:
        cpus: '1'
        memory: 1G
Node.js memory limit:
api:
  environment:
    - NODE_OPTIONS=--max-old-space-size=3072 # 3GB heap
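To confirm the new ceiling took effect, V8 can report the heap limit from inside the running container (a one-off check, not part of the stack):
docker compose exec api node -e "console.log(require('v8').getHeapStatistics().heap_size_limit / 1024 / 1024, 'MB')"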
Redis Scaling
Redis Cluster (Sharding)
For >100GB datasets or high throughput.
docker-compose.yml (6-node cluster):
redis-cluster-1:
  image: redis:7-alpine
  command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf
# ... repeat for redis-cluster-2 through redis-cluster-6
Create cluster:
docker compose exec redis-cluster-1 redis-cli --cluster create \
redis-cluster-1:6379 \
redis-cluster-2:6379 \
redis-cluster-3:6379 \
redis-cluster-4:6379 \
redis-cluster-5:6379 \
redis-cluster-6:6379 \
--cluster-replicas 1
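Before pointing the application at the cluster, confirm that all 16384 hash slots are assigned and the cluster reports a healthy state:
docker compose exec redis-cluster-1 redis-cli --cluster check redis-cluster-1:6379
docker compose exec redis-cluster-1 redis-cli cluster info   # expect cluster_state:ok
Note that BullMQ queues generally need a hash-tagged prefix (for example {bull}) so all keys for a queue land in the same slot.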
Redis Sentinel (High Availability)
Automatic failover for Redis.
docker-compose.yml:
redis-sentinel-1:
  image: redis:7-alpine
  command: redis-sentinel /etc/redis/sentinel.conf
  volumes:
    - ./configs/redis/sentinel.conf:/etc/redis/sentinel.conf
# ... repeat for sentinel-2, sentinel-3
sentinel.conf:
sentinel monitor mymaster redis-primary 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 10000
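With the quorum of 2 configured above, a quick sanity check is to ask any sentinel which node it currently considers the master (26379 is the default Sentinel port; adjust if mapped differently):
docker compose exec redis-sentinel-1 redis-cli -p 26379 sentinel get-master-addr-by-name mymaster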
Media API Scaling
Separate Media Containers
docker-compose.yml:
media-api:
  deploy:
    replicas: 2 # Run 2 media API containers
Nginx load balancer:
upstream media_backend {
    server changemaker-media-api-1:4100;
    server changemaker-media-api-2:4100;
}

location /api/media/ {
    proxy_pass http://media_backend;
}
Shared volume (read-only):
media-api:
  volumes:
    - ${MEDIA_ROOT}:/media:ro # All replicas read same library
CDN for Static Media
Cloudflare CDN (or similar):
Setup:
- Enable Cloudflare proxy (orange cloud)
- Configure cache rules:
  - Cache /media/library/*.mp4 for 30 days (mirrored at the origin in the sketch below)
  - Bypass cache for /api/media/ (dynamic)
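Cloudflare respects origin Cache-Control headers, so the same policy can also be expressed at the origin; a rough Nginx sketch, assuming Nginx serves or proxies the /media/library/ path:
location /media/library/ {
    add_header Cache-Control "public, max-age=2592000, immutable"; # 30 days
}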
Benefits:
- Offload video bandwidth
- Global edge caching
- DDoS protection
Frontend Scaling
CDN for Static Assets
Vite production build → static files → CDN.
Build:
cd admin && npm run build
Upload to CDN (S3 + CloudFront):
aws s3 sync dist/ s3://changemaker-static/ --delete
aws cloudfront create-invalidation --distribution-id XYZ --paths "/*"
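Hashed Vite bundles can be cached aggressively while index.html stays revalidated; one way to express that during upload, using standard aws s3 flags:
aws s3 sync dist/ s3://changemaker-static/ --delete \
  --cache-control "public,max-age=31536000,immutable" --exclude "index.html"
aws s3 cp dist/index.html s3://changemaker-static/index.html --cache-control "no-cache"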
Benefits:
- Global edge caching
- Reduced origin load
- Faster page loads
Nginx Caching
Proxy cache for API responses:
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m max_size=1g inactive=60m;
location /api/campaigns {
    proxy_cache api_cache;
    proxy_cache_valid 200 10m;
    proxy_cache_key "$scheme$request_method$host$request_uri";
    proxy_pass http://changemaker-v2-api:4000;
}
Cacheable endpoints:
- /api/campaigns (public listing, 10 minutes)
- /api/representatives (lookup cache, 1 hour)
- /api/locations/public (map data, 5 minutes)
Never cache:
- POST/PUT/DELETE requests
- Authenticated endpoints (see the Nginx sketch below)
- Real-time data (canvass sessions)
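To enforce the rules above, the cached location can skip the cache whenever an Authorization header or session cookie is present, and expose the cache status for debugging. A sketch ($cookie_session assumes a cookie literally named "session"; adjust to the real cookie name):
location /api/campaigns {
    proxy_cache api_cache;
    proxy_cache_valid 200 10m;
    # Never serve or store cached responses for authenticated requests
    proxy_cache_bypass $http_authorization $cookie_session;
    proxy_no_cache $http_authorization $cookie_session;
    # HIT / MISS / BYPASS, useful while tuning
    add_header X-Cache-Status $upstream_cache_status;
    proxy_pass http://changemaker-v2-api:4000;
}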
Job Queue Scaling
Multiple BullMQ Workers
Scaling the API containers also scales the workers, since each API container runs a worker.
Alternative: Dedicated worker containers.
docker-compose.yml:
email-worker:
  build:
    context: ./api
  container_name: email-worker # omit container_name if you plan to scale this service
  command: node dist/workers/email-worker.js
  environment:
    - REDIS_URL=${REDIS_URL}
    - SMTP_HOST=${SMTP_HOST}
    # ... other env vars
  depends_on:
    - redis
Worker script (api/src/workers/email-worker.ts):
import { Worker } from 'bullmq';
import IORedis from 'ioredis';
import { emailQueue } from '../services/email-queue.service';

// BullMQ processes jobs with Worker instances; queue.process() is the legacy Bull API
new Worker(emailQueue.name, async (job) => { /* Process email job */ }, {
  connection: new IORedis(process.env.REDIS_URL!, { maxRetriesPerRequest: null }),
  concurrency: 10,
});
console.log('Email worker started');
Scale workers:
docker compose up -d --scale email-worker=5
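Scaling decisions are easier with visibility into queue depth (the overview's ">100 jobs waiting" trigger); BullMQ exposes counts directly on the queue, which could be surfaced from a health or metrics endpoint:
import { emailQueue } from '../services/email-queue.service';

// Waiting/active/failed/delayed counts for the email queue
const counts = await emailQueue.getJobCounts('waiting', 'active', 'failed', 'delayed');
console.log(counts); // e.g. { waiting: 120, active: 10, failed: 3, delayed: 0 }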
Monitoring Under Load
Load Testing
k6 script (load-test.js):
import http from 'k6/http';
import { check } from 'k6';

export let options = {
  stages: [
    { duration: '1m', target: 50 },  // Ramp to 50 users
    { duration: '3m', target: 50 },  // Stay at 50 users
    { duration: '1m', target: 100 }, // Ramp to 100 users
    { duration: '3m', target: 100 }, // Stay at 100 users
    { duration: '1m', target: 0 },   // Ramp down
  ],
};

export default function () {
  let res = http.get('http://api.cmlite.org/api/campaigns');
  check(res, {
    'status 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
}
Run test:
k6 run load-test.js
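k6 can also fail the run automatically when the P95 target from the overview is breached, which makes the test usable in CI; add a thresholds block to the options above:
export let options = {
  stages: [ /* ... as above ... */ ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // fail the run if P95 exceeds 500ms
    http_req_failed: ['rate<0.01'],   // or if more than 1% of requests fail
  },
};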
Prometheus Metrics
Monitor scaling indicators:
- rate(http_requests_total[5m]) - Request rate
- histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) - P95 latency
- container_cpu_usage_seconds_total - CPU usage per container
- container_memory_usage_bytes - Memory usage per container
Grafana alert:
- alert: HighAPILatency
  expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "P95 latency >500ms, consider scaling"
Troubleshooting
High CPU Usage
Diagnosis:
# Top processes
docker stats
# API CPU usage
docker stats changemaker-v2-api
# Profile Node.js
docker compose exec api node --prof dist/server.js
Solutions:
- Scale API containers (3-5 replicas)
- Increase CPU limit (2-4 cores)
- Optimize slow queries (add indexes)
- Enable caching (Nginx proxy cache)
Memory Leaks
Diagnosis:
# Memory usage over time
docker stats --no-stream changemaker-v2-api
# Heap snapshot (Node.js)
docker compose exec api node --inspect dist/server.js
# Chrome DevTools → Memory → Take snapshot
Solutions:
- Restart containers daily (cron job)
- Increase memory limit (4-8GB)
- Fix code leaks (event listeners, circular refs)
Database Connection Exhaustion
Symptoms: Error: too many connections for role "changemaker"
Diagnosis:
# Check connection count
docker compose exec v2-postgres psql -U changemaker -c \
"SELECT COUNT(*) FROM pg_stat_activity WHERE usename='changemaker'"
# Check max connections
docker compose exec v2-postgres psql -U changemaker -c \
"SHOW max_connections"
Solutions:
- Add PgBouncer (connection pooling)
- Increase max_connections (PostgreSQL config)
- Fix connection leaks (always close Prisma clients)
Cost Optimization
Resource Allocation
Right-sizing (don't over-provision):
- Start with 1 CPU, 1GB RAM per container
- Monitor actual usage (Prometheus)
- Scale based on metrics (not guesses)
Example (production workload):
- API: 2 CPUs, 2GB RAM (3 replicas)
- PostgreSQL: 2 CPUs, 4GB RAM
- Redis: 1 CPU, 512MB RAM
- Media API: 2 CPUs, 2GB RAM (2 replicas)
Scaling with Docker Swarm
Docker Swarm mode (an alternative to plain Compose) has no built-in autoscaler, but it turns replica scaling and rolling updates into single commands:
# Initialize swarm
docker swarm init
# Deploy stack
docker stack deploy -c docker-compose.yml changemaker
# Scale the API service
docker service scale changemaker_api=3
# Update with zero downtime
docker service update --image api:v2.1 changemaker_api
Deploy policy (rolling updates and restart behavior):
api:
  deploy:
    replicas: 3
    update_config:
      parallelism: 1
      delay: 10s
    restart_policy:
      condition: on-failure
Related Documentation
- Docker Compose — Container orchestration
- Monitoring Stack — Performance metrics
- Nginx Configuration — Load balancing
- Backup & Restore — Data protection at scale