# Docker Compose Orchestration

## Overview

Changemaker Lite V2 uses Docker Compose to orchestrate 20+ microservices in a single unified stack. This approach simplifies deployment, provides service isolation, and ensures consistent environments across development and production.

**Key Benefits:**

- **Single Configuration File**: All services defined in `docker-compose.yml`
- **Automatic Networking**: All containers communicate via a shared bridge network
- **Health Checks**: 7 critical services have automated health monitoring
- **Volume Persistence**: Database, uploads, and configuration data persisted across restarts
- **Profile Support**: Optional monitoring stack behind the `--profile monitoring` flag
- **Container Dependencies**: Services start in the correct order via `depends_on` relationships

**Architecture:**

The V2 stack consolidates all services into a single Docker Compose file, replacing the fragmented V1 approach. Services are organized into logical groups: Core (API, database, admin), Supporting (NocoDB, Listmonk, Gitea), Media (media-api, public-media), and Monitoring (Prometheus, Grafana, exporters).

---

## Service Architecture

```mermaid
graph TB
    subgraph "Core Services"
        NGINX["nginx<br/>:80, :443"]
        API["api<br/>Express :4000"]
        MEDIA["media-api<br/>Fastify :4100"]
        ADMIN["admin<br/>Vite :3000"]
        PG["v2-postgres<br/>PostgreSQL 16"]
        REDIS["redis<br/>:6379"]
    end

    subgraph "Supporting Services"
        NOCODB["nocodb-v2<br/>:8091"]
        LISTMONK["listmonk-app<br/>:9000"]
        LISTMONK_DB["listmonk-db<br/>PostgreSQL 17"]
        MAILHOG["mailhog<br/>:8025"]
        GITEA["gitea-app<br/>:3000"]
        GITEA_DB["gitea-db<br/>MySQL 8"]
        N8N["n8n<br/>:5678"]
        MKDOCS["mkdocs<br/>:8000"]
        CODE["code-server<br/>:8080"]
        HOMEPAGE["homepage<br/>:3000"]
        MINIQR["mini-qr<br/>:8080"]
    end

    subgraph "Media Services"
        PUBLIC_MEDIA["public-media<br/>:80"]
    end

    subgraph "Tunnel Services"
        NEWT["newt<br/>Pangolin connector"]
    end

    subgraph "Monitoring Services (profile: monitoring)"
        PROMETHEUS["prometheus<br/>:9090"]
        GRAFANA["grafana<br/>:3000"]
        CADVISOR["cadvisor<br/>:8080"]
        NODE_EXPORTER["node-exporter<br/>:9100"]
        REDIS_EXPORTER["redis-exporter<br/>:9121"]
        ALERTMANAGER["alertmanager<br/>:9093"]
        GOTIFY["gotify<br/>:80"]
    end

    NGINX --> API
    NGINX --> MEDIA
    NGINX --> ADMIN
    NGINX --> NOCODB
    NGINX --> LISTMONK
    NGINX --> GITEA
    NGINX --> N8N
    NGINX --> MKDOCS
    NGINX --> CODE
    NGINX --> HOMEPAGE
    NGINX --> MINIQR
    NGINX --> MAILHOG
    NGINX --> PUBLIC_MEDIA
    API --> PG
    API --> REDIS
    MEDIA --> PG
    ADMIN --> API
    ADMIN --> MEDIA
    NOCODB --> PG
    LISTMONK --> LISTMONK_DB
    GITEA --> GITEA_DB
    NEWT --> NGINX
    PROMETHEUS --> API
    PROMETHEUS --> REDIS_EXPORTER
    PROMETHEUS --> CADVISOR
    PROMETHEUS --> NODE_EXPORTER
    GRAFANA --> PROMETHEUS
    ALERTMANAGER --> PROMETHEUS
```

---

## Core Services

### v2-postgres

**Purpose**: PostgreSQL 16 database for V2 platform (main app + NocoDB metadata)

**Configuration**:

```yaml
v2-postgres:
  image: postgres:16-alpine
  container_name: changemaker-v2-postgres
  restart: unless-stopped
  ports:
    - "127.0.0.1:5433:5432"  # Localhost only
  environment:
    POSTGRES_USER: ${V2_POSTGRES_USER:-changemaker}
    POSTGRES_PASSWORD: ${V2_POSTGRES_PASSWORD}
    POSTGRES_DB: ${V2_POSTGRES_DB:-changemaker_v2}
  volumes:
    - v2-postgres-data:/var/lib/postgresql/data
    - ./api/prisma/init-nocodb-db.sh:/docker-entrypoint-initdb.d/init-nocodb-db.sh:ro
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U changemaker"]
    interval: 10s
    timeout: 5s
    retries: 5
```

**Key Features**:

- Alpine image for minimal footprint
- `init-nocodb-db.sh` creates a separate `nocodb_meta` database on first startup
- Health check uses `pg_isready` for fast readiness detection
- Port bound to `127.0.0.1` to prevent external access

**Volumes**:

- `v2-postgres-data`: Persistent PostgreSQL data directory

**Dependencies**: None (starts first)

---

### redis

**Purpose**: Shared Redis instance for sessions, BullMQ job queues, rate limiting, and geocoding cache

**Configuration**:

```yaml
redis:
  image: redis:7-alpine
  container_name: redis-changemaker
  command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru --requirepass "${REDIS_PASSWORD}"
  ports:
    - "6379:6379"
  volumes:
    - redis-data:/data
  healthcheck:
    test: ["CMD", "redis-cli", "-a",
      "${REDIS_PASSWORD}", "ping"]
    interval: 10s
    timeout: 5s
    retries: 5
  deploy:
    resources:
      limits:
        cpus: '1'
        memory: 512M
      reservations:
        cpus: '0.25'
        memory: 256M
```

**Key Features**:

- **Authentication required**: `--requirepass` flag enforces a password on all connections
- **AOF persistence**: `--appendonly yes` writes every command to disk
- **Memory limits**: 512MB max with LRU eviction policy
- **Resource constraints**: Prevents Redis from consuming excessive host resources

**Volumes**:

- `redis-data`: Persistent AOF log and RDB snapshots

**Security Note**: As of Security Audit 2025-02-11, Redis authentication is **REQUIRED** in production. Set a strong `REDIS_PASSWORD` in `.env`.

---

### api

**Purpose**: Unified Express.js API (TypeScript, Prisma ORM)

**Configuration**:

```yaml
api:
  build:
    context: ./api
    target: development
  container_name: changemaker-v2-api
  restart: unless-stopped
  ports:
    - "${API_PORT:-4000}:4000"
    - "${LISTMONK_PROXY_PORT:-9002}:9002"
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://localhost:4000/api/health"]
    interval: 15s
    timeout: 5s
    retries: 3
    start_period: 30s
  environment:
    - NODE_ENV=${NODE_ENV:-development}
    - PORT=4000
    - DATABASE_URL=postgresql://${V2_POSTGRES_USER}:${V2_POSTGRES_PASSWORD}@changemaker-v2-postgres:5432/${V2_POSTGRES_DB}
    - REDIS_URL=redis://:${REDIS_PASSWORD}@redis-changemaker:6379
    - JWT_ACCESS_SECRET=${JWT_ACCESS_SECRET}
    - JWT_REFRESH_SECRET=${JWT_REFRESH_SECRET}
    # ... 30+ additional env vars (see .env.example)
  volumes:
    - ./api:/app
    - /app/node_modules
    - ./assets/uploads:/app/uploads
    - ./mkdocs:/mkdocs:rw
    - ./data:/data:ro
    - /var/run/docker.sock:/var/run/docker.sock  # For Docker service management
  depends_on:
    v2-postgres:
      condition: service_healthy
    redis:
      condition: service_healthy
```

**Key Features**:

- Waits for PostgreSQL + Redis to be healthy before starting
- Mounts source code for live reloading in development
- Docker socket access for managing MkDocs/Code Server containers
- Health check on the `/api/health` endpoint with a 30s startup grace period
- Exposes the Listmonk proxy on port 9002 (OAuth integration)

**Volumes**:

- `./api:/app`: Live code reloading
- `/app/node_modules`: Prevents host node_modules conflicts
- `./assets/uploads:/app/uploads`: Shared upload directory
- `./mkdocs:/mkdocs:rw`: MkDocs export target
- `./data:/data:ro`: NAR import data (read-only)
- `/var/run/docker.sock`: Docker API access

**Environment Variables**: See [Environment Variables](environment-variables.md) for the complete reference.

---

### media-api

**Purpose**: Fastify microservice for video library management (Drizzle ORM)

**Configuration**:

```yaml
media-api:
  build:
    context: ./api
    dockerfile: Dockerfile.media
    target: development
  container_name: changemaker-media-api
  restart: unless-stopped
  ports:
    - "${MEDIA_API_PORT:-4100}:4100"
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:4100/health"]
    interval: 15s
    timeout: 5s
    retries: 3
    start_period: 30s
  environment:
    - NODE_ENV=${NODE_ENV:-development}
    - MEDIA_API_PORT=4100
    - DATABASE_URL=postgresql://...  # Same DB as main API
    - ENABLE_MEDIA_FEATURES=${ENABLE_MEDIA_FEATURES:-true}
    - MAX_UPLOAD_SIZE_GB=${MAX_UPLOAD_SIZE_GB:-10}
  volumes:
    - ./api:/app
    - /app/node_modules
    - ${MEDIA_ROOT:-./media}:/media:ro
    - ${MEDIA_ROOT:-./media}/local/inbox:/media/local/inbox:rw  # Upload inbox
  depends_on:
    v2-postgres:
      condition: service_healthy
```

**Key Features**:

- Separate Dockerfile (`Dockerfile.media`) with FFmpeg/FFprobe installed
- Shares the PostgreSQL database with the main API (different ORM)
- Media library mounted read-only; inbox writable for uploads
- 10GB upload size limit (configurable)

**Volumes**:

- `${MEDIA_ROOT}:/media:ro`: Read-only media library
- `${MEDIA_ROOT}/local/inbox:/media/local/inbox:rw`: **RW mount required** for video uploads

**Important**: The inbox directory **must** have the `:rw` flag; the main library stays `:ro` for security.

---

### admin

**Purpose**: React admin GUI (Vite dev server in development, Nginx in production)

**Configuration**:

```yaml
admin:
  build:
    context: ./admin
    target: development
  container_name: changemaker-v2-admin
  restart: unless-stopped
  ports:
    - "${ADMIN_PORT:-3000}:3000"
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:3000/"]
    interval: 30s
    timeout: 5s
    retries: 3
    start_period: 20s
  environment:
    - VITE_API_URL=http://changemaker-v2-api:4000
    - VITE_MEDIA_API_URL=http://changemaker-media-api:4100
    - VITE_MKDOCS_URL=http://mkdocs-changemaker:8000
  volumes:
    - ./admin:/app
    - /app/node_modules
  depends_on:
    - api
```

**Key Features**:

- Vite environment variables use **container hostnames** (not localhost)
- Health check on the root path (Vite dev server responds with HTML)
- Live reloading via mounted source code

**Environment Variables**:

- `VITE_API_URL`: Points to the API container (not localhost)
- `VITE_MEDIA_API_URL`: Points to the media-api container
- `VITE_MKDOCS_URL`: Points to the MkDocs container for iframe embed

**Production Build**: Swap `target: development` to `target: production` and serve static files via Nginx.
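The development-to-production swap described above can be kept out of the main file with a Compose override. A minimal sketch, assuming a `docker-compose.prod.yml` file name and a `production` multi-stage target (neither is confirmed by this document):

```yaml
# docker-compose.prod.yml — hypothetical override file; layer it with:
#   docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
services:
  admin:
    build:
      context: ./admin
      target: production   # assumed multi-stage target that serves static files via Nginx
    environment:
      - NODE_ENV=production
```

Compose merges override files on top of the base file, so only the keys that change need repeating.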
---

### nginx

**Purpose**: Reverse proxy with subdomain routing, SSL termination, and iframe embedding support

**Configuration**:

```yaml
nginx:
  build:
    context: ./nginx
  container_name: changemaker-v2-nginx
  restart: unless-stopped
  ports:
    - "${NGINX_HTTP_PORT:-80}:80"
    - "${NGINX_HTTPS_PORT:-443}:443"
    - "8881:8881"  # NocoDB embed proxy
    - "8882:8882"  # n8n embed proxy
    - "8883:8883"  # Gitea embed proxy
    - "8884:8884"  # MailHog embed proxy
    - "8885:8885"  # Mini QR embed proxy
  healthcheck:
    test: ["CMD", "sh", "-c", "wget -q --spider http://127.0.0.1:80/ && pgrep crond"]
    interval: 30s
    timeout: 5s
    retries: 3
  environment:
    - PANGOLIN_SITE_ID=${PANGOLIN_SITE_ID:-}
  volumes:
    - ./nginx/conf.d:/etc/nginx/conf.d:ro
    - ./public-web:/usr/share/nginx/public-web:ro
    - ./configs/pangolin:/etc/pangolin:ro
  depends_on:
    - api
    - admin
```

**Key Features**:

- **Subdomain routing**: `api.cmlite.org`, `app.cmlite.org`, `db.cmlite.org`, etc.
- **Embed proxy ports**: The 888x ports strip `X-Frame-Options` for iframe embedding
- **Health check**: Validates both the HTTP server and the cron daemon (for cert renewal)
- **Read-only configs**: Prevents accidental modification

**Configuration Files**:

- `nginx.conf`: Global settings, gzip, security headers
- `conf.d/default.conf`: Localhost fallback + path-based routing
- `conf.d/api.conf`: API subdomain routing (**media endpoints must come before `/api/`**)
- `conf.d/services.conf`: All supporting services + CSP headers

See [Nginx Configuration](nginx.md) for complete routing details.
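As a rough illustration of the 888x embed-proxy idea, one such server block might look like the following sketch. The real blocks live in `./nginx/conf.d` and may differ; the upstream hostname and stripped headers here are assumptions:

```nginx
# Hypothetical embed proxy (port 8881 → NocoDB); not the project's actual config.
server {
    listen 8881;

    location / {
        proxy_pass http://changemaker-v2-nocodb:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Drop upstream headers that would block iframe embedding
        proxy_hide_header X-Frame-Options;
        proxy_hide_header Content-Security-Policy;
    }
}
```

`proxy_hide_header` removes the named header from the proxied response before it reaches the browser, which is what allows these ports to be embedded where the primary ports cannot.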
---

### nocodb-v2

**Purpose**: Read-only database browser for V2 schema

**Configuration**:

```yaml
nocodb-v2:
  image: nocodb/nocodb:latest
  container_name: changemaker-v2-nocodb
  restart: unless-stopped
  ports:
    - "${NOCODB_V2_PORT:-8091}:8080"
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/api/v1/health"]
    interval: 30s
    timeout: 5s
    retries: 3
    start_period: 30s
  environment:
    NC_DB: "pg://changemaker-v2-postgres:5432?u=${V2_POSTGRES_USER}&p=${V2_POSTGRES_PASSWORD}&d=nocodb_meta"
    NC_ADMIN_EMAIL: ${NC_ADMIN_EMAIL:-admin@cmlite.org}
    NC_ADMIN_PASSWORD: ${NC_ADMIN_PASSWORD}
    NC_PUBLIC_URL: ${NC_PUBLIC_URL:-http://localhost:8091}
  volumes:
    - nocodb-v2-data:/usr/app/data
  depends_on:
    v2-postgres:
      condition: service_healthy
```

**Key Features**:

- Uses the separate `nocodb_meta` database (auto-created by `init-nocodb-db.sh`)
- Health check via the NocoDB API endpoint
- Read-only access recommended (grant SELECT only in production)

**Volumes**:

- `nocodb-v2-data`: NocoDB's internal file storage

**Access**: http://localhost:8091 or http://db.cmlite.org (via subdomain routing)

---

## Supporting Services

### listmonk-app

**Purpose**: Email marketing platform for newsletters (V2 syncs subscribers via REST API)

**Configuration**:

```yaml
listmonk-app:
  image: listmonk/listmonk:latest
  container_name: listmonk-app
  restart: unless-stopped
  ports:
    - "${LISTMONK_PORT:-9001}:9000"
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://localhost:9000/"]
    interval: 30s
    timeout: 5s
    retries: 3
    start_period: 30s
  depends_on:
    - listmonk-db
  command: [sh, -c, "./listmonk --install --idempotent --yes --config '' && ./listmonk --upgrade --yes --config '' && ./listmonk --config ''"]
  environment:
    LISTMONK_app__address: 0.0.0.0:9000
    LISTMONK_db__host: listmonk-db
    LISTMONK_db__user: ${LISTMONK_DB_USER:-listmonk}
    LISTMONK_db__password: ${LISTMONK_DB_PASSWORD}
    LISTMONK_ADMIN_USER: ${LISTMONK_WEB_ADMIN_USER:-admin}
    LISTMONK_ADMIN_PASSWORD: ${LISTMONK_WEB_ADMIN_PASSWORD}
  volumes:
    - ./assets/uploads:/listmonk/uploads:rw
```

**Key Features**:

- **Idempotent init**: `--install --idempotent` runs migrations on every start (safe)
- **Auto-upgrade**: `--upgrade --yes` applies schema upgrades
- **Shared uploads**: Uses the same upload directory as the main API

**Database**: Uses a separate PostgreSQL 17 instance (`listmonk-db`)

**API Integration**: The V2 API syncs participants/locations to Listmonk lists via REST API (opt-in via `LISTMONK_SYNC_ENABLED=true`)

---

### listmonk-db

**Purpose**: PostgreSQL 17 database for Listmonk

**Configuration**:

```yaml
listmonk-db:
  image: postgres:17-alpine
  container_name: listmonk-db
  restart: unless-stopped
  ports:
    - "127.0.0.1:5432:5432"  # Localhost only
  environment:
    POSTGRES_USER: ${LISTMONK_DB_USER:-listmonk}
    POSTGRES_PASSWORD: ${LISTMONK_DB_PASSWORD}
    POSTGRES_DB: ${LISTMONK_DB_NAME:-listmonk}
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U listmonk"]
    interval: 10s
    timeout: 5s
    retries: 6
  volumes:
    - listmonk-data:/var/lib/postgresql/data
```

**Key Features**:

- Separate PostgreSQL instance (not shared with the V2 database)
- Port bound to `127.0.0.1` for security

**Volumes**:

- `listmonk-data`: Persistent Listmonk database

---

### listmonk-init

**Purpose**: One-shot container to create the Listmonk API user for V2 integration

**Configuration**:

```yaml
listmonk-init:
  image: postgres:17-alpine
  container_name: listmonk-init
  depends_on:
    listmonk-app:
      condition: service_started
  restart: "no"  # Runs once and exits
  environment:
    PGPASSWORD: ${LISTMONK_DB_PASSWORD}
    LISTMONK_API_USER: ${LISTMONK_API_USER:-v2-api}
    LISTMONK_API_TOKEN: ${LISTMONK_API_TOKEN}
  entrypoint: ["/bin/sh", "-c"]
  command:
    - |
      # Wait for Listmonk to create tables
      for i in $(seq 1 30); do
        if psql -h listmonk-db -U listmonk -d listmonk -c "SELECT 1 FROM users LIMIT 1" >/dev/null 2>&1; then
          break
        fi
        sleep 2
      done
      # Upsert API user
      psql -h listmonk-db -U listmonk -d listmonk -q <
```

---

## Troubleshooting

### Port Conflicts

**Solution**:

```bash
> .env

# Restart service
docker compose up -d api
```

**Common conflicts**:

- Port 3000: Homepage, Grafana, admin (set `ADMIN_PORT=3005`)
- Port 4000: API, MkDocs v1 (set `MKDOCS_PORT=4003`)
- Port 5432: Listmonk DB, system PostgreSQL (bind to 127.0.0.1 in the compose file)

---

### Volume Permission Issues

**Problem**: `EACCES: permission denied` or `mkdir: cannot create directory`

**Cause**: Container user mismatch with the host filesystem

**Solution**:

```bash
# Fix ownership (run on host)
sudo chown -R $USER:$USER ./api ./admin ./mkdocs ./assets

# Set USER_ID/GROUP_ID in .env
id -u  # Get your UID
id -g  # Get your GID
echo "USER_ID=$(id -u)" >> .env
echo "GROUP_ID=$(id -g)" >> .env

# Recreate containers
docker compose up -d --force-recreate
```

**Services using user mapping**:

- `mkdocs`: `user: "${USER_ID}:${GROUP_ID}"`
- `code-server`: `user: "${USER_ID}:${GROUP_ID}"`
- `homepage`: `PUID=${USER_ID}, PGID=${DOCKER_GROUP_ID}`

---

### Network Issues

**Problem**: Containers can't communicate (e.g., API can't reach Redis)

**Solution**:

```bash
# Verify network exists
docker network ls | grep changemaker-lite

# Inspect network
docker network inspect changemaker-lite

# Check container connectivity
docker compose exec api ping redis-changemaker

# Recreate network
docker compose down
docker compose up -d
```

**DNS resolution**: Containers use Docker's internal DNS (127.0.0.11).
Reference services by container name:

- ✅ `redis-changemaker:6379`
- ❌ `localhost:6379` (only works if the port is exposed to the host)

---

### Database Migration Failures

**Problem**: `prisma migrate deploy` fails with "relation already exists"

**Solution**:

```bash
# Reset database (⚠️ destroys data)
docker compose exec api npx prisma migrate reset --force

# Or: Fix manually
docker compose exec v2-postgres psql -U changemaker -d changemaker_v2

# Check migration status
docker compose exec api npx prisma migrate status

# Force resolve migration
docker compose exec api npx prisma migrate resolve --applied "20240101000000_init"
```

---

### Container Crashes / Restart Loops

**Problem**: Container repeatedly restarting

**Diagnosis**:

```bash
# Check logs for crash reason
docker compose logs --tail=100 api

# Check exit code
docker inspect changemaker-v2-api | jq '.[0].State'

# Check resource limits
docker stats changemaker-v2-api
```

**Common causes**:

- **Missing env vars**: Check the `.env` file for required secrets
- **Health check failing**: Inspect health check logs
- **Out of memory**: Increase the Docker memory limit or add resource constraints
- **Port binding failure**: Check for port conflicts

**Fix**:

```bash
# Restart with fresh logs
docker compose up -d --force-recreate api

# Check health
docker compose ps api
```

---

### Monitoring Stack Not Starting

**Problem**: Prometheus/Grafana containers missing

**Cause**: Monitoring services sit behind `profiles: [monitoring]`

**Solution**:

```bash
# Start with monitoring profile
docker compose --profile monitoring up -d

# Or: Explicitly start monitoring services
docker compose up -d prometheus grafana
```

---

### Media Upload Failures

**Problem**: Video uploads fail with `EACCES` or timeout

**Diagnosis**:

```bash
# Check media-api logs
docker compose logs -f media-api

# Verify inbox permissions
ls -la ./media/local/inbox

# Check disk space
df -h
```

**Solution**:

```bash
# Ensure inbox is writable
chmod 755 ./media/local/inbox

# Verify RW mount in docker-compose.yml
grep "inbox:rw" docker-compose.yml

# Recreate container
docker compose up -d --force-recreate media-api
```

**Important**: The inbox **must** have the `:rw` flag; the main library stays `:ro`.

---

## Production Deployment

### Resource Limits

**Production recommendations**:

```yaml
# Add to services in docker-compose.yml
deploy:
  resources:
    limits:
      cpus: '2'
      memory: 2G
    reservations:
      cpus: '0.5'
      memory: 512M
```

**Recommended limits**:

- `api`: 2 CPU, 2GB RAM
- `media-api`: 2 CPU, 2GB RAM (for FFprobe)
- `v2-postgres`: 2 CPU, 4GB RAM
- `redis`: 1 CPU, 512MB RAM (already set)
- `listmonk-app`: 1 CPU, 1GB RAM
- `grafana`: 1 CPU, 512MB RAM

---

### Healthcheck Tuning

**Production healthcheck configuration**:

```yaml
healthcheck:
  interval: 30s      # Check every 30s (default: 15s)
  timeout: 10s       # Allow 10s for response (default: 5s)
  retries: 5         # 5 failures before unhealthy (default: 3)
  start_period: 60s  # 60s grace period on startup (default: 30s)
```

**Rationale**:

- Longer intervals reduce overhead
- Higher retries prevent false positives
- Longer start periods allow for slow database migrations

---

### Log Management

**Production logging configuration**:

```yaml
# Add to all services
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "5"
```

**Alternative**: Use centralized logging (e.g., Loki + Promtail):

```yaml
logging:
  driver: "loki"
  options:
    loki-url: "http://loki:3100/loki/api/v1/push"
```

---

### Restart Policies

**Production restart policies**:

- `restart: always` — For critical services (db, redis, api)
- `restart: unless-stopped` — For most services (respects manual stops)
- `restart: on-failure` — For optional services (monitoring)

**Current configuration**: Most services use `unless-stopped` (allows manual shutdown).
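Rather than pasting the logging and restart settings into every service, they can be shared with a YAML anchor. A sketch (the `x-prod-defaults` name is arbitrary; Compose ignores top-level keys prefixed with `x-`):

```yaml
# Reusable production defaults via a YAML anchor + merge key
x-prod-defaults: &prod-defaults
  restart: always
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "5"

services:
  api:
    <<: *prod-defaults
  v2-postgres:
    <<: *prod-defaults
```

`<<: *prod-defaults` merges the anchored keys into each service, and a service can still override any of them locally (e.g., set `restart: on-failure` on a monitoring service after the merge line).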
---

### Backup Strategy

**Automated backups** (via cron):

```bash
# Add to crontab
0 2 * * * /home/user/changemaker.lite/scripts/backup.sh --s3 >> /var/log/changemaker-backup.log 2>&1
```

**What gets backed up**:

- V2 PostgreSQL database (pg_dump)
- Listmonk PostgreSQL database (pg_dump)
- Uploads directory (tar.gz)

See [Backup & Restore](backup-restore.md) for complete procedures.

---

### Security Hardening

**Production checklist**:

- [ ] Change all default passwords in `.env`
- [ ] Set a strong `REDIS_PASSWORD` (required since Security Audit 2025-02-11)
- [ ] Bind PostgreSQL ports to `127.0.0.1` (not `0.0.0.0`)
- [ ] Enable SSL/TLS via Nginx (see [SSL/TLS](ssl-tls.md))
- [ ] Set `ENCRYPTION_KEY` (must differ from JWT secrets)
- [ ] Disable `EMAIL_TEST_MODE` (use real SMTP)
- [ ] Set `NODE_ENV=production`
- [ ] Review Nginx security headers (CSP, HSTS, Permissions-Policy)
- [ ] Restrict NocoDB to read-only access (revoke INSERT/UPDATE/DELETE)
- [ ] Enable Prometheus scraping authentication (basic auth)

---

## Related Documentation

- **[Environment Variables](environment-variables.md)** — Complete `.env` reference
- **[Nginx Configuration](nginx.md)** — Reverse proxy setup + subdomain routing
- **[SSL/TLS](ssl-tls.md)** — Certificate management + HTTPS setup
- **[Tunneling](tunneling.md)** — Pangolin tunnel deployment
- **[Monitoring Stack](monitoring-stack.md)** — Prometheus + Grafana configuration
- **[Backup & Restore](backup-restore.md)** — Database backup procedures
- **[Health Checks](healthchecks.md)** — Docker health check configuration
- **[Scaling](scaling.md)** — Horizontal scaling strategies