# Docker Health Check Configuration

## Overview

Docker health checks provide automatic service monitoring and restart capabilities. Changemaker Lite V2 includes health checks for 7 critical services.

**Benefits:**

- Automatic restart of failed containers (when combined with a restart policy)
- Dependency management (`depends_on` with `condition: service_healthy`)
- Monitoring integration (Prometheus can scrape health status)
## Services with Health Checks

| Service | Healthcheck Command | Interval | Timeout | Retries | Start Period |
|---|---|---|---|---|---|
| api | `wget http://localhost:4000/api/health` | 15s | 5s | 3 | 30s |
| media-api | `wget http://127.0.0.1:4100/health` | 15s | 5s | 3 | 30s |
| admin | `wget http://127.0.0.1:3000/` | 30s | 5s | 3 | 20s |
| v2-postgres | `pg_isready -U changemaker` | 10s | 5s | 5 | - |
| redis | `redis-cli -a $REDIS_PASSWORD ping` | 10s | 5s | 5 | - |
| gitea-app | `curl http://localhost:3000/` | 30s | 5s | 3 | 30s |
| n8n | `wget http://localhost:5678/healthz` | 30s | 5s | 3 | 30s |
## Health Check Configuration

### API (Express)

**docker-compose.yml:**

```yaml
api:
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://localhost:4000/api/health"]
    interval: 15s
    timeout: 5s
    retries: 3
    start_period: 30s
```

**Explanation:**

- `test`: Runs `wget` (Alpine image standard) to check the `/api/health` endpoint
- `interval`: Check every 15 seconds
- `timeout`: Fail if no response within 5 seconds
- `retries`: Mark unhealthy after 3 consecutive failures
- `start_period`: 30-second grace period on startup (allows migrations to run)

**Health endpoint (`api/src/server.ts`):**

```typescript
app.get('/api/health', (req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
});
```
**Health states:**

- `starting`: Within `start_period` (30s)
- `healthy`: Check passed
- `unhealthy`: 3 consecutive failures
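These thresholds define a detection window. A rough upper bound on how long a failing container can go unnoticed (after `start_period` ends) is `retries` probes, each taking up to `interval + timeout` seconds. A quick sketch using the API values above:

```shell
# Worst-case detection window for the api service's health check:
# each probe may wait `interval` seconds to fire and up to `timeout`
# seconds to fail, and `retries` consecutive failures mark the
# container unhealthy.
interval=15
timeout=5
retries=3

worst_case=$(( retries * (interval + timeout) ))
echo "unhealthy after at most ${worst_case}s"
```

Tightening `interval` shrinks this window at the cost of more frequent probes.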
### Media API (Fastify)

**docker-compose.yml:**

```yaml
media-api:
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:4100/health"]
    interval: 15s
    timeout: 5s
    retries: 3
    start_period: 30s
```

**Health endpoint (`api/src/media-server.ts`):**

```typescript
app.get('/health', async (req, reply) => {
  return { status: 'ok' };
});
```

**Note:** Uses `127.0.0.1` instead of `localhost`: Alpine's BusyBox `wget` may resolve `localhost` to the IPv6 address `::1` first, which fails when the service only listens on IPv4.
### Admin (Vite Dev Server)

**docker-compose.yml:**

```yaml
admin:
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:3000/"]
    interval: 30s
    timeout: 5s
    retries: 3
    start_period: 20s
```

**Explanation:**

- 30s interval: Less critical than the backend (the frontend can tolerate brief downtime)
- 20s start period: The Vite dev server starts quickly
- Root path: Confirms Vite is serving HTML (there is no dedicated `/health` endpoint)
### V2 PostgreSQL

**docker-compose.yml:**

```yaml
v2-postgres:
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U changemaker"]
    interval: 10s
    timeout: 5s
    retries: 5
```

**Explanation:**

- `pg_isready`: Built-in PostgreSQL connection-check utility (exits 0 when the server is accepting connections)
- 10s interval: Fast detection of database issues
- 5 retries: More tolerant (database startup can be slow)
- No `start_period`: PostgreSQL has its own startup delay

**`pg_isready` output:**

```bash
# Healthy
/var/run/postgresql:5432 - accepting connections

# Unhealthy
/var/run/postgresql:5432 - rejecting connections
```
### Redis

**docker-compose.yml:**

```yaml
redis:
  healthcheck:
    test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
    interval: 10s
    timeout: 5s
    retries: 5
```

**Explanation:**

- `redis-cli ping`: Returns `PONG` if healthy
- `-a ${REDIS_PASSWORD}`: Authenticates with the password (required since the Security Audit)
- 10s interval: Fast detection for a critical cache service

**`PING` output:**

```bash
# Healthy
PONG

# Unhealthy
(error) NOAUTH Authentication required
```
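One caveat with the exec-form check above: some `redis-cli` versions exit 0 even when the server replies with an error, so the container can look healthy despite an auth failure. A stricter sketch asserts on the reply text instead (note the `$$`, which escapes Compose interpolation so the variable is expanded inside the container at run time):

```yaml
redis:
  healthcheck:
    # CMD-SHELL runs the check through the container's shell, so we can
    # pipe the reply into grep and fail unless Redis actually says PONG.
    test: ["CMD-SHELL", "redis-cli -a \"$$REDIS_PASSWORD\" ping | grep -q PONG"]
    interval: 10s
    timeout: 5s
    retries: 5
```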
### Gitea

**docker-compose.yml:**

```yaml
gitea-app:
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:3000/"]
    interval: 30s
    timeout: 5s
    retries: 3
    start_period: 30s
```

**Explanation:**

- `curl`: Available in the Gitea image (no `wget`)
- `-f`: Fail on HTTP error responses (4xx/5xx)
- 30s interval: Supporting service (less critical)

**Important:** Gitea uses `curl` (not `wget`) because its image is Debian-based, not Alpine.
### n8n

**docker-compose.yml:**

```yaml
n8n:
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://localhost:5678/healthz"]
    interval: 30s
    timeout: 5s
    retries: 3
    start_period: 30s
```

**Explanation:**

- `/healthz`: n8n's built-in health endpoint
- 30s interval: Workflow automation (not user-facing)
## Dependency Chains

### API Depends on Database + Redis

**docker-compose.yml:**

```yaml
api:
  depends_on:
    v2-postgres:
      condition: service_healthy
    redis:
      condition: service_healthy
```

**Effect:** The API container waits for both PostgreSQL and Redis to report healthy before starting.

**Startup sequence:**

1. PostgreSQL starts → health checks begin
2. First passing check → PostgreSQL marked healthy
3. Redis starts (in parallel) → health checks begin
4. First passing check → Redis marked healthy
5. API starts once both dependencies are healthy
### Media API Depends on Database

**docker-compose.yml:**

```yaml
media-api:
  depends_on:
    v2-postgres:
      condition: service_healthy
```

**Effect:** Media API waits for PostgreSQL to be healthy.

### NocoDB Depends on Database

**docker-compose.yml:**

```yaml
nocodb-v2:
  depends_on:
    v2-postgres:
      condition: service_healthy
```

**Effect:** NocoDB waits for its metadata database to be ready.
## Monitoring Healthcheck Status

### View Health Status

```bash
# All services (shows health in STATUS column)
docker compose ps

# Example output:
# NAME                       STATUS
# changemaker-v2-api         Up 2 hours (healthy)
# changemaker-v2-postgres    Up 2 hours (healthy)
# redis-changemaker          Up 2 hours (healthy)
```

**Health states:**

- `(healthy)`: All checks passing
- `(unhealthy)`: Multiple consecutive checks failed
- `(health: starting)`: Within `start_period`
### Filter Unhealthy Services

```bash
# Show only unhealthy
docker compose ps | grep unhealthy

# Count unhealthy (docker ps supports filtering on health state)
docker ps --filter health=unhealthy -q | wc -l
```
### Inspect Health Check Details

```bash
# Full health info for API
docker inspect changemaker-v2-api | jq '.[0].State.Health'
```

Example output:

```json
{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2026-02-13T14:30:00Z",
      "End": "2026-02-13T14:30:01Z",
      "ExitCode": 0,
      "Output": ""
    }
  ]
}
```

**Key fields:**

- `Status`: `healthy`, `unhealthy`, or `starting`
- `FailingStreak`: Consecutive failed checks
- `Log`: Last 5 health check results
### Health Check Logs

```bash
# View the most recent health check result
docker inspect changemaker-v2-api | jq '.[0].State.Health.Log[-1]'
```

Example (success):

```json
{
  "Start": "2026-02-13T14:30:00Z",
  "End": "2026-02-13T14:30:01Z",
  "ExitCode": 0,
  "Output": ""
}
```

Example (failure):

```json
{
  "Start": "2026-02-13T14:35:00Z",
  "End": "2026-02-13T14:35:05Z",
  "ExitCode": 1,
  "Output": "wget: can't connect to remote host (127.0.0.1): Connection refused"
}
```
## Custom Health Checks

### Advanced API Health Check

Check database and Redis connectivity (`api/src/server.ts`):

```typescript
app.get('/api/health', async (req, res) => {
  const checks = {
    database: false,
    redis: false,
  };

  try {
    await prisma.$queryRaw`SELECT 1`;
    checks.database = true;
  } catch (err) {
    console.error('DB health check failed:', err);
  }

  try {
    await redis.ping();
    checks.redis = true;
  } catch (err) {
    console.error('Redis health check failed:', err);
  }

  const healthy = checks.database && checks.redis;
  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'ok' : 'degraded',
    checks,
    timestamp: new Date().toISOString(),
  });
});
```

**docker-compose.yml** (no change needed; it still checks `/api/health`, and because `wget --spider` exits nonzero on a 503 response, a degraded service is marked unhealthy):

```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:4000/api/health"]
```
## Readiness vs Liveness

- **Readiness:** Service is ready to accept traffic (used by Kubernetes and load balancers)
- **Liveness:** Service process is running (what Docker health checks cover)

Example (separate endpoints):

```typescript
// Liveness (minimal check)
app.get('/api/health', (req, res) => {
  res.json({ status: 'ok' });
});

// Readiness (comprehensive check)
app.get('/api/ready', async (req, res) => {
  const dbReady = await checkDatabase();
  const redisReady = await checkRedis();
  const ready = dbReady && redisReady;
  res.status(ready ? 200 : 503).json({ ready, dbReady, redisReady });
});
```

Docker uses liveness (`/api/health`); the load balancer uses readiness (`/api/ready`).
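If stricter gating is wanted from Docker itself, the compose healthcheck can point at the readiness endpoint instead. A sketch (this trades tighter failure detection for more unhealthy flags whenever a dependency blips):

```yaml
api:
  healthcheck:
    # Probe readiness rather than liveness: the container is marked
    # unhealthy whenever the database or Redis is unreachable.
    test: ["CMD", "wget", "-q", "--spider", "http://localhost:4000/api/ready"]
    interval: 15s
    timeout: 5s
    retries: 3
    start_period: 30s
```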
## Troubleshooting

### Service Marked Unhealthy

**Diagnosis:**

```bash
# Check logs
docker compose logs --tail=50 api

# Check health check output
docker inspect changemaker-v2-api | jq '.[0].State.Health.Log[-1].Output'

# Manually run the health check
docker compose exec api wget -O- http://localhost:4000/api/health
```

**Common causes:**

- Service crashed (check logs)
- Health endpoint broken (test manually)
- Timeout too short (increase it in docker-compose.yml)
- Database migration running (increase `start_period`)
### Container Restart Loop

**Symptoms:** Container repeatedly cycles unhealthy → restart → unhealthy

**Diagnosis:**

```bash
# Check restart count
docker inspect changemaker-v2-api | jq '.[0].RestartCount'

# Check logs for errors
docker compose logs api | grep -i error
```

**Common causes:**

- Health check too aggressive (increase `retries`/`interval`)
- Service genuinely broken (fix the code issue)
- Resource limits too low (increase memory/CPU)

**Solution:**

```yaml
# Temporarily disable the health check
healthcheck:
  disable: true

# Or increase tolerance
healthcheck:
  retries: 10
  start_period: 60s
```
### Health Check Command Not Found

**Symptoms:** Health check fails with "wget: not found" or "curl: not found"

**Cause:** Using the wrong command for the image type (Alpine vs Debian)

**Solution:**

Alpine images (api, media-api, redis, v2-postgres):

```yaml
test: ["CMD", "wget", "-q", "--spider", "http://..."]
```

Debian images (gitea-app):

```yaml
test: ["CMD", "curl", "-f", "http://..."]
```
### Start Period Too Short

**Symptoms:** Service marked unhealthy immediately on startup

**Cause:** Database migrations or slow startup exceed `start_period`

**Solution:**

```yaml
# Increase start_period
healthcheck:
  start_period: 60s  # Was 30s
```

**Monitor startup time:**

```bash
# Measure time to first healthy (reset bash's SECONDS counter first)
SECONDS=0
docker compose up -d api && \
  while ! docker compose ps api | grep -q healthy; do sleep 1; done && \
  echo "Startup took $SECONDS seconds"
```
## Production Recommendations

### Timeout Configuration

**Critical services (database, redis, api):**

- `interval`: 10-15s
- `timeout`: 5s
- `retries`: 3-5
- `start_period`: 30-60s

**Supporting services (n8n, gitea, mailhog):**

- `interval`: 30-60s
- `timeout`: 10s
- `retries`: 3
- `start_period`: 30s
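To keep these values consistent across services, a Compose extension field plus a YAML anchor lets one set of defaults be declared once and merged in per service. A sketch applying the critical-service profile above (`x-healthcheck-critical` is an illustrative name; `x-` fields are ignored by Compose itself):

```yaml
x-healthcheck-critical: &hc-critical
  interval: 15s
  timeout: 5s
  retries: 5
  start_period: 60s

services:
  api:
    healthcheck:
      # Merge the shared defaults, then set the service-specific probe.
      <<: *hc-critical
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:4000/api/health"]
```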
### Restart Policies

Combine health checks with restart policies:

```yaml
api:
  restart: unless-stopped  # Auto-restart if the process exits
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://localhost:4000/api/health"]
```

**Effect:** If the container's process dies, Docker restarts it and health checks resume. Note that a restart policy reacts to the process exiting, not to an `unhealthy` status on its own.
### Monitoring Integration

**Prometheus exporter (future):**

```text
# Expose health check status as metrics
docker_healthcheck_status{container="changemaker-v2-api"} 1
```

**Alert on unhealthy:**

```yaml
- alert: ContainerUnhealthy
  expr: docker_healthcheck_status == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.container }} unhealthy"
```
## Testing Health Checks

### Manual Test

```bash
# Start the service
docker compose up -d api

# Watch health status
watch -n2 'docker compose ps api'

# Should see:
# (health: starting) → (healthy)
```

### Simulate Failure

Assumes the advanced health check above, so that losing the database flips the API's health:

```bash
# Stop a backend dependency
docker compose stop v2-postgres

# Wait 15s (API health check interval)
sleep 15

# Check API status
docker compose ps api
# Should show (unhealthy) after 3 failures (~45s)

# Restart the dependency
docker compose start v2-postgres

# API should recover
docker compose ps api
# Should show (healthy) after the next successful check
```
## Related Documentation
- Docker Compose — Service orchestration
- Monitoring Stack — Health metrics
- Troubleshooting — Debug failing services