Compare commits

...

4 Commits

e88ac79ae8 fix(ccp-agent): export COMPOSE_PROJECT_NAME so upgrade.sh sees correct project
The agent already passed COMPOSE_PROJECT in env, but Docker Compose actually
reads COMPOSE_PROJECT_NAME. When upgrade.sh (running inside the agent
container at cwd=/app/instance) shelled out to `docker compose up -d` in
Phase 5, compose defaulted the project name to "instance" (cwd basename),
collided with the host's existing containers under "changemakerlite", and
the upgrade aborted with "Container ... already in use by container ..."
errors.

Discovered when triggering the first end-to-end CCP "Upgrade Now" on
marcelle (v2.9.15 → v2.10.1). Backup/code/rebuild phases all succeeded;
migration phase failed instantly. Rollback restored marcelle cleanly.

This commit adds COMPOSE_PROJECT_NAME alongside the existing COMPOSE_PROJECT
(which the agent's TypeScript still reads for its own slug derivation).
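A minimal TypeScript sketch of why the missing variable mattered. It assumes Docker Compose v2's documented precedence for the project name: the `-p` flag, then `COMPOSE_PROJECT_NAME`, then the compose file's top-level `name:`, then the basename of the project directory. The interface and function names are hypothetical, and the directory-name normalization is only an approximation of what Compose does.

```typescript
// Simplified model of Docker Compose v2 project-name resolution (an
// assumption based on the documented precedence, not Compose's source):
//   1. `docker compose -p <name>`
//   2. COMPOSE_PROJECT_NAME in the environment
//   3. the top-level `name:` attribute of the compose file
//   4. the basename of the project directory (the directory holding the
//      compose file; /app/instance when upgrade.sh runs in the agent)
interface ProjectNameInputs {
  flag?: string;
  envName?: string;
  nameAttr?: string;
  projectDir: string;
}

// Rough approximation of the normalization applied to the directory
// fallback: lowercase, then drop anything outside [a-z0-9_-].
function normalizeDirName(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9_-]/g, "");
}

function lastPathSegment(dir: string): string {
  const parts = dir.split("/").filter((p) => p.length > 0);
  return parts[parts.length - 1] ?? "";
}

function effectiveProjectName(inputs: ProjectNameInputs): string {
  // Explicit names are returned as given; real Compose validates them and
  // rejects invalid ones, which this sketch does not model.
  for (const explicit of [inputs.flag, inputs.envName, inputs.nameAttr]) {
    if (explicit) return explicit;
  }
  return normalizeDirName(lastPathSegment(inputs.projectDir));
}

// Before the fix only COMPOSE_PROJECT was set, which Compose ignores:
console.log(effectiveProjectName({ projectDir: "/app/instance" })); // prints: instance
// With COMPOSE_PROJECT_NAME exported, the host's project name wins:
console.log(effectiveProjectName({ envName: "changemakerlite", projectDir: "/app/instance" })); // prints: changemakerlite
```

`COMPOSE_PROJECT` appears nowhere in this chain, which is why setting it alone left Compose on the `instance` fallback.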

Bunker Admin
2026-05-20 15:57:30 -06:00
1b80e8294c fix(ccp-agent): whitelist /app/instance for git safe.directory
The agent container runs as root, but the bind-mounted instance directory
is owned by the host user (UID 1000 = `node` in the container). Modern
git refuses to operate on such repos without an explicit safe.directory
entry, breaking upgrade-check.sh's `git fetch` and `git log` calls on
source-installed tenants. Verified empirically on soroush after the previous fix landed.
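The failure mode can be modeled as a small predicate. This is a simplified TypeScript sketch under stated assumptions, not Git's implementation. It compares the worktree owner with the current UID and, on a mismatch, accepts only a matching `safe.directory` entry. `*` matches every path, and an empty value resets earlier entries. The field names are hypothetical.

```typescript
// Simplified model of Git's safe.directory check (an assumption that covers
// only the cases in this commit; real Git also normalizes paths and has
// further special cases). A repository owned by another UID is refused
// unless a safe.directory entry matches; "*" matches every path, and an
// empty value resets the entries seen so far.
interface OwnershipCheck {
  repoOwnerUid: number;      // owner of the worktree, 1000 in this setup
  currentUid: number;        // UID git runs as, 0 (root) in the agent
  repoPath: string;          // e.g. "/app/instance"
  safeDirectories: string[]; // all safe.directory values, in config order
}

function gitWouldTrust(check: OwnershipCheck): boolean {
  if (check.repoOwnerUid === check.currentUid) return true;
  let trusted = false;
  for (const entry of check.safeDirectories) {
    if (entry === "") trusted = false; // empty value resets the list
    else if (entry === "*" || entry === check.repoPath) trusted = true;
  }
  return trusted;
}

const agent = { repoOwnerUid: 1000, currentUid: 0, repoPath: "/app/instance" };
console.log(gitWouldTrust({ ...agent, safeDirectories: [] }));    // prints: false
console.log(gitWouldTrust({ ...agent, safeDirectories: ["*"] })); // prints: true
```

The wildcard trades precision for simplicity. Because the agent only ever mounts one host directory, listing `/app/instance` explicitly would behave the same here.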

Bunker Admin
2026-05-20 12:14:39 -06:00
a531f9b9ce fix(ccp): make agent functional + fix Gitea release timestamp bug
Three related fixes uncovered during a marcelle CCP registration test:

1. ccp-agent image was missing bash + curl + jq + python3, so every
   spawn('bash', ...) in upgrade.routes.ts and backup.routes.ts failed
   silently with ENOENT. CCP kept reading stale status.json files from
   disk, masking that no agent had successfully checked for updates in
   weeks. apk-add the missing tools.

2. ccp-agent's /app/instance mount was :ro, blocking the agent from
   writing data/upgrade/status.json (and result/progress/backups).
   Agent already has docker.sock — removing :ro is not a security
   escalation. Patched both docker-compose.yml and docker-compose.prod.yml.

3. Gitea 1.23.x only initializes Release.CreatedUnix inside its
   createTag() helper, which is skipped if the tag already exists on
   origin. The old DEV_WORKFLOW pattern (push tag, then run
   build-release.sh --upload) was triggering this: releases got
   created_unix=0, so /releases/latest sorted them below v2.9.14 and
   kept returning v2.9.14.
   build-release.sh now removes the remote tag first and POSTs with
   target_commitish so Gitea creates the tag and release atomically.
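Fixes 1 and 2 both surfaced only indirectly, through stale status files. A hypothetical startup preflight in the agent's TypeScript could report both conditions directly. The sketch below assumes Node's `fs.accessSync` (backed by `access(2)`, so a `:ro` bind mount reports `EROFS` even for root) and a plain PATH scan. The function names, and the idea of running this at startup at all, are assumptions rather than part of these commits.

```typescript
import { accessSync, constants } from "node:fs";
import { delimiter, join } from "node:path";

// True if `name` exists and is executable in any PATH directory.
function isExecutableOnPath(name: string, pathEnv: string = process.env.PATH ?? ""): boolean {
  for (const dir of pathEnv.split(delimiter)) {
    if (dir === "") continue;
    try {
      accessSync(join(dir, name), constants.X_OK);
      return true;
    } catch {
      // Not present (or not executable) here; keep scanning.
    }
  }
  return false;
}

function missingBinaries(names: string[], pathEnv?: string): string[] {
  return names.filter((n) => !isExecutableOnPath(n, pathEnv));
}

// access(2) with W_OK fails with EROFS on a read-only mount, so this
// catches a ":ro" bind mount before any status.json write is attempted.
function isWritableDir(dir: string): boolean {
  try {
    accessSync(dir, constants.W_OK);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical preflight: collect every problem instead of failing later.
function preflight(instanceDir: string, required: string[]): string[] {
  const problems = missingBinaries(required).map((b) => `missing binary: ${b}`);
  if (!isWritableDir(instanceDir)) problems.push(`not writable: ${instanceDir}`);
  return problems;
}

console.log(preflight("/app/instance", ["bash", "curl", "jq", "python3"]));
```

Surfacing these at startup (or in a health endpoint) would have made the ENOENT and read-only failures visible, instead of leaving them masked behind stale status.json files.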

After these fixes, CCP's "Check for Updates" path returns truthful
data end-to-end (verified on marcelle: v2.9.15 -> v2.10.1, 1 behind).

Bunker Admin
2026-05-20 11:59:35 -06:00
a82e95946b fix(gancio): pre-start config-init sidecar prevents restart loop
Gancio refuses to start when its DB has tables but the data volume has no
config.json ("Non empty db! Please move your current db elsewhere than retry"),
which produces an infinite restart loop. This hit production tenants bnkops
and trbh (>1200 restart cycles each) — proximate cause was a missing
config.json in changemakerlite_gancio-data with the DB fully populated.

Add gancio-config-init alpine sidecar that runs on every `up`:
  - no-op when config.json exists
  - regenerates from .env when missing (1000:1000 ownership)
  - gancio service now depends on it via condition: service_completed_successfully
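One caveat with regenerating the file via `printf` is that `%s` substitutes values verbatim, so a password containing `"` or `\` would produce invalid JSON. Below is a minimal TypeScript sketch of the same document built with a JSON serializer. It assumes the structure of the sidecar's `printf` template, and the function and interface names are hypothetical.

```typescript
// Inputs the sidecar reads from .env (names match its environment block).
interface GancioEnv {
  GANCIO_BASE_URL: string;
  V2_POSTGRES_USER: string;
  V2_POSTGRES_PASSWORD: string;
}

// Same shape as the printf template; JSON.stringify escapes quotes and
// backslashes in every value, so arbitrary passwords stay valid JSON.
function renderGancioConfig(env: GancioEnv): string {
  const config = {
    baseurl: env.GANCIO_BASE_URL,
    server: { host: "0.0.0.0", port: 13120 },
    db: {
      dialect: "postgres",
      host: "changemaker-v2-postgres",
      port: 5432,
      database: "gancio",
      username: env.V2_POSTGRES_USER,
      password: env.V2_POSTGRES_PASSWORD,
    },
  };
  return JSON.stringify(config);
}

const sample = renderGancioConfig({
  GANCIO_BASE_URL: "https://events.cmlite.org",
  V2_POSTGRES_USER: "changemaker",
  V2_POSTGRES_PASSWORD: 'has"quote',
});
console.log(JSON.parse(sample).db.password); // prints: has"quote
```

Inside the alpine sidecar itself, the shell equivalent would need a JSON-aware tool, which the base image does not necessarily include.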

Also harden verify_gancio_config in upgrade.sh to error loudly when
multiple gancio-data volumes match (silent head -1 could pick the wrong
one after a compose project rename).
2026-05-19 17:02:55 -06:00
5 changed files with 125 additions and 9 deletions


@@ -8,7 +8,16 @@ COPY src/ ./src/
 RUN npx tsc
 FROM node:20-alpine
-RUN apk add --no-cache docker-cli docker-cli-compose git rsync
+# bash + curl + jq + python3 are required by the changemaker scripts the agent
+# shells out to (upgrade-check.sh, upgrade.sh, backup.sh). Without them, every
+# /upgrade/* and /backup/* call returns "command not found" failures.
+RUN apk add --no-cache docker-cli docker-cli-compose git rsync bash curl jq python3
+# Agent runs as root, but the bind-mounted /app/instance is owned by the host
+# user (UID 1000 = `node` inside the container). Modern git refuses to operate
+# on repos with mismatched ownership without an explicit safe.directory entry.
+# Wildcard whitelist all paths — the agent only mounts a single host directory
+# anyway (the instance's project root).
+RUN git config --system --add safe.directory '*'
 WORKDIR /app
 COPY package*.json ./
 RUN npm ci --production


@@ -976,6 +976,39 @@ services:
       retries: 10
       start_period: 30s
+  # Gancio Config Init — Writes /home/node/data/config.json from .env if missing.
+  # Gancio refuses to start when its DB has tables but the data volume has no
+  # config.json ("Non empty db! Please move your current db elsewhere than retry"),
+  # which causes an infinite restart loop. This sidecar runs on every `up` and is
+  # a no-op when config.json is already present. See docker-compose.yml for the
+  # full rationale; the two files must stay in parity per scripts/validate-compose-parity.sh.
+  gancio-config-init:
+    image: ${GITEA_REGISTRY:-gitea.bnkops.com/admin}/alpine:3
+    container_name: gancio-config-init
+    restart: "no"
+    volumes:
+      - gancio-data:/data
+    environment:
+      - GANCIO_BASE_URL=${GANCIO_BASE_URL:-https://events.cmlite.org}
+      - V2_POSTGRES_USER=${V2_POSTGRES_USER:-changemaker}
+      - V2_POSTGRES_PASSWORD=${V2_POSTGRES_PASSWORD:?V2_POSTGRES_PASSWORD must be set in .env}
+    entrypoint: ["sh", "-c"]
+    command:
+      - |
+        set -e
+        if [ -s /data/config.json ]; then
+          echo "Gancio config.json present — skipping"
+          exit 0
+        fi
+        echo "Gancio config.json missing — regenerating from .env"
+        printf '{"baseurl":"%s","server":{"host":"0.0.0.0","port":13120},"db":{"dialect":"postgres","host":"changemaker-v2-postgres","port":5432,"database":"gancio","username":"%s","password":"%s"}}' \
+          "$$GANCIO_BASE_URL" "$$V2_POSTGRES_USER" "$$V2_POSTGRES_PASSWORD" > /data/config.json
+        chown 1000:1000 /data/config.json
+        echo "Gancio config.json regenerated"
+    logging: *default-logging
+    networks:
+      - changemaker-lite
   # Gancio — Event management platform (uses shared PostgreSQL)
   gancio:
     image: ${GITEA_REGISTRY:-gitea.bnkops.com/admin}/gancio:1.28.2
@@ -984,6 +1017,8 @@ services:
     depends_on:
       v2-postgres:
         condition: service_healthy
+      gancio-config-init:
+        condition: service_completed_successfully
     ports:
       - "127.0.0.1:${GANCIO_PORT:-8092}:13120"
     healthcheck:
@@ -1392,9 +1427,10 @@ services:
       - /var/run/docker.sock:/var/run/docker.sock
       - ccp-agent-data:/var/lib/ccp-agent
       - ccp-agent-certs:/etc/ccp-agent
-      # Mount the instance directory so the agent can read compose files and run
-      # `docker compose -p <project>` commands against the real project on disk.
-      - .:/app/instance:ro
+      # Mount the instance directory so the agent can read compose files and
+      # write status.json + backups (writable; agent already has docker.sock,
+      # so file write access is not an additional security escalation).
+      - .:/app/instance
     environment:
       - AGENT_PORT=7443
       - AGENT_DATA_DIR=/var/lib/ccp-agent
@@ -1406,7 +1442,12 @@ services:
       - INSTANCE_BASE_PATH=/app/instance
       # Pass the host's compose project name so the agent runs `docker compose -p <project>`
       # against the right project (not basename of INSTANCE_BASE_PATH, which is "instance").
+      # COMPOSE_PROJECT is read by the agent's TypeScript for slug derivation;
+      # COMPOSE_PROJECT_NAME is what Docker Compose itself reads when upgrade.sh
+      # shells out to `docker compose ...` — without it, compose defaults to
+      # basename(cwd)="instance" and collides with the host's existing containers.
       - COMPOSE_PROJECT=${COMPOSE_PROJECT_NAME:-changemaker-lite}
+      - COMPOSE_PROJECT_NAME=${COMPOSE_PROJECT_NAME:-changemaker-lite}
     logging: *default-logging
     networks:
       - changemaker-lite


@@ -998,6 +998,40 @@ services:
       start_period: 30s
   # Gancio — Event management platform (uses shared PostgreSQL)
+  # Gancio Config Init — Writes /home/node/data/config.json from .env if missing.
+  # Gancio refuses to start when its DB has tables but the data volume has no
+  # config.json ("Non empty db! Please move your current db elsewhere than retry"),
+  # which causes an infinite restart loop. This sidecar runs on every `up` and is
+  # a no-op when config.json is already present. Reversible: removing this
+  # service has no effect on healthy stacks; it only matters when the volume
+  # loses config.json (volume rename, partial restore, manual volume rm, etc.).
+  gancio-config-init:
+    image: alpine:3
+    container_name: gancio-config-init
+    restart: "no"
+    volumes:
+      - gancio-data:/data
+    environment:
+      - GANCIO_BASE_URL=${GANCIO_BASE_URL:-https://events.cmlite.org}
+      - V2_POSTGRES_USER=${V2_POSTGRES_USER:-changemaker}
+      - V2_POSTGRES_PASSWORD=${V2_POSTGRES_PASSWORD:?V2_POSTGRES_PASSWORD must be set in .env}
+    entrypoint: ["sh", "-c"]
+    command:
+      - |
+        set -e
+        if [ -s /data/config.json ]; then
+          echo "Gancio config.json present — skipping"
+          exit 0
+        fi
+        echo "Gancio config.json missing — regenerating from .env"
+        printf '{"baseurl":"%s","server":{"host":"0.0.0.0","port":13120},"db":{"dialect":"postgres","host":"changemaker-v2-postgres","port":5432,"database":"gancio","username":"%s","password":"%s"}}' \
+          "$$GANCIO_BASE_URL" "$$V2_POSTGRES_USER" "$$V2_POSTGRES_PASSWORD" > /data/config.json
+        chown 1000:1000 /data/config.json
+        echo "Gancio config.json regenerated"
+    logging: *default-logging
+    networks:
+      - changemaker-lite
   gancio:
     image: cisti/gancio:1.28.2
     container_name: gancio-changemaker
@@ -1005,6 +1039,8 @@ services:
     depends_on:
       v2-postgres:
         condition: service_healthy
+      gancio-config-init:
+        condition: service_completed_successfully
     ports:
       - "127.0.0.1:${GANCIO_PORT:-8092}:13120"
     healthcheck:
@@ -1414,7 +1450,10 @@ services:
       - /var/run/docker.sock:/var/run/docker.sock
       - ccp-agent-data:/var/lib/ccp-agent
       - ccp-agent-certs:/etc/ccp-agent
-      - .:/app/instance:ro
+      # Writable: agent must write data/upgrade/{status,progress,result}.json
+      # and data/backups/*.tar.gz. Agent already has docker.sock — file write
+      # access is not an additional security escalation.
+      - .:/app/instance
     environment:
       - AGENT_PORT=7443
       - AGENT_DATA_DIR=/var/lib/ccp-agent
@@ -1426,7 +1465,12 @@ services:
       - INSTANCE_BASE_PATH=/app/instance
       # Pass the host's compose project name so the agent runs `docker compose -p <project>`
       # against the right project (not basename of INSTANCE_BASE_PATH, which is "instance").
+      # COMPOSE_PROJECT is read by the agent's TypeScript for slug derivation;
+      # COMPOSE_PROJECT_NAME is what Docker Compose itself reads when upgrade.sh
+      # shells out to `docker compose ...` — without it, compose defaults to
+      # basename(cwd)="instance" and collides with the host's existing containers.
       - COMPOSE_PROJECT=${COMPOSE_PROJECT_NAME:-changemaker-lite}
+      - COMPOSE_PROJECT_NAME=${COMPOSE_PROJECT_NAME:-changemaker-lite}
     logging: *default-logging
     networks:
       - changemaker-lite


@@ -295,12 +295,23 @@ if [[ "$UPLOAD" == "true" ]]; then
         fi
     fi
+    # Gitea 1.23.x only initializes Release.CreatedUnix inside its createTag()
+    # path. If the git tag already exists on origin when we POST /releases,
+    # createTag() is skipped and CreatedUnix stays 0, which makes /releases/latest
+    # silently return an older release. Remove the remote tag first so Gitea
+    # creates it via target_commitish below. The tag is preserved locally and
+    # gets recreated at the same SHA — no history is lost.
+    if git ls-remote --exit-code origin "refs/tags/${TAG}" >/dev/null 2>&1; then
+        warn "Removing remote tag ${TAG} so Gitea can recreate it (CreatedUnix init)"
+        git push origin ":refs/tags/${TAG}" >/dev/null 2>&1 || true
+    fi
     info "Creating Gitea release ${TAG}..."
     RELEASE_RESPONSE=$(curl -sf -X POST \
         "${GITEA_HOST}/api/v1/repos/admin/changemaker.lite/releases" \
         -H "Authorization: token ${GITEA_TOKEN}" \
         -H "Content-Type: application/json" \
-        -d "{\"tag_name\":\"${TAG}\",\"name\":\"Changemaker Lite ${TAG}\",\"body\":\"Release ${TAG} (${COMMIT_SHA})\"}" \
+        -d "{\"tag_name\":\"${TAG}\",\"target_commitish\":\"${COMMIT_SHA}\",\"name\":\"Changemaker Lite ${TAG}\",\"body\":\"Release ${TAG} (${COMMIT_SHA})\"}" \
         2>/dev/null || true)
     RELEASE_ID=$(echo "$RELEASE_RESPONSE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id',''))" 2>/dev/null || true)
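For comparison, the same request can be assembled in TypeScript with `JSON.stringify`, which keeps the body valid JSON regardless of what `TAG` or `COMMIT_SHA` contain. This is a hedged sketch. The endpoint path and header format are copied from the script above, the field names follow Gitea's `CreateReleaseOption` (`tag_name`, `target_commitish`, `name`, `body`), and the helper name is hypothetical. Per the commit message, supplying `target_commitish` is what lets Gitea create the missing tag itself.

```typescript
// Hypothetical helper mirroring the curl call in build-release.sh.
interface ReleaseRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildCreateReleaseRequest(
  host: string,
  token: string,
  tag: string,
  commitSha: string,
): ReleaseRequest {
  const payload = {
    tag_name: tag,
    target_commitish: commitSha, // lets Gitea create the tag if it is absent
    name: `Changemaker Lite ${tag}`,
    body: `Release ${tag} (${commitSha})`,
  };
  return {
    url: `${host}/api/v1/repos/admin/changemaker.lite/releases`,
    init: {
      method: "POST",
      headers: { Authorization: `token ${token}`, "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Usage with Node 18+ global fetch (not executed here):
// const req = buildCreateReleaseRequest(process.env.GITEA_HOST!, process.env.GITEA_TOKEN!, "v2.10.1", sha);
// const res = await fetch(req.url, req.init);
```

Building the request separately from sending it also makes the payload easy to inspect before any remote tag is deleted.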


@@ -188,11 +188,22 @@ restore_user_paths() {
 # "Non empty db! Please move your current db elsewhere than retry."
 # This regenerates config.json from .env vars when missing.
 verify_gancio_config() {
     local gancio_volume
-    gancio_volume="$(docker volume ls --format '{{.Name}}' | grep 'gancio-data' | head -1 || true)"
-    if [[ -z "$gancio_volume" ]]; then
+    # Note: as of the gancio-config-init sidecar in docker-compose{,prod}.yml,
+    # config.json is regenerated automatically on every `up`. This function is
+    # kept as belt-and-braces for the upgrade flow specifically (e.g. so the
+    # check happens before the compose-up rather than at compose-up time, and
+    # so operators see explicit log output during upgrade).
+    local matches
+    matches="$(docker volume ls --format '{{.Name}}' | grep 'gancio-data' || true)"
+    local count
+    count=$(printf '%s\n' "$matches" | grep -c '.' || true)
+    if [[ "$count" -eq 0 ]]; then
         return # No gancio volume exists yet; first run will handle it
     fi
+    if [[ "$count" -gt 1 ]]; then
+        error "Multiple gancio-data volumes found — refusing to guess. Resolve manually:\n$matches"
+    fi
+    local gancio_volume="$matches"
     # Check if config.json exists and is non-empty
     if docker run --rm -v "${gancio_volume}:/data" alpine test -s /data/config.json 2>/dev/null; then
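The multiple-match guard in `verify_gancio_config` can be expressed as a small TypeScript function. This sketch tightens the match under an assumption the script does not make: that Compose names the volume `<project>_gancio-data` (as in `changemakerlite_gancio-data`). It therefore matches that suffix rather than any substring. The function name and default suffix are hypothetical.

```typescript
// Return the single gancio-data volume, undefined when none exists yet
// (first run), or throw when more than one matches (e.g. after a compose
// project rename left an old "<project>_gancio-data" volume behind).
function selectGancioVolume(volumeNames: string[], suffix: string = "_gancio-data"): string | undefined {
  const matches = volumeNames.filter((n) => n.endsWith(suffix));
  if (matches.length === 0) return undefined;
  if (matches.length > 1) {
    throw new Error(`Multiple gancio-data volumes found, refusing to guess:\n${matches.join("\n")}`);
  }
  return matches[0];
}

// Sample volume names are illustrative.
console.log(selectGancioVolume(["changemakerlite_gancio-data", "changemakerlite_other-data"])); // prints: changemakerlite_gancio-data
```

Matching on the suffix means a name such as `gancio-data-old` is ignored instead of counting toward the "multiple volumes" error. Whether that is preferable to the script's broader substring match is a judgment call.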