Compare commits

..

5 Commits

Author SHA1 Message Date
9613c3ec81 fix(upgrade): Phase 1 of upgrade-flow redesign (Approach A)
Three coordinated fixes from the upgrade-flow redesign plan
(/home/bunker-admin/.claude/plans/okay-so-we-can-enumerated-hejlsberg.md):

1. scripts/lib/mkdocs-snapshot.sh (NEW): pre-upgrade tarball snapshot of
   the entire mkdocs/ directory into the install root as
   mkdocs-backup-<timestamp>.tar.gz. Discoverable via `ls`, retained last 5.
   No-regrets fallback if anything in the upgrade goes sideways. Sourced
   by upgrade.sh (and later by image-upgrade.sh under Approach B).

2. scripts/upgrade.sh Phase 6 self-destruct fix: previously, the broad
   `docker compose up -d` recreated the ccp-agent container that was
   running the script, sending SIGKILL to the bash process before
   write_result could land result.json. Marcelle's test upgrade hit this
   tonight. Fix: temporarily remove `ccp-agent` from COMPOSE_PROFILES
   during Phase 6's broad up -d, then schedule a detached `nohup ... &
   disown` restart at the very end of the script (after write_result and
   archive_success_to_history). The deferred subshell sleeps 3s, then
   recreates ccp-agent under its profile, picking up the new image.

3. scripts/upgrade-stash-cleanup.sh (NEW): one-shot utility to list and
   drop accumulated `upgrade-*` git stashes left over by older upgrade.sh
   runs whose pop failed silently (Pride Corner has three from 2026-03-09
   alone). Warns loudly if any stash holds tenant mkdocs.yml content so
   operators verify recovery before dropping.

The .gitignore now excludes /mkdocs-backup-*.tar.gz so the rescue
archives don't leak into commits.

This is Phase 1 of three: Approach B (image-only upgrade mode) and
Approach C (CCP template re-render) follow in subsequent commits.

Bunker Admin
2026-05-20 20:43:34 -06:00
e88ac79ae8 fix(ccp-agent): export COMPOSE_PROJECT_NAME so upgrade.sh sees correct project
The agent already passed COMPOSE_PROJECT in env, but Docker Compose actually
reads COMPOSE_PROJECT_NAME. When upgrade.sh (running inside the agent
container at cwd=/app/instance) shelled out to `docker compose up -d` in
Phase 5, compose defaulted the project name to "instance" (cwd basename),
collided with the host's existing containers under "changemakerlite", and
the upgrade aborted with "Container ... already in use by container ..."
errors.

Discovered when triggering the first end-to-end CCP "Upgrade Now" on
marcelle (v2.9.15 → v2.10.1). Backup/code/rebuild phases all succeeded;
migration phase failed instantly. Rollback restored marcelle cleanly.

This commit adds COMPOSE_PROJECT_NAME alongside the existing COMPOSE_PROJECT
(which the agent's TypeScript still reads for its own slug derivation).

Bunker Admin
2026-05-20 15:57:30 -06:00
1b80e8294c fix(ccp-agent): whitelist /app/instance for git safe.directory
The agent container runs as root but the bind-mounted instance directory
is owned by the host user (UID 1000 = `node` in the container). Modern
git refuses to operate on such repos without an explicit safe.directory
entry, breaking upgrade-check.sh's `git fetch/log` calls on source-installed
tenants. Verified empirically on soroush after the previous fix landed.

Bunker Admin
2026-05-20 12:14:39 -06:00
a531f9b9ce fix(ccp): make agent functional + fix Gitea release timestamp bug
Three related fixes uncovered during a marcelle CCP registration test:

1. ccp-agent image was missing bash + curl + jq + python3, so every
   spawn('bash', ...) in upgrade.routes.ts and backup.routes.ts failed
   silently with ENOENT. CCP kept reading stale status.json files from
   disk, masking that no agent had successfully checked for updates in
   weeks. apk-add the missing tools.

2. ccp-agent's /app/instance mount was :ro, blocking the agent from
   writing data/upgrade/status.json (and result/progress/backups).
   Agent already has docker.sock — removing :ro is not a security
   escalation. Patched both docker-compose.yml and docker-compose.prod.yml.

3. Gitea 1.23.x only initializes Release.CreatedUnix inside its
   createTag() helper, which is skipped if the tag already exists on
   origin. The old DEV_WORKFLOW pattern (push tag, then run
   build-release.sh --upload) was triggering this — releases got
   created_unix=0 and lost /releases/latest sort order to v2.9.14.
   build-release.sh now removes the remote tag first and POSTs with
   target_commitish so Gitea creates the tag and release atomically.

After these fixes, CCP's "Check for Updates" path returns truthful
data end-to-end (verified on marcelle: v2.9.15 -> v2.10.1, 1 behind).

Bunker Admin
2026-05-20 11:59:35 -06:00
a82e95946b fix(gancio): pre-start config-init sidecar prevents restart loop
Gancio refuses to start when its DB has tables but the data volume has no
config.json ("Non empty db! Please move your current db elsewhere than retry"),
which produces an infinite restart loop. This hit production tenants bnkops
and trbh (>1200 restart cycles each) — proximate cause was a missing
config.json in changemakerlite_gancio-data with the DB fully populated.

Add gancio-config-init alpine sidecar that runs on every `up`:
  - no-op when config.json exists
  - regenerates from .env when missing (1000:1000 ownership)
  - gancio service now depends on its service_completed_successfully

Also harden verify_gancio_config in upgrade.sh to error loudly when
multiple gancio-data volumes match (silent head -1 could pick the wrong
one after a compose project rename).
2026-05-19 17:02:55 -06:00
8 changed files with 400 additions and 11 deletions

5
.gitignore vendored
View File

@ -64,6 +64,11 @@ core.*
/backups/
.upgrade.lock
# Pre-upgrade mkdocs snapshots (created by scripts/lib/mkdocs-snapshot.sh).
# These are the tenant-content rescue archives written before every upgrade;
# discoverable in the install root via `ls`. Retention: last 5 (see helper).
/mkdocs-backup-*.tar.gz
# Release tarballs (generated by build-release.sh)
/releases/

View File

@ -8,7 +8,16 @@ COPY src/ ./src/
RUN npx tsc
FROM node:20-alpine
RUN apk add --no-cache docker-cli docker-cli-compose git rsync
# bash + curl + jq + python3 are required by the changemaker scripts the agent
# shells out to (upgrade-check.sh, upgrade.sh, backup.sh). Without them, every
# /upgrade/* and /backup/* call returns "command not found" failures.
RUN apk add --no-cache docker-cli docker-cli-compose git rsync bash curl jq python3
# Agent runs as root, but the bind-mounted /app/instance is owned by the host
# user (UID 1000 = `node` inside the container). Modern git refuses to operate
# on repos with mismatched ownership without an explicit safe.directory entry.
# Wildcard whitelist all paths — the agent only mounts a single host directory
# anyway (the instance's project root).
RUN git config --system --add safe.directory '*'
WORKDIR /app
COPY package*.json ./
RUN npm ci --production

View File

@ -976,6 +976,39 @@ services:
retries: 10
start_period: 30s
# Gancio Config Init — Writes /home/node/data/config.json from .env if missing.
# Gancio refuses to start when its DB has tables but the data volume has no
# config.json ("Non empty db! Please move your current db elsewhere than retry"),
# which causes an infinite restart loop. This sidecar runs on every `up` and is
# a no-op when config.json is already present. See docker-compose.yml for the
# full rationale; the two files must stay in parity per scripts/validate-compose-parity.sh.
gancio-config-init:
image: ${GITEA_REGISTRY:-gitea.bnkops.com/admin}/alpine:3
container_name: gancio-config-init
restart: "no"
volumes:
- gancio-data:/data
environment:
- GANCIO_BASE_URL=${GANCIO_BASE_URL:-https://events.cmlite.org}
- V2_POSTGRES_USER=${V2_POSTGRES_USER:-changemaker}
- V2_POSTGRES_PASSWORD=${V2_POSTGRES_PASSWORD:?V2_POSTGRES_PASSWORD must be set in .env}
entrypoint: ["sh", "-c"]
command:
- |
set -e
if [ -s /data/config.json ]; then
echo "Gancio config.json present — skipping"
exit 0
fi
echo "Gancio config.json missing — regenerating from .env"
printf '{"baseurl":"%s","server":{"host":"0.0.0.0","port":13120},"db":{"dialect":"postgres","host":"changemaker-v2-postgres","port":5432,"database":"gancio","username":"%s","password":"%s"}}' \
"$$GANCIO_BASE_URL" "$$V2_POSTGRES_USER" "$$V2_POSTGRES_PASSWORD" > /data/config.json
chown 1000:1000 /data/config.json
echo "Gancio config.json regenerated"
logging: *default-logging
networks:
- changemaker-lite
# Gancio — Event management platform (uses shared PostgreSQL)
gancio:
image: ${GITEA_REGISTRY:-gitea.bnkops.com/admin}/gancio:1.28.2
@ -984,6 +1017,8 @@ services:
depends_on:
v2-postgres:
condition: service_healthy
gancio-config-init:
condition: service_completed_successfully
ports:
- "127.0.0.1:${GANCIO_PORT:-8092}:13120"
healthcheck:
@ -1392,9 +1427,10 @@ services:
- /var/run/docker.sock:/var/run/docker.sock
- ccp-agent-data:/var/lib/ccp-agent
- ccp-agent-certs:/etc/ccp-agent
# Mount the instance directory so the agent can read compose files and run
# `docker compose -p <project>` commands against the real project on disk.
- .:/app/instance:ro
# Mount the instance directory so the agent can read compose files and
# write status.json + backups (writable; agent already has docker.sock,
# so file write access is not an additional security escalation).
- .:/app/instance
environment:
- AGENT_PORT=7443
- AGENT_DATA_DIR=/var/lib/ccp-agent
@ -1406,7 +1442,12 @@ services:
- INSTANCE_BASE_PATH=/app/instance
# Pass the host's compose project name so the agent runs `docker compose -p <project>`
# against the right project (not basename of INSTANCE_BASE_PATH, which is "instance").
# COMPOSE_PROJECT is read by the agent's TypeScript for slug derivation;
# COMPOSE_PROJECT_NAME is what Docker Compose itself reads when upgrade.sh
# shells out to `docker compose ...` — without it, compose defaults to
# basename(cwd)="instance" and collides with the host's existing containers.
- COMPOSE_PROJECT=${COMPOSE_PROJECT_NAME:-changemaker-lite}
- COMPOSE_PROJECT_NAME=${COMPOSE_PROJECT_NAME:-changemaker-lite}
logging: *default-logging
networks:
- changemaker-lite

View File

@ -998,6 +998,40 @@ services:
start_period: 30s
# Gancio — Event management platform (uses shared PostgreSQL)
# Gancio Config Init — Writes /home/node/data/config.json from .env if missing.
# Gancio refuses to start when its DB has tables but the data volume has no
# config.json ("Non empty db! Please move your current db elsewhere than retry"),
# which causes an infinite restart loop. This sidecar runs on every `up` and is
# a no-op when config.json is already present. Reversible: removing this
# service has no effect on healthy stacks; it only matters when the volume
# loses config.json (volume rename, partial restore, manual volume rm, etc.).
gancio-config-init:
image: alpine:3
container_name: gancio-config-init
restart: "no"
volumes:
- gancio-data:/data
environment:
- GANCIO_BASE_URL=${GANCIO_BASE_URL:-https://events.cmlite.org}
- V2_POSTGRES_USER=${V2_POSTGRES_USER:-changemaker}
- V2_POSTGRES_PASSWORD=${V2_POSTGRES_PASSWORD:?V2_POSTGRES_PASSWORD must be set in .env}
entrypoint: ["sh", "-c"]
command:
- |
set -e
if [ -s /data/config.json ]; then
echo "Gancio config.json present — skipping"
exit 0
fi
echo "Gancio config.json missing — regenerating from .env"
printf '{"baseurl":"%s","server":{"host":"0.0.0.0","port":13120},"db":{"dialect":"postgres","host":"changemaker-v2-postgres","port":5432,"database":"gancio","username":"%s","password":"%s"}}' \
"$$GANCIO_BASE_URL" "$$V2_POSTGRES_USER" "$$V2_POSTGRES_PASSWORD" > /data/config.json
chown 1000:1000 /data/config.json
echo "Gancio config.json regenerated"
logging: *default-logging
networks:
- changemaker-lite
gancio:
image: cisti/gancio:1.28.2
container_name: gancio-changemaker
@ -1005,6 +1039,8 @@ services:
depends_on:
v2-postgres:
condition: service_healthy
gancio-config-init:
condition: service_completed_successfully
ports:
- "127.0.0.1:${GANCIO_PORT:-8092}:13120"
healthcheck:
@ -1414,7 +1450,10 @@ services:
- /var/run/docker.sock:/var/run/docker.sock
- ccp-agent-data:/var/lib/ccp-agent
- ccp-agent-certs:/etc/ccp-agent
- .:/app/instance:ro
# Writable: agent must write data/upgrade/{status,progress,result}.json
# and data/backups/*.tar.gz. Agent already has docker.sock — file write
# access is not an additional security escalation.
- .:/app/instance
environment:
- AGENT_PORT=7443
- AGENT_DATA_DIR=/var/lib/ccp-agent
@ -1426,7 +1465,12 @@ services:
- INSTANCE_BASE_PATH=/app/instance
# Pass the host's compose project name so the agent runs `docker compose -p <project>`
# against the right project (not basename of INSTANCE_BASE_PATH, which is "instance").
# COMPOSE_PROJECT is read by the agent's TypeScript for slug derivation;
# COMPOSE_PROJECT_NAME is what Docker Compose itself reads when upgrade.sh
# shells out to `docker compose ...` — without it, compose defaults to
# basename(cwd)="instance" and collides with the host's existing containers.
- COMPOSE_PROJECT=${COMPOSE_PROJECT_NAME:-changemaker-lite}
- COMPOSE_PROJECT_NAME=${COMPOSE_PROJECT_NAME:-changemaker-lite}
logging: *default-logging
networks:
- changemaker-lite

View File

@ -295,12 +295,23 @@ if [[ "$UPLOAD" == "true" ]]; then
fi
fi
# Gitea 1.23.x only initializes Release.CreatedUnix inside its createTag()
# path. If the git tag already exists on origin when we POST /releases,
# createTag() is skipped and CreatedUnix stays 0, which makes /releases/latest
# silently return an older release. Remove the remote tag first so Gitea
# creates it via target_commitish below. The tag is preserved locally and
# gets recreated at the same SHA — no history is lost.
if git ls-remote --exit-code origin "refs/tags/${TAG}" >/dev/null 2>&1; then
warn "Removing remote tag ${TAG} so Gitea can recreate it (CreatedUnix init)"
git push origin ":refs/tags/${TAG}" >/dev/null 2>&1 || true
fi
info "Creating Gitea release ${TAG}..."
RELEASE_RESPONSE=$(curl -sf -X POST \
"${GITEA_HOST}/api/v1/repos/admin/changemaker.lite/releases" \
-H "Authorization: token ${GITEA_TOKEN}" \
-H "Content-Type: application/json" \
-d "{\"tag_name\":\"${TAG}\",\"name\":\"Changemaker Lite ${TAG}\",\"body\":\"Release ${TAG} (${COMMIT_SHA})\"}" \
-d "{\"tag_name\":\"${TAG}\",\"target_commitish\":\"${COMMIT_SHA}\",\"name\":\"Changemaker Lite ${TAG}\",\"body\":\"Release ${TAG} (${COMMIT_SHA})\"}" \
2>/dev/null || true)
RELEASE_ID=$(echo "$RELEASE_RESPONSE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id',''))" 2>/dev/null || true)

81
scripts/lib/mkdocs-snapshot.sh Executable file
View File

@ -0,0 +1,81 @@
#!/usr/bin/env bash
# =============================================================================
# mkdocs-snapshot.sh — shared library function
# =============================================================================
# Defines snapshot_mkdocs(): writes a tarball of mkdocs/ into the install root
# as mkdocs-backup-<timestamp>.tar.gz, keeping the last 5 snapshots.
#
# Sourced by scripts/upgrade.sh and scripts/image-upgrade.sh (and may be
# invoked agent-side by changemaker-control-panel during template re-render).
#
# Why the install root instead of backups/?
# - Discoverable: operators see mkdocs-backup-*.tar.gz with a plain `ls`.
# - The agent's /app/instance bind mount maps directly to the install root,
# so the agent can restore from this archive without path translation.
# - backups/ is owned by root in some installs (DB dumps via container)
# and gets rotated on a different schedule than docs snapshots.
#
# Restoration one-liner:
# tar xzf "$(ls -t mkdocs-backup-*.tar.gz | head -1)" -C . \
# && docker compose restart mkdocs mkdocs-site-server
#
# Requires: $PROJECT_DIR (absolute path to install root), info() function
# from the caller (falls back to plain echo if info is not defined).
# =============================================================================
# Fallback log function if caller didn't define one (e.g. when sourcing standalone)
if ! declare -F info >/dev/null 2>&1; then
info() { echo "[INFO] $*"; }
fi
if ! declare -F warn >/dev/null 2>&1; then
warn() { echo "[WARN] $*" >&2; }
fi
# snapshot_mkdocs — take a tarball of mkdocs/ into the install root.
#
# Returns 0 if successful (or if mkdocs/ doesn't exist — non-fatal).
# Returns non-zero only if tar itself fails AND $SNAPSHOT_REQUIRED is true.
#
# Optional env vars:
# PROJECT_DIR (required) Install root containing mkdocs/
# SNAPSHOT_KEEP Number of snapshots to retain (default 5)
# SNAPSHOT_REQUIRED If "true", failure to snapshot aborts (default false)
snapshot_mkdocs() {
if [[ -z "${PROJECT_DIR:-}" ]]; then
warn "snapshot_mkdocs: PROJECT_DIR not set; skipping"
return 0
fi
if [[ ! -d "${PROJECT_DIR}/mkdocs" ]]; then
# No mkdocs dir = nothing to snapshot. Common on minimal installs.
return 0
fi
local stamp
stamp="$(date +%Y%m%d_%H%M%S)"
local archive="${PROJECT_DIR}/mkdocs-backup-${stamp}.tar.gz"
local keep="${SNAPSHOT_KEEP:-5}"
if tar czf "$archive" -C "$PROJECT_DIR" mkdocs 2>/dev/null; then
local size
size="$(du -h "$archive" 2>/dev/null | cut -f1)"
info "Tenant docs snapshot: $(basename "$archive") (${size})"
else
warn "snapshot_mkdocs: tar failed for $archive"
rm -f "$archive" 2>/dev/null
if [[ "${SNAPSHOT_REQUIRED:-false}" == "true" ]]; then
return 1
fi
return 0
fi
# Retention: keep the most recent N snapshots, prune older ones.
# ls -t lists newest first; tail -n +N+1 selects items after the Nth.
local prune_from=$((keep + 1))
# shellcheck disable=SC2012 # ls is intentional for mtime sort
ls -t "${PROJECT_DIR}"/mkdocs-backup-*.tar.gz 2>/dev/null \
| tail -n +${prune_from} \
| xargs -r rm -f
return 0
}

135
scripts/upgrade-stash-cleanup.sh Executable file
View File

@ -0,0 +1,135 @@
#!/usr/bin/env bash
# =============================================================================
# upgrade-stash-cleanup.sh — clean up stale upgrade-* git stashes
# =============================================================================
# Older versions of upgrade.sh used `git stash push --include-untracked` to
# protect tenant content during pulls. When pop conflicts went unresolved,
# the stashes accumulated in `git stash list` forever — Pride Corner ended up
# with three from 2026-03-09 alone, each containing displaced tenant
# customizations that the running site no longer reflected.
#
# This script lists every `upgrade-*` stash, shows its scope, and offers to
# drop them. It does NOT auto-restore content; that's a separate decision per
# tenant. The intent is to clear the backlog so future `git stash list` is
# meaningful.
#
# Usage:
# bash scripts/upgrade-stash-cleanup.sh # interactive, lists + prompts
# bash scripts/upgrade-stash-cleanup.sh --dry # list only
# bash scripts/upgrade-stash-cleanup.sh --yes # drop all upgrade-* without prompt
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="$(dirname "$SCRIPT_DIR")"
cd "$PROJECT_DIR"
# Colors
if [[ -t 1 ]] && [[ -z "${NO_COLOR:-}" ]]; then
RED='\033[0;31m' GREEN='\033[0;32m' YELLOW='\033[0;33m' CYAN='\033[0;36m'
BOLD='\033[1m' NC='\033[0m'
else
RED='' GREEN='' YELLOW='' CYAN='' BOLD='' NC=''
fi
info() { echo -e "${CYAN}[INFO]${NC} $*"; }
ok() { echo -e "${GREEN}[ OK ]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
DRY=false
YES=false
for arg in "$@"; do
case "$arg" in
--dry|--dry-run) DRY=true ;;
--yes|-y) YES=true ;;
--help|-h)
sed -n '2,/^# =====/p' "$0" | sed -n '2,/^# =====/p' | sed 's/^# //;s/^#//'
exit 0
;;
esac
done
if [[ ! -d .git ]]; then
warn "Not a git repository — this script only applies to source installs."
exit 0
fi
# Collect upgrade-* stash refs
mapfile -t STASHES < <(git stash list 2>/dev/null | grep -E ': (On|WIP on) [^:]+: upgrade-' || true)
if [[ ${#STASHES[@]} -eq 0 ]]; then
ok "No upgrade-* stashes found. Nothing to clean up."
exit 0
fi
echo ""
echo -e "${BOLD}Found ${#STASHES[@]} upgrade-* stash(es):${NC}"
echo ""
for entry in "${STASHES[@]}"; do
REF="${entry%%:*}"
LABEL="${entry#*: }"
FILE_COUNT=$(git stash show "$REF" --name-only 2>/dev/null | wc -l)
HAS_MKDOCS_YML=$(git stash show "$REF" --name-only 2>/dev/null | grep -c '^mkdocs/mkdocs\.yml$' || true)
printf " %-12s %-50s files=%-4d mkdocs.yml=%s\n" \
"$REF" "$LABEL" "$FILE_COUNT" "$HAS_MKDOCS_YML"
done
echo ""
if [[ "$DRY" == "true" ]]; then
info "Dry-run: no stashes will be dropped."
exit 0
fi
# Warn loudly if any stash holds mkdocs.yml — operator should manually review
# before dropping (tenant content might be there).
MKDOCS_STASHES=$(printf '%s\n' "${STASHES[@]}" \
| while read -r entry; do
REF="${entry%%:*}"
if git stash show "$REF" --name-only 2>/dev/null | grep -q '^mkdocs/mkdocs\.yml$'; then
echo "$REF"
fi
done)
if [[ -n "$MKDOCS_STASHES" ]]; then
echo ""
echo -e "${RED}${BOLD}⚠ WARNING:${NC} the following stashes contain ${BOLD}mkdocs/mkdocs.yml${NC}:"
echo "$MKDOCS_STASHES" | sed 's/^/ /'
echo ""
echo " These may hold tenant branding (site_name, site_url, custom theme, etc.)"
echo " that ISN'T reflected on disk. Before dropping, verify:"
echo ""
echo " git show <stash-ref>:mkdocs/mkdocs.yml | head -10"
echo " diff <(git show <stash-ref>:mkdocs/mkdocs.yml) mkdocs/mkdocs.yml"
echo ""
echo " If disk mkdocs.yml already has the tenant content, the stash is safe to drop."
echo " If disk is upstream and stash has tenant content, restore first:"
echo " git checkout <stash-ref> -- mkdocs/mkdocs.yml"
echo ""
fi
if [[ "$YES" != "true" ]]; then
echo -en "${BOLD}Drop all ${#STASHES[@]} upgrade-* stashes? [y/N] ${NC}"
read -r CONFIRM
case "$CONFIRM" in
y|Y|yes|YES) ;;
*) info "Cancelled. No stashes dropped."; exit 0 ;;
esac
fi
# Drop in reverse order so indices stay stable
mapfile -t SORTED_REFS < <(printf '%s\n' "${STASHES[@]}" \
| sed 's/:.*//' \
| sort -t'{' -k2 -n -r)
for REF in "${SORTED_REFS[@]}"; do
if git stash drop "$REF" >/dev/null 2>&1; then
ok "Dropped $REF"
else
warn "Failed to drop $REF (already gone?)"
fi
done
echo ""
ok "Cleanup complete. Remaining stashes:"
git stash list 2>/dev/null || echo " (none)"

View File

@ -95,6 +95,14 @@ phase() {
echo ""
}
# Pre-upgrade tenant docs snapshot (no-regrets fallback). Sourced regardless
# of install mode so snapshot_mkdocs is available in Phase 2.
# shellcheck source=lib/mkdocs-snapshot.sh
if [[ -f "$SCRIPT_DIR/lib/mkdocs-snapshot.sh" ]]; then
# shellcheck disable=SC1091
. "$SCRIPT_DIR/lib/mkdocs-snapshot.sh"
fi
# --- API mode: JSON progress/result writing ---
UPGRADE_DIR="${PROJECT_DIR}/data/upgrade"
PROGRESS_FILE="${UPGRADE_DIR}/progress.json"
@ -188,11 +196,22 @@ restore_user_paths() {
# "Non empty db! Please move your current db elsewhere than retry."
# This regenerates config.json from .env vars when missing.
verify_gancio_config() {
local gancio_volume
gancio_volume="$(docker volume ls --format '{{.Name}}' | grep 'gancio-data' | head -1 || true)"
if [[ -z "$gancio_volume" ]]; then
# Note: as of the gancio-config-init sidecar in docker-compose{,prod}.yml,
# config.json is regenerated automatically on every `up`. This function is
# kept as belt-and-braces for the upgrade flow specifically (e.g. so the
# check happens before the compose-up rather than at compose-up time, and
# so operators see explicit log output during upgrade).
local matches
matches="$(docker volume ls --format '{{.Name}}' | grep 'gancio-data' || true)"
local count
count=$(printf '%s\n' "$matches" | grep -c '.' || true)
if [[ "$count" -eq 0 ]]; then
return # No gancio volume exists yet; first run will handle it
fi
if [[ "$count" -gt 1 ]]; then
error "Multiple gancio-data volumes found — refusing to guess. Resolve manually:\n$matches"
fi
local gancio_volume="$matches"
# Check if config.json exists and is non-empty
if docker run --rm -v "${gancio_volume}:/data" alpine test -s /data/config.json 2>/dev/null; then
@ -698,6 +717,18 @@ fi
phase "2" "Backup"
write_progress 2 "Backup" 15 "Creating backup..."
# Pre-upgrade tenant docs snapshot — the no-regrets fallback. Runs even when
# --skip-backup is set, because this is for tenant content recovery (not DB
# state) and is fast enough that skipping it would never be intentional. It
# lives in the install root (not backups/) so operators discover it via `ls`.
if declare -F snapshot_mkdocs >/dev/null 2>&1; then
if [[ "$DRY_RUN" == "true" ]]; then
info "[DRY RUN] Would snapshot mkdocs/ to ${PROJECT_DIR}/mkdocs-backup-*.tar.gz"
else
snapshot_mkdocs || warn "mkdocs snapshot failed (non-fatal; continuing)"
fi
fi
if [[ "$SKIP_BACKUP" == "true" ]]; then
warn "Backup skipped (--skip-backup --force)"
else
@ -1273,13 +1304,24 @@ while true; do
done
success "API healthy (${API_WAIT}s)"
# Start everything else (exclude one-shot init containers)
# Start everything else (exclude one-shot init containers AND the ccp-agent
# service that's running this very script). Recreating ccp-agent here would
# SIGKILL the script process before write_result has a chance to run; we
# instead schedule a detached restart at the very end of the script.
#
# Mechanism: temporarily drop "ccp-agent" from COMPOSE_PROFILES so the broad
# `up -d` doesn't include it. We re-add it only when scheduling the deferred
# restart so the new agent comes up under its profile.
info "Starting remaining services..."
PROFILES_SAVED="${COMPOSE_PROFILES:-}"
COMPOSE_PROFILES_WITHOUT_AGENT="$(echo "${PROFILES_SAVED}" \
| tr ',' '\n' | grep -vx 'ccp-agent' | paste -sd, -)"
COMPOSE_PROFILES="${COMPOSE_PROFILES_WITHOUT_AGENT}" \
docker compose up -d \
--scale listmonk-init=0 \
--scale gancio-init=0 \
--scale vaultwarden-init=0
success "All services started"
success "All services started (ccp-agent restart deferred to end-of-script)"
# Restart Pangolin tunnel connector if running (may hold stale state after nginx rebuild)
if docker ps --format '{{.Names}}' | grep -q 'newt'; then
@ -1450,6 +1492,27 @@ echo -e " ${BOLD}Duration:${NC} $ELAPSED"
echo -e " ${BOLD}Log:${NC} $LOG_FILE"
echo ""
# Deferred ccp-agent restart — the LAST thing the script does before exit.
# This must run AFTER write_result and archive_success_to_history so the new
# agent comes up to a complete result.json (otherwise CCP polls forever).
# We launch a detached subshell that:
# 1. Sleeps briefly so this script has time to exit cleanly first.
# 2. Restarts ccp-agent under its profile, picking up any new image.
# `nohup` + `disown` ensures the subshell survives the agent container dying
# (when ccp-agent is recreated, the parent agent process — which spawned this
# upgrade.sh — gets SIGKILL'd; the disowned subshell is reparented to PID 1
# on the host and continues).
if echo "${PROFILES_SAVED:-}" | tr ',' '\n' | grep -qx 'ccp-agent'; then
info "Scheduling deferred ccp-agent restart..."
nohup bash -c "
sleep 3
cd '$PROJECT_DIR'
COMPOSE_PROFILES='ccp-agent' docker compose --profile ccp-agent up -d ccp-agent
" >/dev/null 2>&1 < /dev/null &
disown
success "ccp-agent restart scheduled (will pick up new image)"
fi
release_lock
trap - EXIT