Prior commit (ac901c9e, Fix B) gated VERSION.pending promotion behind
VERIFY_FAILED=false, but VERIFY_FAILED is a soft warning signal — it
also fires when the admin container's 30s verify budget is tight
(which was the cry-wolf case Fix 3 addressed in the same commit).
Observed on marcelle during v2.9.5 → v2.9.6: the upgrade completed
successfully (tarball extracted, containers pulled and running new
image), but because the admin healthcheck warned at 30s (still using
v2.9.5's upgrade.sh with its 30s budget), VERIFY_FAILED=true pinned
VERSION back to v2.9.5 despite everything else having advanced. result.json showed success=true but newCommit=v2.9.5.
Hard failures still prevent promotion via on_failure's rm -f of
VERSION.pending before the promotion site is reached. Reaching the
promotion site means Phase 7 completed without exit-code or trap —
that's the correct gate.
Bunker Admin
Four fixes building on the prior upgrade-path work. All observed on
marcelle across today's v2.9.2 → v2.9.5 cycles and addressed here.
- Fix 1 (breaking-release gate). upgrade-check.sh now parses the first
line of each Gitea release body for `BREAKING: <reason>` and threads
`breaking`/`breakingReason` through status.json into the API status
response. Admin UI renders a red Alert with a typed-tag confirmation
input and gates the Start Upgrade button. auto-upgrade.service.ts
refuses to apply breaking releases, logging a skip and holding off
until the operator confirms manually.
- Fix 2 (release-mode rollback). print_rollback_help and the --rollback
flow both used `git checkout`, which silently fails in release
installs (no .git). Added INSTALL_MODE branches: release mode
downloads the prior tarball from Gitea using a new VERSION.rollback
marker seeded at Phase 3 start. Source mode retains the existing
git-based flow.
- Fix 3 (Phase 7 health budgets). admin verify_service_health budget
30s → 90s (matches the admin container's start_period from commit
47704667). Gancio + MkDocs switched from one-shot to the existing
verify_service_health retry wrapper. Cuts the cry-wolf
"services may still be starting" warning from every upgrade result.
- Fix 4 (symmetric success archival). Bash archive_failure_to_history
already logs failures on exit; added a matching archive_success_to_
history called after write_result on the success path. API-side
archiveResult now dedupes on completedAt so double-recording (bash
+ post-restart handler) can't land twice in history.json.
Release the bundle as v2.9.6.
Bunker Admin
Three fixes to harden the admin-UI upgrade path, all in scripts/upgrade.sh.
Root-caused by yesterday's v2.9.2 → v2.9.3 on marcelle which was killed by
systemd mid-Phase-4 and left the system in a misleading half-upgraded state
(VERSION bumped, container pre-upgrade, result.json stale from 24h prior).
- Fix A (failure visibility): stop silencing stderr on the five docker
compose pull sites so timeouts / auth failures / network errors flow
into upgrade-watcher.log. Add explicit SIGTERM/SIGINT traps alongside
the existing EXIT trap. Track CURRENT_PHASE_NAME globally so the
failure message reports "during Phase 4: Container Rebuild" rather
than just an exit code. Introduce write_result_force (bypasses
API_MODE guard) + archive_failure_to_history so a killed upgrade
always leaves a truthful result.json + history.json entry, and the
progress.json is cleared so the admin UI stops showing a phantom
in-progress phase.
- Fix B (atomic VERSION): Phase 3 rsync now --excludes VERSION and
stashes the new one at data/upgrade/VERSION.pending. Phase 7 promotes
it to VERSION only after VERIFY_FAILED stays false. on_failure deletes
the pending file. upgrade-check.sh needs no changes — its head -1
VERSION read sees actual state instead of a mid-upgrade promise.
- Fix C (external smoke): after Phase 7 localhost checks, curl
https://api.${DOMAIN}/api/health with --max-time 10 and warn (not
fail) on non-200. Catches Pangolin resource misassignments that the
localhost-only checks miss. Appends to UPGRADE_WARNINGS so the admin
UI surfaces it in result.json.
Bunker Admin
Six independent fixes surfaced during the v2.9.1 → v2.9.2 admin-UI
upgrade validation today. Together they make a clean install on a new
box work end-to-end without in-session patching.
- Fix 1: scripts/validate-compose-parity.sh + build-release.sh hook —
fail release builds when api/admin/media-api/nginx healthcheck
blocks drift between docker-compose.yml and docker-compose.prod.yml.
Previous boot-race fix had to be applied to both files manually.
- Fix 2: scripts/systemd/install.sh chowns logs/ to the install user
(the API container creates subdirs there as root, locking the
host-side watcher out), pre-creates logs/upgrade-watcher.log, and
changemaker-upgrade.service adds StartLimitIntervalSec=0 so a
single transient failure can't wedge the .path unit permanently.
- Fix 3: /api/upgrade/status now returns a `watcher` sub-object that
flags the host systemd watcher as stalled when trigger.json has
been pending >30s. Admin SettingsPage SystemUpgradeTab renders a
warning Alert with the systemctl recovery command when unhealthy.
- Fix 4: scripts/upgrade.sh write_result() — prefer head -1 VERSION
over `git rev-parse HEAD` so release-mode upgrades report the new
tag in result.json instead of "unknown".
- Fix 5: admin container healthcheck start_period 20s → 60s in both
compose files, same class as the earlier api fix. Matches Gancio
convention.
- Fix 7: /api/pangolin/sync now detects resources bound to a stale
siteId (common after --pangolin-site new rotations), deletes and
recreates them against the current site, and reports them under
a new `reassigned` response field.
Bunker Admin
Major additions: onboarding tour system, correlation-id middleware, media
error handler, restore script, env validation script, Dockerignore files.
Updates across 70+ admin components for improved UX and error handling.
Bunker Admin
Drop the custom Dockerfile.code-server that bundled Claude Code CLI,
Python/MkDocs tooling, and build-essential on top of codercom base.
Switch to the already-mirrored linuxserver/code-server image instead.
- Both compose files: use code-server:latest, LinuxServer env vars
(PUID/PGID/DEFAULT_WORKSPACE), port 8443, /config mount layout
- Nginx configs + templates: proxy to :8443 instead of :8080
- API env default: CODE_SERVER_URL updated to :8443
- build-and-push.sh: remove --include-code-server flag
- upgrade.sh: remove code-server conditional rebuild + registry fallback
- install.sh: add --ignore-pull-failures for optional missing images
- .env.example, CCP templates, bunker-ops template: updated
Bunker Admin
When --use-registry is set, the upgrade script tries to pull images
tagged with the current HEAD SHA. If images were built at an earlier
commit, that SHA tag won't exist. Now tries :latest before falling
back to a full source build. Also applies to nginx and code-server.
Bunker Admin
New install method: curl one-liner downloads a lightweight release
tarball (~9 MB) and runs the config wizard. No git clone needed,
no TypeScript compilation — pulls pre-built images from Gitea registry.
- docker-compose.prod.yml: production compose without build blocks or
source code volume mounts; IMAGE_TAG defaults to latest
- scripts/install.sh: curl-friendly installer (downloads tarball,
extracts, runs config.sh)
- scripts/build-release.sh: creates release tarball from dev repo
with only runtime files (configs, scripts, docs, empty data dirs)
- config.sh: release-mode detection (VERSION file + no .git dir),
auto-sets IMAGE_TAG=latest and NODE_ENV=production
- upgrade.sh: release-mode upgrade path (downloads new tarball from
Gitea Releases API instead of git pull, always uses registry mode)
- upgrade-check.sh: release-mode version check via Gitea API
- .gitignore: exclude releases/ and api/dist/
- Docs: updated getting-started with pre-built install instructions
Bunker Admin
- Fix @/utils/logger path alias (tsc doesn't transform @/ in output)
- Add JWT_INVITE_SECRET to media-api compose environment block
- Fix redis-exporter depends_on to use service name not container name
- Fix upgrade.sh: restore tracked files deleted by restore_user_paths
- Add scripts/build-and-push.sh for building + pushing production images
- Add scripts/mirror-images.sh for mirroring third-party images
Bunker Admin
Inserts Phase 5 (Database Migration) between container rebuild and service
restart. Detects failed/incomplete Prisma migrations via _prisma_migrations
query and auto-resolves them before running migrate deploy in a one-off
container — catching errors in the script rather than letting the API enter
a restart loop. Also detects when package.json/package-lock.json changed
and removes old API/admin containers to prevent stale anonymous volumes
from shadowing updated node_modules.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Resolve Pangolin site slug to numeric ID in sync route (fixes target creation 400 errors)
- Disable SSO on newly created Pangolin resources for public access
- Fix nginx media API proxy: use rewrite + set ordering for proper URI rewriting
- Upgrade script: clear skip-worktree flags, fix Docker-owned dir permissions, stash untracked files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
API container writes trigger files to a shared volume (data/upgrade/),
and a systemd path watcher on the host detects them and runs the
upgrade scripts. This avoids giving the container Docker socket access.
- Add upgrade-check.sh (git fetch + compare + write status.json)
- Add upgrade-watcher.sh (systemd bridge, dispatches check/upgrade)
- Add systemd path/service units with placeholder substitution
- Modify upgrade.sh with --api-mode flag (progress.json + result.json)
- Add API upgrade module (service + routes, SUPER_ADMIN only)
- Add System tab to Settings page with version info, changelog,
progress steps, and upgrade confirmation modal
- Add upgrade watcher installation to config.sh wizard
- Add data/upgrade/ shared volume to api service in docker-compose
Bunker Admin
Two issues occurred during upgrades:
1. Gancio config.json lost when Docker volume name prefix changes
(e.g., changemakerlite_ vs changemaker-lite_). Gancio finds existing
DB but no config and enters restart loop. Fix: verify_gancio_config()
checks the volume and regenerates config.json from .env if missing.
2. mkdocs-site-server (LSIO nginx) returns 403 after upgrade because
the anonymous /config volume shadows the ./mkdocs/site bind mount.
Fix: docker compose rm -sf the LSIO container before up -d so the
anonymous volume is recreated fresh.
Also adds Gancio and MkDocs site health checks to Phase 6 verification.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>