Three fixes to harden the admin-UI upgrade path, all in scripts/upgrade.sh.
Root-caused by yesterday's v2.9.2 → v2.9.3 on marcelle which was killed by
systemd mid-Phase-4 and left the system in a misleading half-upgraded state
(VERSION bumped, container pre-upgrade, result.json stale from 24h prior).
- Fix A (failure visibility): stop silencing stderr on the five docker
compose pull sites so timeouts / auth failures / network errors flow
into upgrade-watcher.log. Add explicit SIGTERM/SIGINT traps alongside
the existing EXIT trap. Track CURRENT_PHASE_NAME globally so the
failure message reports "during Phase 4: Container Rebuild" rather
than just an exit code. Introduce write_result_force (bypasses
API_MODE guard) + archive_failure_to_history so a killed upgrade
always leaves a truthful result.json + history.json entry, and the
progress.json is cleared so the admin UI stops showing a phantom
in-progress phase.
- Fix B (atomic VERSION): Phase 3 rsync now --excludes VERSION and
stashes the new one at data/upgrade/VERSION.pending. Phase 7 promotes
it to VERSION only after VERIFY_FAILED stays false. on_failure deletes
the pending file. upgrade-check.sh needs no changes — its head -1
VERSION read sees actual state instead of a mid-upgrade promise.
- Fix C (external smoke): after Phase 7 localhost checks, curl
https://api.${DOMAIN}/api/health with --max-time 10 and warn (not
fail) on non-200. Catches Pangolin resource misassignments that the
localhost-only checks miss. Appends to UPGRADE_WARNINGS so the admin
UI surfaces it in result.json.
Bunker Admin