bunker-admin 47704667b1 Upgrade failure visibility + atomic VERSION + external smoke test
Three fixes to harden the admin-UI upgrade path, all in scripts/upgrade.sh.
Root-caused by yesterday's v2.9.2 → v2.9.3 on marcelle which was killed by
systemd mid-Phase-4 and left the system in a misleading half-upgraded state
(VERSION bumped, container pre-upgrade, result.json stale from 24h prior).

- Fix A (failure visibility): stop silencing stderr on the five docker
  compose pull sites so timeouts / auth failures / network errors flow
  into upgrade-watcher.log. Add explicit SIGTERM/SIGINT traps alongside
  the existing EXIT trap. Track CURRENT_PHASE_NAME globally so the
  failure message reports "during Phase 4: Container Rebuild" rather
  than just an exit code. Introduce write_result_force (bypasses
  API_MODE guard) + archive_failure_to_history so a killed upgrade
  always leaves a truthful result.json + history.json entry, and the
  progress.json is cleared so the admin UI stops showing a phantom
  in-progress phase.

- Fix B (atomic VERSION): Phase 3 rsync now --excludes VERSION and
  stashes the new one at data/upgrade/VERSION.pending. Phase 7 promotes
  it to VERSION only after VERIFY_FAILED stays false. on_failure deletes
  the pending file. upgrade-check.sh needs no changes — its head -1
  VERSION read sees actual state instead of a mid-upgrade promise.

- Fix C (external smoke): after Phase 7 localhost checks, curl
  https://api.${DOMAIN}/api/health with --max-time 10 and warn (not
  fail) on non-200. Catches Pangolin resource misassignments that the
  localhost-only checks miss. Appends to UPGRADE_WARNINGS so the admin
  UI surfaces it in result.json.

Bunker Admin
2026-04-15 16:13:04 -06:00
..
2026-02-18 17:15:31 -07:00
2026-03-22 21:47:09 -06:00