Fix VERSION promotion regression: don't gate on soft health-check warnings

Prior commit (ac901c9e, Fix B) gated VERSION.pending promotion behind
VERIFY_FAILED=false, but VERIFY_FAILED is a soft warning signal — it
also fires when the admin container's 30s verify budget is tight
(which was the cry-wolf case Fix 3 addressed in the same commit).

Observed on marcelle during v2.9.5 → v2.9.6: the upgrade completed
successfully (tarball extracted, containers pulled and running new
image), but because the admin healthcheck warned at 30s (still using
v2.9.5's upgrade.sh with its 30s budget), VERIFY_FAILED=true pinned
VERSION back to v2.9.5 despite everything else having advanced.
result.json showed success=true but newCommit=v2.9.5.

Hard failures still prevent promotion: on_failure runs rm -f on
VERSION.pending before the promotion site is reached. Reaching the
promotion site means Phase 7 completed without a non-zero exit and
without the failure trap firing, and that is the correct gate.
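
A minimal sketch of that gate, for illustration only. It is not the real
upgrade.sh: run_upgrade, WORK, and the EXIT-trap wiring are hypothetical
stand-ins, and only the names on_failure, VERIFY_FAILED, VERSION, and
VERSION.pending come from this commit. It assumes on_failure is reached
through a trap on a non-zero exit.

```shell
#!/usr/bin/env bash
# Illustrative sketch; run_upgrade and WORK are hypothetical names.
set -u

WORK="$(mktemp -d)"

# run_upgrade <hard_fail> <verify_failed> <new_version>
# The ( ) body runs in a subshell so the EXIT trap and exit status
# stay contained within one simulated upgrade run.
run_upgrade() (
  hard_fail="$1"
  VERIFY_FAILED="$2"
  new_version="$3"

  # On failure, drop the staged file so it can never be promoted.
  on_failure() { rm -f "$WORK/VERSION.pending"; }
  trap 'rc=$?; if [ "$rc" -ne 0 ]; then on_failure; fi; exit "$rc"' EXIT

  # Stage the new version (stands in for Phase 3).
  printf '%s\n' "$new_version" > "$WORK/VERSION.pending"

  # A hard failure exits non-zero before the promotion site.
  if [ "$hard_fail" = "true" ]; then
    exit 1
  fi

  # A soft health-check warning is reported but does not gate promotion.
  if [ "$VERIFY_FAILED" = "true" ]; then
    echo "warning: health check was slow" >&2
  fi

  # Promotion site: the only gate is that the staged file still exists.
  if [ -f "$WORK/VERSION.pending" ]; then
    mv "$WORK/VERSION.pending" "$WORK/VERSION"
  fi
)

printf 'v2.9.5\n' > "$WORK/VERSION"

run_upgrade false true v2.9.6            # soft warning only: promotes
after_soft="$(head -1 "$WORK/VERSION")"

run_upgrade true false v2.9.7 || true    # hard failure: no promotion
after_hard="$(head -1 "$WORK/VERSION")"

echo "after soft warning: $after_soft"
echo "after hard failure: $after_hard"
```

In this sketch the soft-warning run moves VERSION to v2.9.6, while the
hard-failure run leaves it there and removes VERSION.pending.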

bunker-admin 2026-04-15 18:33:13 -06:00
parent ac901c9e53
commit 13513aeca5


@@ -1405,11 +1405,16 @@ PYEOF
 fi
 # --- Atomic VERSION promotion (Fix B) ---
-# The staged VERSION from Phase 3 lands only now, after full verification.
-# On any prior failure, on_failure removes VERSION.pending and the live
-# VERSION file remains at the pre-upgrade value — so upgrade-check.sh
-# correctly reports "upgrade available" on the next check.
-if [[ "$VERIFY_FAILED" != "true" ]] && [[ -f "$UPGRADE_DIR/VERSION.pending" ]]; then
+# The staged VERSION from Phase 3 lands now that we've reached the end of
+# Phase 7 without on_failure firing. Promote regardless of VERIFY_FAILED —
+# that flag is a soft health-check warning (e.g. "admin slow to respond"),
+# not an upgrade failure. The tarball is extracted, containers are up, and
+# write_result below will record success=true. Gating promotion on
+# VERIFY_FAILED previously caused a "stuck at old VERSION" bug where a
+# transient admin healthcheck warning pinned the install back.
+# Hard failures (SIGTERM, exit !=0) still prevent promotion via on_failure,
+# which rm -f's VERSION.pending before it can be promoted.
+if [[ -f "$UPGRADE_DIR/VERSION.pending" ]]; then
 mv "$UPGRADE_DIR/VERSION.pending" "$PROJECT_DIR/VERSION"
 success "VERSION promoted to $(head -1 "$PROJECT_DIR/VERSION" 2>/dev/null || echo "?")"
 fi