Fix VERSION promotion regression: don't gate on soft health-check warnings
Prior commit (ac901c9e, Fix B) gated VERSION.pending promotion behind VERIFY_FAILED=false. VERIFY_FAILED is only a soft warning signal, though: it also fires when the admin container's 30s verify budget is tight (the cry-wolf case that Fix 3 addressed in the same commit).

Observed on marcelle during v2.9.5 → v2.9.6: the upgrade completed successfully (tarball extracted, containers pulled and running the new image). The run was still using v2.9.5's upgrade.sh with its 30s budget, so the admin healthcheck warned at 30s, and VERIFY_FAILED=true left VERSION pinned at v2.9.5 despite everything else having advanced. result.json showed success=true but newCommit=v2.9.5.

Hard failures still prevent promotion: on_failure runs rm -f on VERSION.pending before the promotion site is reached. Reaching the promotion site means Phase 7 completed without a non-zero exit or a trap firing, and that is the correct gate.

Bunker Admin
parent ac901c9e53
commit 13513aeca5
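The regression described in the message can be reproduced in isolation. The following is a minimal sketch, not the real upgrade.sh: the temp directory layout and the function names `stage`, `promote_old`, and `promote_new` are hypothetical, and only the two `if` conditions are taken from the diff.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the real install layout.
WORK="$(mktemp -d)"
UPGRADE_DIR="$WORK/upgrade"
PROJECT_DIR="$WORK/project"
mkdir -p "$UPGRADE_DIR" "$PROJECT_DIR"

# Reset to the state just before the promotion site:
# old VERSION live, new VERSION staged.
stage() {
    echo "v2.9.5" > "$PROJECT_DIR/VERSION"
    echo "v2.9.6" > "$UPGRADE_DIR/VERSION.pending"
}

# Gate as of ac901c9e: a soft warning blocks promotion.
promote_old() {
    if [[ "$VERIFY_FAILED" != "true" ]] && [[ -f "$UPGRADE_DIR/VERSION.pending" ]]; then
        mv "$UPGRADE_DIR/VERSION.pending" "$PROJECT_DIR/VERSION"
    fi
}

# Gate after this commit: only the presence of the staged file matters.
promote_new() {
    if [[ -f "$UPGRADE_DIR/VERSION.pending" ]]; then
        mv "$UPGRADE_DIR/VERSION.pending" "$PROJECT_DIR/VERSION"
    fi
}

VERIFY_FAILED=true   # soft warning raised, upgrade otherwise complete

stage
promote_old
OLD_RESULT="$(head -1 "$PROJECT_DIR/VERSION")"

stage
promote_new
NEW_RESULT="$(head -1 "$PROJECT_DIR/VERSION")"

echo "old gate: $OLD_RESULT"
echo "new gate: $NEW_RESULT"
rm -rf "$WORK"
```

With the old gate the live VERSION stays at the pre-upgrade value even though nothing failed; with the new gate the staged value is promoted.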
@@ -1405,11 +1405,16 @@ PYEOF
 fi
 
 # --- Atomic VERSION promotion (Fix B) ---
-# The staged VERSION from Phase 3 lands only now, after full verification.
-# On any prior failure, on_failure removes VERSION.pending and the live
-# VERSION file remains at the pre-upgrade value — so upgrade-check.sh
-# correctly reports "upgrade available" on the next check.
-if [[ "$VERIFY_FAILED" != "true" ]] && [[ -f "$UPGRADE_DIR/VERSION.pending" ]]; then
+# The staged VERSION from Phase 3 lands now that we've reached the end of
+# Phase 7 without on_failure firing. Promote regardless of VERIFY_FAILED —
+# that flag is a soft health-check warning (e.g. "admin slow to respond"),
+# not an upgrade failure. The tarball is extracted, containers are up, and
+# write_result below will record success=true. Gating promotion on
+# VERIFY_FAILED previously caused a "stuck at old VERSION" bug where a
+# transient admin healthcheck warning pinned the install back.
+# Hard failures (SIGTERM, exit !=0) still prevent promotion via on_failure,
+# which rm -f's VERSION.pending before it can be promoted.
+if [[ -f "$UPGRADE_DIR/VERSION.pending" ]]; then
     mv "$UPGRADE_DIR/VERSION.pending" "$PROJECT_DIR/VERSION"
     success "VERSION promoted to $(head -1 "$PROJECT_DIR/VERSION" 2>/dev/null || echo "?")"
 fi
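The claim that hard failures never reach the promotion site can be illustrated with a small stage-then-promote sketch. This is an assumption-laden model, not the real script: the body of `on_failure`, the EXIT trap wiring, the `run_upgrade` helper, and the temp directory layout are all hypothetical. The source only states that on_failure removes VERSION.pending on SIGTERM or a non-zero exit.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical stand-ins for the real install layout.
WORK="$(mktemp -d)"
UPGRADE_DIR="$WORK/upgrade"
PROJECT_DIR="$WORK/project"
mkdir -p "$UPGRADE_DIR" "$PROJECT_DIR"

# Assumed shape of on_failure: discard the staged file so it can never be promoted.
on_failure() {
    rm -f "$UPGRADE_DIR/VERSION.pending"
}

# Simulated upgrade run.
#   $1: exit status of a stand-in middle phase (non-zero = hard failure)
#   $2: value VERIFY_FAILED holds at the promotion site (soft warning)
run_upgrade() {
    local phase_status="$1" verify_failed="$2"
    echo "v2.9.5" > "$PROJECT_DIR/VERSION"
    (
        set -e
        # Run on_failure only when this subshell exits non-zero.
        trap 'rc=$?; if [[ $rc -ne 0 ]]; then on_failure; fi; exit "$rc"' EXIT
        echo "v2.9.6" > "$UPGRADE_DIR/VERSION.pending"   # staging step
        ( exit "$phase_status" )                          # a hard failure aborts here
        if [[ "$verify_failed" == "true" ]]; then
            echo "warning: admin slow to respond" >&2     # soft warning, logged only
        fi
        # Promotion site: reached only if nothing above failed.
        if [[ -f "$UPGRADE_DIR/VERSION.pending" ]]; then
            mv "$UPGRADE_DIR/VERSION.pending" "$PROJECT_DIR/VERSION"
        fi
    )
}

# Hard failure: the staged file is removed and the live VERSION is untouched.
run_upgrade 1 false
HARD_FAIL_VERSION="$(head -1 "$PROJECT_DIR/VERSION")"
if [[ -f "$UPGRADE_DIR/VERSION.pending" ]]; then
    HARD_FAIL_PENDING="present"
else
    HARD_FAIL_PENDING="removed"
fi

# Soft warning only: promotion still happens.
run_upgrade 0 true
SOFT_WARN_VERSION="$(head -1 "$PROJECT_DIR/VERSION")"

echo "hard failure: VERSION=$HARD_FAIL_VERSION pending=$HARD_FAIL_PENDING"
echo "soft warning: VERSION=$SOFT_WARN_VERSION"
rm -rf "$WORK"
```

In this model the existence of VERSION.pending is itself the record that no hard failure occurred, which is why the promotion site needs no additional flag.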