changemaker.lite/docs/SESSION_HANDOFF_2026-05-20.md
bunker-admin 731e70ee42 docs: session handoff for the upgrade-flow redesign work
Captures the full state of the 2026-05-20/21 working session for the
next agent or future-self: fleet status, what landed in v2.10.2,
remaining Phase B + C work from the plan, surgical-update procedures
for the 6 remaining tenants (proven on pia 2026-05-21), bug inventory,
and "don't repeat my mistakes" notes.

Plan reference: /home/bunker-admin/.claude/plans/okay-so-we-can-enumerated-hejlsberg.md

Force-added because docs/ is gitignored but the handoff needs to be
discoverable in-repo (same pattern as COMPETITIVE_ANALYSIS.md).

Bunker Admin
2026-05-21 13:42:08 -06:00


Session Handoff: Upgrade Flow Redesign (2026-05-20 → 2026-05-21)

Carries forward all context from a long working session into the next conversation. If you're a fresh agent: read this top-to-bottom before touching anything.


Quick state of the fleet

| Tenant | Type | Version | Agent patched | Surgical script update | Notes |
| --- | --- | --- | --- | --- | --- |
| bnkops (n4) | source | main @ 1b80e82 | yes | pending | Management node; CCP backend runs here in parallel |
| marcelle (n5, cursedknowledge.org) | release | v2.9.15 | yes | pending | Test bench; first end-to-end CCP upgrade test ran here (succeeded after manual Phase 6 recovery) |
| trbh (n6) | source | main @ 1b80e82 | yes | pending | mkdocs content RESTORED from stash@{0}; site serves "That Really Blonde Human" correctly |
| pia (n3, pia-bnkops) | release | v2.9.10 | yes | completed 2026-05-21 | First successful surgical update; proof the procedure works |
| pridecorner (n1) | source | main @ 1b80e82 | yes | pending | Has 3 March 9 upgrade-* stashes still on disk (audit done; recovery deferred to another agent) |
| soroush (n7) | source | main @ 1b80e82 | yes | pending | Was earliest-fixed tonight |
| linda (n2, lindalindsay.org) | release-converted | v2.9.14 | yes | pending | Was source-install with broken .git; converted to release mode (VERSION file written) |

Public sites verified working at session end: trbh.org, docs.trbh.org, bnkops.com, pridecorner.ca, soroushsamavat.org, publicinterestalberta.org, lindalindsay.org, cursedknowledge.org.

Known caveat: docs.bnkops.com returns HTTP 000 externally (Pangolin tunnel routing issue, pre-existing, NOT caused by this session). bnkops mkdocs container serves correct content locally.
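
The end-of-session site check can be repeated with a small loop. The sketch below is only an assumption about how such a check might look, not the commands used during the session; classify_http_code and check_site are hypothetical helper names. It relies on curl reporting 000 for %{http_code} when no HTTP response was received, which matches the docs.bnkops.com symptom above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for re-checking the public sites; not part of the repo.

# Map a curl %{http_code} value to a coarse verdict.
classify_http_code() {
  case "$1" in
    000)     echo "unreachable" ;;  # curl received no HTTP response at all
    2??|3??) echo "ok" ;;           # success or redirect
    *)       echo "error" ;;
  esac
}

# Fetch one URL (10 s timeout) and print "<url> <code> <verdict>".
check_site() {
  local url="$1" code
  code="$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")" || true
  printf '%s %s %s\n' "$url" "${code:-000}" "$(classify_http_code "${code:-000}")"
}

# Usage (hosts taken from the list above):
#   for host in trbh.org docs.trbh.org bnkops.com; do check_site "https://$host"; done
```

Treating redirects as "ok" is a simplification; tighten the case patterns if a redirect should count as a failure for a given site.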


What landed in source (committed + pushed to origin/main)

| Commit | Description |
| --- | --- |
| 1b80e82 | fix(ccp-agent): whitelist /app/instance for git safe.directory (ccp-agent Dockerfile) |
| e88ac79 | fix(ccp-agent): export COMPOSE_PROJECT_NAME so upgrade.sh sees correct project (docker-compose.yml + .prod.yml) |
| 9613c3e | fix(upgrade): Phase 1 of upgrade-flow redesign (Approach A) (upgrade.sh + scripts/lib/mkdocs-snapshot.sh + scripts/upgrade-stash-cleanup.sh + .gitignore) |
| a7d3dd7 | chore(release): ship scripts/lib/ + classify upgrade-stash-cleanup.sh (build-release.sh) |

Release: v2.10.2 tagged on a7d3dd7, uploaded to Gitea Releases as the new "latest" (/releases/latest returns v2.10.2 — the timestamp issue from earlier in session is fixed via build-release.sh's target_commitish workaround).

Earlier in the session: commit a531f9b (adds the missing bash/curl/jq/python3 to the ccp-agent image and makes its mount writable) and the v2.10.1 release. v2.10.2 supersedes v2.10.1.


The plan — Approach A (DONE) + B + C (pending)

Full design lives at /home/bunker-admin/.claude/plans/okay-so-we-can-enumerated-hejlsberg.md.

Approach A — Done

Three fixes to existing scripts/upgrade.sh shipping in v2.10.2:

  1. Phase 6 self-destruct fix — Phase 6's broad docker compose up -d no longer recreates ccp-agent (which would SIGKILL the running script). Instead, ccp-agent restart is deferred to AFTER write_result writes the final result.json, via a detached nohup ... & disown subshell.

  2. mkdocs/ snapshot fallback: scripts/lib/mkdocs-snapshot.sh is sourced by upgrade.sh's Phase 2. Before any other backup or pull operation, it tarballs the entire mkdocs/ directory into mkdocs-backup-<timestamp>.tar.gz in the install root. Retains the last 5. Discoverable via ls. Restoration is a one-liner:

    tar xzf "$(ls -t mkdocs-backup-*.tar.gz | head -1)" -C . && \
    docker compose restart mkdocs mkdocs-site-server
    
  3. upgrade-stash-cleanup.sh — interactive utility to drop accumulated upgrade-* git stashes. Warns LOUDLY if any stash contains mkdocs/mkdocs.yml so operators verify recovery before dropping.
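
To make the snapshot-and-retain behaviour from item 2 concrete, here is a minimal sketch. It is not the contents of scripts/lib/mkdocs-snapshot.sh: the function name snapshot_mkdocs_sketch, the timestamp format, and the positional arguments are assumptions made for illustration. Only the archive naming pattern and the "keep last 5" rule come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real scripts/lib/mkdocs-snapshot.sh may differ.

# snapshot_mkdocs_sketch [install_root] [keep]
# Tarballs <root>/mkdocs into <root>/mkdocs-backup-<timestamp>.tar.gz,
# then prunes all but the newest <keep> archives (default 5).
snapshot_mkdocs_sketch() {
  local root="${1:-${PROJECT_DIR:-$PWD}}"
  local keep="${2:-5}"
  local stamp archive

  [ -d "$root/mkdocs" ] || { echo "no mkdocs/ under $root" >&2; return 1; }

  stamp="$(date +%Y%m%d-%H%M%S)"   # assumed format
  archive="$root/mkdocs-backup-$stamp.tar.gz"
  tar -czf "$archive" -C "$root" mkdocs || return 1

  # Newest first; everything after the first $keep entries is removed.
  ls -1t "$root"/mkdocs-backup-*.tar.gz 2>/dev/null \
    | tail -n +"$((keep + 1))" \
    | while IFS= read -r old; do rm -f -- "$old"; done

  echo "$archive"
}

# Usage: snapshot_mkdocs_sketch ~/changemaker.lite 5
```

Pruning by modification time keeps the logic independent of the timestamp format, at the cost of trusting file mtimes. Two snapshots taken within the same second would share a name in this sketch.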

Approach B — Pending (1-2 days)

Add --image-only upgrade mode. Production images are hermetic (bake compiled code + Prisma migrations + entrypoint runs migrations on container start). Therefore docker compose pull && docker compose up -d IS a complete code+schema upgrade. No filesystem mutation outside Docker → tenant content implicitly safe.

New files to create:

  • scripts/image-upgrade.sh (~150 lines; sources scripts/lib/mkdocs-snapshot.sh for the fallback)
  • changemaker-control-panel/agent/src/routes/upgrade.routes.ts → new endpoint POST /instance/:slug/upgrade/start-image-only
  • changemaker-control-panel/api/src/services/upgrade.service.ts → startImageUpgrade(instanceId, userId, { imageTag })
  • changemaker-control-panel/api/src/services/remote-driver.ts → startImageUpgrade()
  • changemaker-control-panel/api/src/modules/instances/instances.routes.ts → POST /:id/upgrade-images
  • CCP admin UI: "Quick Upgrade (image-only)" button on InstanceDetailPage.tsx
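
As a starting point for scripts/image-upgrade.sh, the flow described above (snapshot, pull, up) might be sketched as follows. This is not the planned implementation: run_step, image_upgrade_sketch, and the DRY_RUN variable are hypothetical names, the inline tar stands in for sourcing scripts/lib/mkdocs-snapshot.sh, and the sketch defaults to printing commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an image-only upgrade; defaults to a dry run.

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run_step() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# image_upgrade_sketch [install_root]
image_upgrade_sketch() {
  local root="${1:-$PWD}"
  local stamp
  stamp="$(date +%Y%m%d-%H%M%S)"   # assumed timestamp format

  # 1. Safety net first (the real script is expected to source the snapshot helper).
  run_step tar -czf "$root/mkdocs-backup-$stamp.tar.gz" -C "$root" mkdocs || return 1
  # 2. Fetch the new images.
  run_step docker compose --project-directory "$root" pull || return 1
  # 3. Recreate containers whose image changed; migrations run in the entrypoint.
  run_step docker compose --project-directory "$root" up -d || return 1
}

# Usage: DRY_RUN=1 image_upgrade_sketch ~/changemaker.lite
```

The sketch does not address how ccp-agent itself gets recreated without killing the script; the Phase 6 deferred-restart approach from Approach A would presumably apply here as well.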

Approach C — Pending (3-5 days)

CCP-driven template re-render for orchestration-changing upgrades. Reuses existing template-engine.ts and reconfigureInstance pattern. Only writes templated files (compose, nginx, configs/pangolin); never touches mkdocs/ or configs/code-server/data/. See plan for details.


How to apply v2.10.2 fixes to remaining tenants

For PIA: already done — used as the proof-of-concept on 2026-05-21. mkdocs.yml md5 unchanged, file count unchanged. ~5 minutes per tenant.

For the other 6 tenants, use the surgical update — DO NOT run a raw git pull origin main (it would resurrect tenant-deleted files via merge logic):

Source installs (bnkops, trbh, pridecorner, soroush)

# bnkops, trbh, soroush use ~/changemaker.lite
# pridecorner uses ~/cmlite/changemaker.lite
cd ~/changemaker.lite  # or ~/cmlite/changemaker.lite

git fetch origin main

mkdir -p scripts/lib
git checkout origin/main -- \
  scripts/upgrade.sh \
  scripts/upgrade-stash-cleanup.sh \
  scripts/lib/mkdocs-snapshot.sh \
  scripts/build-release.sh \
  docker-compose.yml \
  .gitignore

# Sanity: tenant content should still be ahead/divergent (not touched)
git status mkdocs/ configs/  # should show no NEW changes from this update
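
The reason the targeted checkout is safe can be shown in a throwaway repository. The demo below is an illustration only: demo_surgical_checkout is a hypothetical name, the repo layout is a two-file stand-in for the real tree, and everything happens in a temp directory. It shows that git checkout origin/main -- <path> updates the named file while leaving an uncommitted local deletion untouched.

```shell
#!/usr/bin/env bash
# Throwaway demo: targeted checkout vs. locally deleted files (temp dirs only).
demo_surgical_checkout() {
  local work up clone
  work="$(mktemp -d)" || return 1
  up="$work/upstream"; clone="$work/tenant"
  local g="git -c user.name=demo -c user.email=demo@example.invalid"

  # Upstream: one script plus one demo doc.
  mkdir -p "$up/scripts" "$up/mkdocs/docs"
  echo 'echo v1' > "$up/scripts/upgrade.sh"
  echo 'demo'    > "$up/mkdocs/docs/demo.md"
  git -C "$up" init -q
  git -C "$up" symbolic-ref HEAD refs/heads/main
  $g -C "$up" add -A
  $g -C "$up" commit -q -m 'initial'

  # Tenant clone deletes the demo doc without committing.
  git clone -q "$up" "$clone"
  rm "$clone/mkdocs/docs/demo.md"

  # Upstream ships a new upgrade.sh.
  echo 'echo v2' > "$up/scripts/upgrade.sh"
  $g -C "$up" commit -q -am 'new upgrade.sh'

  # Surgical update: fetch, then check out only the named path.
  git -C "$clone" fetch -q origin main
  git -C "$clone" checkout -q origin/main -- scripts/upgrade.sh

  cat "$clone/scripts/upgrade.sh"
  if [ -e "$clone/mkdocs/docs/demo.md" ]; then
    echo "demo.md resurrected"
  else
    echo "demo.md still deleted"
  fi
  rm -rf "$work"
}

# Usage: demo_surgical_checkout
```

Running it prints the new script body followed by "demo.md still deleted", which is the property the per-tenant sanity check relies on.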

Release installs (marcelle, linda) — used pia approach

# marcelle: ~/changemaker.lite, ssh bunker-admin@100.90.78.47
# linda: ~/changemaker.lite.canonical, ssh bunker-admin@n2-linda.taile33572.ts.net
cd ~/changemaker.lite  # or ~/changemaker.lite.canonical

curl -fSL https://gitea.bnkops.com/admin/changemaker.lite/releases/download/v2.10.2/changemaker-lite-v2.10.2.tar.gz \
  -o /tmp/v2.10.2.tar.gz

mkdir -p scripts/lib
tar -xzf /tmp/v2.10.2.tar.gz --strip-components=1 \
  changemaker-lite/scripts/upgrade.sh \
  changemaker-lite/scripts/upgrade-stash-cleanup.sh \
  changemaker-lite/scripts/lib/mkdocs-snapshot.sh \
  changemaker-lite/docker-compose.yml

chmod +x scripts/upgrade.sh scripts/upgrade-stash-cleanup.sh scripts/lib/mkdocs-snapshot.sh
rm -f /tmp/v2.10.2.tar.gz

# Do NOT update VERSION — only scripts changed, rest of install stays at current version.
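
Before extracting, it can help to confirm the release tarball really contains the expected members under the changemaker-lite/ prefix, since tar fails partway if one name is missing. The helper below is a hypothetical pre-flight check (tarball_has_members is not an existing script); it assumes member names are stored without a leading ./, as the extraction command above implies.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the selective extraction above.

# tarball_has_members <archive.tar.gz> <member>...
# Returns 0 if every member is listed, 1 if any is missing, 2 if the archive is unreadable.
tarball_has_members() {
  local archive="$1" listing member missing=0
  shift
  listing="$(tar -tzf "$archive")" || return 2
  for member in "$@"; do
    if ! printf '%s\n' "$listing" | grep -Fxq -- "$member"; then
      echo "missing from archive: $member" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   tarball_has_members /tmp/v2.10.2.tar.gz \
#     changemaker-lite/scripts/upgrade.sh \
#     changemaker-lite/scripts/upgrade-stash-cleanup.sh \
#     changemaker-lite/scripts/lib/mkdocs-snapshot.sh \
#     changemaker-lite/docker-compose.yml
```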

Verification per tenant

# Before update: capture
md5sum mkdocs/mkdocs.yml
find mkdocs/docs -type f | wc -l

# Run the appropriate surgical update above

# After update: re-verify (should match)
md5sum mkdocs/mkdocs.yml  
find mkdocs/docs -type f | wc -l

# Confirm new upgrade.sh
grep -c 'deferred ccp-agent\|Deferred ccp-agent' scripts/upgrade.sh  # expect 2

# Optional: smoke-test the snapshot helper
PROJECT_DIR=$(pwd) bash -c '. scripts/lib/mkdocs-snapshot.sh; snapshot_mkdocs'
ls -lh mkdocs-backup-*.tar.gz
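
The before/after comparison is easy to get wrong by eye, so it may be worth wrapping in two small functions. This is a sketch under assumptions: tenant_fingerprint and assert_unchanged are hypothetical names, and the fingerprint is just the md5 of mkdocs/mkdocs.yml plus the file count of mkdocs/docs, as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the manual md5sum / file-count check.

# tenant_fingerprint [install_root] -> prints "<md5> <file-count>"
tenant_fingerprint() {
  local root="${1:-.}" sum count
  sum="$(md5sum "$root/mkdocs/mkdocs.yml" | awk '{print $1}')"
  [ -n "$sum" ] || return 1   # md5sum failed (for example, file missing)
  count="$(find "$root/mkdocs/docs" -type f | wc -l | tr -d ' ')"
  echo "$sum $count"
}

# assert_unchanged <before> <after>
assert_unchanged() {
  if [ "$1" = "$2" ]; then
    echo "OK: mkdocs.yml and file count unchanged ($2)"
  else
    echo "MISMATCH: before=[$1] after=[$2]" >&2
    return 1
  fi
}

# Usage:
#   before="$(tenant_fingerprint .)"
#   ... run the surgical update ...
#   after="$(tenant_fingerprint .)"
#   assert_unchanged "$before" "$after"
```

A file count will not catch a file whose content changed while the count stayed the same; hashing the whole mkdocs/docs tree would be stricter if that matters for a tenant.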

Bug inventory — what we know

Fixed in v2.10.2

| Bug | Memory file | Status |
| --- | --- | --- |
| Gitea release created_unix=0 (lightweight tag + Gitea 1.23.x quirk) | feedback_gitea_release_tag_timing.md | Fixed in build-release.sh: uses target_commitish + removes remote tag first |
| ccp-agent image missing bash/curl/jq/python3 + git safe.directory | feedback_ccp_agent_image_deps.md | Fixed in agent Dockerfile + rolled out to all 7 tenants |
| ccp-agent compose mount was :ro (blocked status.json writes) | (in feedback_ccp_agent_image_deps.md) | Fixed in both compose files |
| CCP upgrade Phase 5 collision: COMPOSE_PROJECT_NAME mismatch | feedback_upgrade_compose_project_name.md | Fixed via env-var addition in compose env block (e88ac79); also needs .env entry on tenants installed before v2.10.2 |
| upgrade.sh Phase 6 self-destruct | feedback_upgrade_sh_bugs.md | Fixed in v2.10.2 (deferred ccp-agent restart) |
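
The deferred-restart idea behind the Phase 6 fix can be illustrated in isolation. The sketch below is not the code in upgrade.sh: schedule_detached is a hypothetical helper, and it uses nohup inside a backgrounded subshell rather than reproducing the exact nohup ... & disown line. The point it shows is ordering: the caller finishes its own work (such as writing result.json) and only then hands the disruptive command to a process that no longer depends on the caller.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of a deferred, detached command.

# schedule_detached <delay-seconds> <log-file> <command> [args...]
# Returns immediately; the command runs after the delay, detached from the caller.
schedule_detached() {
  local delay="$1" log="$2"
  shift 2
  # The outer ( ... & ) subshell exits at once, so the job is not a child of
  # the calling script; nohup makes it ignore SIGHUP.
  ( nohup sh -c 'sleep "$1"; shift; "$@"' sh "$delay" "$@" \
      </dev/null >>"$log" 2>&1 & )
}

# Usage (illustrative only; write the final result first, then defer):
#   schedule_detached 5 /tmp/ccp-agent-restart.log docker compose up -d ccp-agent
```

Whether a detached process survives recreation of the container it runs in depends on where the script executes, so this pattern should be validated on the test bench rather than assumed.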

Open

  • upgrade.sh git stash → git pull stash-no-pop — Pride Corner has 3 stashes from March 9 holding mkdocs.yml customizations. Existing save_user_paths/restore_user_paths in upgrade.sh handles the common case; the snapshot fallback (v2.10.2) covers edge cases. Pridecorner-specific recovery handled by another agent.
  • Agent-side detached: true spawn — Defense-in-depth. Skip unless Phase 6 self-destruct re-emerges.
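
For auditing leftover stashes like the three on Pride Corner, a read-only listing is a low-risk first step. The function below is a hypothetical sketch (audit_upgrade_stashes is not upgrade-stash-cleanup.sh and drops nothing); it assumes the stash messages contain the string upgrade-, as the stash names above suggest, and flags any stash whose diff touches mkdocs/mkdocs.yml.

```shell
#!/usr/bin/env bash
# Hypothetical read-only audit of upgrade-* stashes; never drops anything.

# audit_upgrade_stashes [repo_dir]
audit_upgrade_stashes() {
  local repo="${1:-.}" ref subject
  git -C "$repo" stash list --format='%gd %gs' | while read -r ref subject; do
    case "$subject" in
      *upgrade-*) ;;        # only stashes created by the upgrade flow
      *) continue ;;
    esac
    if git -C "$repo" stash show --name-only "$ref" | grep -Fxq 'mkdocs/mkdocs.yml'; then
      echo "WARNING $ref touches mkdocs/mkdocs.yml: $subject"
    else
      echo "ok $ref: $subject"
    fi
  done
}

# Usage: audit_upgrade_stashes ~/cmlite/changemaker.lite
```

Note that git stash show lists tracked-file changes by default, which is sufficient here because mkdocs/mkdocs.yml is tracked.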

Tenant content protection layers (all in v2.10.2)

  1. save_user_paths/restore_user_paths in upgrade.sh — preserves working-tree state of mkdocs/docs/, mkdocs/mkdocs.yml, mkdocs/site/, configs/, nginx/conf.d/services.conf across git pull.
  2. git stash + auto-resolve on USER_PATHS: modified tracked files are stashed and then popped, with git checkout --theirs resolving any conflicts on USER_PATHS.
  3. Pre-upgrade mkdocs snapshot — tarball of mkdocs/ to install root before any other phase runs. Fallback for everything else.
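
Layer 1 can be pictured as "archive the user paths, let the pull do whatever it does, then put the archive back". The sketch below is an assumption about the shape of that logic, not the save_user_paths/restore_user_paths code in upgrade.sh; the function names and the USER_PATHS_SKETCH variable are hypothetical, and the path list is copied from item 1. The detail worth noticing is that restore deletes the current copies first, so files resurrected by the pull do not survive.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a save/restore pair for tenant-owned paths.

USER_PATHS_SKETCH="mkdocs/docs mkdocs/mkdocs.yml mkdocs/site configs nginx/conf.d/services.conf"

# save_paths_sketch <install_root> <archive.tar>
save_paths_sketch() {
  local root="$1" archive="$2" p existing=""
  for p in $USER_PATHS_SKETCH; do
    if [ -e "$root/$p" ]; then existing="$existing $p"; fi
  done
  [ -n "$existing" ] || return 1
  # Intentionally unquoted: $existing is a space-separated list of paths.
  tar -cf "$archive" -C "$root" $existing
}

# restore_paths_sketch <install_root> <archive.tar>
restore_paths_sketch() {
  local root="$1" archive="$2" p
  # Remove the post-pull copies so resurrected files are dropped, then restore.
  for p in $USER_PATHS_SKETCH; do
    rm -rf -- "${root:?}/$p"
  done
  tar -xf "$archive" -C "$root"
}

# Usage:
#   save_paths_sketch "$PWD" /tmp/user-paths.tar
#   ... git pull or tarball extract ...
#   restore_paths_sketch "$PWD" /tmp/user-paths.tar
```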

Tonight's recovery work — already applied

These tenants had content damage from earlier in the session; recovery was completed:

  • trbh — mkdocs.yml + 143 M files restored from stash@{0}; 538 D-entry files re-deleted. Public sites serve correct branding.
  • bnkops — same pattern, 100 M files restored + 82 D-entry re-deletions. Public sites serve correct branding.
  • marcelle — manual recovery from Phase 6 self-destruct test (file rollback + service restart). On v2.10.1 currently. Operating normally.

stash@{0} is preserved on trbh and bnkops as forensic record + safety net.


CCP access

URL:       http://n4-bnkops.taile33572.ts.net:5100  (UI)
           http://n4-bnkops.taile33572.ts.net:5000  (API)
User:      admin@thebunkerops.ca
Password:  NRTgHdC7Zxxs2P2UmNwnEbn3jTwU8uJN  (seed; rotate if you want)
Role:      SUPER_ADMIN

Test bench (marcelle)

SSH:           ssh bunker-admin@100.90.78.47
Install dir:   ~/changemaker.lite
Domain:        cursedknowledge.org
Admin:         admin@cursedknowledge.org / @TheBunker2025!
CCP slug:      changemakerlite
CCP id:        71b5bc4a-c47e-4435-b460-e9bc303b76ed

Marcelle is the test bench per docs/TEST_SERVER.md. Use it for ALL upgrade experiments before touching production tenants.


Per-tenant quick reference

| Tenant | SSH | Install dir | CCP id |
| --- | --- | --- | --- |
| bnkops | bunker-admin@n4-bnkops.taile33572.ts.net | ~/changemaker.lite | 21238536-7c04-4a3b-a073-38390a939046 |
| marcelle | bunker-admin@100.90.78.47 | ~/changemaker.lite | 71b5bc4a-c47e-4435-b460-e9bc303b76ed |
| trbh | bunker-admin@n6-trbh.taile33572.ts.net | ~/changemaker.lite | c066dc23-64a5-4684-96a7-992e65c1b82c |
| pia | pia-bnkops@n3-pia.taile33572.ts.net | ~/changemaker.lite | 92a11622-d357-4ab4-b21e-60c030c1b026 |
| pridecorner | bunker-admin@n1-pridecorner.taile33572.ts.net | ~/cmlite/changemaker.lite | a30de94b-ef28-42b6-a71d-112669526a62 |
| soroush | bunker-admin@n7-soroush.taile33572.ts.net | ~/changemaker.lite | 0c70f94c-1319-41e1-867c-5674f17cadda |
| linda | bunker-admin@n2-linda.taile33572.ts.net | ~/changemaker.lite.canonical | 6dcc19a1-f4fd-45df-be77-5bf62f8110c8 |

Most important "don't repeat my mistakes" notes

  1. Never git stash + git pull --ff-only origin main on a tenant outside of upgrade.sh. The stash silently displaces tenant content. If you must update files on a source-installed tenant, use targeted git checkout origin/main -- <specific-file> instead.

  2. Never blindly trigger CCP "Upgrade Now" on a tenant still running pre-v2.10.2 upgrade.sh — it will Phase 6 self-destruct. Apply surgical script update first (instructions above), THEN trigger CCP upgrade.

  3. mkdocs/docs/ contains upstream tracked files (default screenshots, demo docs, blog posts). Tenants typically delete these locally without committing. ANY operation that brings origin/main's tracked tree into the working tree (git pull, tarball extract) will resurrect them. v2.10.2's snapshot fallback gives you a recovery path; the surgical update procedure (this doc) avoids the issue entirely.

  4. mkdocs/mkdocs.yml is tracked, tenant-customized with branding. Lives under USER_PATHS so v2.10.2's upgrade.sh protects it. But if you do raw git operations outside the script, it's exposed.

  5. CCP backend on n4 is decoupled from per-tenant ccp-agent. Restarting a tenant's ccp-agent does NOT affect CCP itself. Verified during bnkops patch (CCP backend stayed at 41h uptime while ccp-agent recreated).


Memory files (in /home/bunker-admin/.claude/projects/-home-bunker-admin-changemaker-lite/memory/)

Latest session work documented in:

  • feedback_gitea_release_tag_timing.md
  • feedback_ccp_agent_image_deps.md
  • feedback_upgrade_compose_project_name.md
  • feedback_upgrade_sh_bugs.md
  • feedback_session_2026_05_20_damage_report.md

Plus the architectural plan: /home/bunker-admin/.claude/plans/okay-so-we-can-enumerated-hejlsberg.md


Where to start the next session

Recommended sequence:

  1. Apply surgical update to remaining 6 tenants (~30-45 min, low risk; pia procedure already proven). Order: marcelle, linda (release), then soroush, trbh, bnkops, pridecorner (source).
  2. Test CCP-driven upgrade on marcelle after surgical update lands. This will verify the deferred ccp-agent restart works end-to-end through the CCP path (the test we couldn't complete tonight because Phase 6 kept self-destructing).
  3. Implement Approach B per the plan — image-only upgrade mode. Estimated 1-2 days.
  4. Implement Approach C — CCP template re-render. 3-5 days.

If only one thing happens next session: do step 1. Six surgical updates × ~5 minutes each. The rest of the fleet stays vulnerable to Phase 6 self-destruct until they're on v2.10.2's upgrade.sh.