changemaker.lite/docs/SESSION_HANDOFF_2026-05-21.md
bunker-admin f34382ebdd chore(approach-c): Phase 0 initial template overlay + session handoff
This session shipped:
- Approach B end-to-end (commit 4a3d9d7): full rollout to all 7 tenants;
  marcelle E2E validated twice (121s + 100s).
- v2.10.2 surgical update applied to 6 remaining tenants.

This commit lands the kickoff for Approach C (template re-render path):

scripts/templates changes:
- docker-compose.yml.hbs.OLD-style-pre-approach-c: preserved old CCP
  template (Handlebars-heavy, dynamic container names, secrets rendered
  at template-time).
- docker-compose.yml.hbs: REWRITTEN as a near-mirror of canonical
  docker-compose.prod.yml. Minimal Handlebars overlay:
    - Header comment lists {{name}}, {{slug}}, {{composeProject}}.
    - 5 image refs: ${IMAGE_TAG:-latest} -> {{imageTag}}, so CCP can
      per-instance override once Phase 1 lands the Instance.imageTag column.
  All other variation flows through env-var substitution from tenant's
  .env. Container names are now hardcoded (matching prod), feature flags
  are deferred to COMPOSE_PROFILES gating (matching prod).

Why a rewrite: the old CCP template and prod compose used fundamentally
different conventions (dynamic vs hardcoded names, render-time vs
substitute-time secrets, Handlebars vs profiles gating). Sync-by-addition
couldn't reconcile them. The rewrite makes Approach C re-render safe for
the install.sh-installed fleet (marcelle, linda, pia and future).

docs/SESSION_HANDOFF_2026-05-21.md: full session handoff covering fleet
state, Approach B rollout, Approach C plan, and where to start next
session. force-added because /docs is gitignored (same precedent as
docs/SESSION_HANDOFF_2026-05-20.md from prior session).

Phase 0 remaining work (next session):
- Audit env.hbs against new compose env-var expectations
- Sync static config files (nginx/, configs/prometheus/, etc.)
- Build api/scripts/render-for-instance.ts harness
- Iterate template until rendered output is per-instance-only diff
  against marcelle/linda/pia actual compose.

Then Phases 1-6 per plan in subsequent sessions (~11-14 hours total).

Bunker Admin
2026-05-21 19:32:21 -06:00


Session Handoff: Approach B Rollout + Approach C Planning (2026-05-21)

Carries forward all context from a long working session. If you're a fresh agent: read this top-to-bottom before touching anything.


What landed in this session (commits on origin/main)

| Commit | Description |
| --- | --- |
| 4a3d9d7 | feat(upgrade): Approach B - image-only upgrade mode (7 files, 666 insertions). scripts/image-upgrade.sh + CCP agent endpoint + CCP backend (driver/service/route/schema) + admin UI "Quick Upgrade" button. |
| <this commit> | docs: session handoff + Approach C Phase 0 initial template overlay |

Plus several non-tracked deploys:

  • v2.10.2 surgical update applied to the remaining 6 tenants (soroush, linda, marcelle, bnkops, trbh, pridecorner; pia was done previously). On every tenant, mkdocs was verified untouched and the upgrade.sh sha matches b9f37d59....
  • Fleet rollout of Approach B: new image-upgrade.sh script delivered + new ccp-agent image (with /upgrade/start-image-only endpoint) deployed to all 7 tenants. Bnkops's ccp-agent was rebuilt from source (it builds locally rather than pulling from the registry).

Fleet state at session end

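| Tenant | Surgical update v2.10.2 | image-upgrade.sh | New ccp-agent with image-only endpoint |
| --- | --- | --- | --- |
| pia | done (prior session) | done | done |
| soroush | done | done | done |
| linda | done | done | done |
| marcelle | done | done | done + tested both A and B E2E |
| bnkops | done | done | done (rebuilt locally) |
| trbh | done | done | done |
| pridecorner | done | done | done |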

Marcelle E2E test results:

  • Approach A (full upgrade): v2.10.1 → v2.10.2 in 250s, COMPLETED, no SIGKILL on script. Phase 6 deferred ccp-agent restart fix worked end-to-end through CCP path.
  • Approach B (Quick Upgrade) run 1: 121s, COMPLETED, mkdocs.yml md5 unchanged.
  • Approach B (Quick Upgrade) run 2: 100s (cached pull), COMPLETED, mkdocs unchanged again — confirms idempotency.

Fleet backup (Phase 0 work — defensive)

All 7 tenants backed up to /media/bunker-admin/BACKUP/fleet/<node>/2026-05-21-pre-v2.10.2/:

| Node | Tenant | Size |
| --- | --- | --- |
| n1 | pridecorner | 182MB (includes 3 stash patches from March 9) |
| n2 | linda | 26MB |
| n3 | pia | 45MB (post-surgical state) |
| n4 | bnkops | 4.4GB (huge: 2277 mkdocs/docs files) |
| n5 | marcelle | 28MB |
| n6 | trbh | 336MB |
| n7 | soroush | 76MB |

Each tenant dir has mkdocs.tar.gz, configs-and-nginx.tar.gz, config-files.tar.gz, host-state.txt, git-state.txt (source installs only), and MANIFEST.txt.


Approach C planning + initial overlay

Decision: rewrite docker-compose.yml.hbs in prod-compose style to make CCP-driven template re-render safe for the install.sh fleet.

Why a rewrite (not sync-by-addition)

Discovered the CCP template and docker-compose.prod.yml use fundamentally different conventions:

| | Old template (.hbs) | Canonical prod |
| --- | --- | --- |
| Container names | {{containerPrefix}}-postgres (dynamic) | changemaker-v2-postgres (hardcoded) |
| Secrets | {{secrets.postgresPassword}} (Handlebars-rendered) | ${POSTGRES_PASSWORD} (env-substituted) |
| Optional services | {{#if enableX}} blocks | Always-defined, gated via COMPOSE_PROFILES |
| Ports | {{ports.api}} | Hardcoded |

Sync-by-addition can't reconcile these conventions, so a rewrite is cleaner long-term.

Initial overlay committed this session

changemaker-control-panel/templates/docker-compose.yml.hbs.OLD-style-pre-approach-c — preserved old template for reference.

changemaker-control-panel/templates/docker-compose.yml.hbs — now a near-mirror of changemaker.lite/docker-compose.prod.yml (1493 lines + Handlebars header):

  • Header comment includes {{name}}, {{slug}}, {{composeProject}} for traceability.
  • 5 image refs replaced: ${IMAGE_TAG:-latest} → {{imageTag}}, so CCP can override the tag per instance via Instance.imageTag once Phase 1 lands.
  • All other variation flows through env-var substitution from tenant's .env.
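The two substitution stages described above can be sketched as follows. This is a minimal illustration, not the CCP template engine: renderOverlay is a hypothetical stand-in for Handlebars that handles only plain {{key}} tokens, and the image name in the fragment is a placeholder. The point is that render time fills in {{imageTag}} while ${POSTGRES_PASSWORD} passes through untouched, to be resolved later from the tenant's .env.

```typescript
// Hypothetical stand-in for Handlebars: substitutes only simple {{key}} tokens.
function renderOverlay(template: string, context: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string): string => {
    const value = context[key];
    if (value === undefined) {
      throw new Error(`missing template value: ${key}`);
    }
    return value;
  });
}

// Illustrative fragment; the image name is a placeholder, not a real registry path.
const templateFragment: string = [
  "# instance: {{name}} ({{slug}}), compose project: {{composeProject}}",
  "services:",
  "  api:",
  "    image: registry.example/changemaker-api:{{imageTag}}",
  "    environment:",
  "      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}",
].join("\n");

// Render time: only the Handlebars-style tokens are replaced.
const renderedFragment: string = renderOverlay(templateFragment, {
  name: "Marcelle",
  slug: "marcelle",
  composeProject: "marcelle",
  imageTag: "v2.10.2",
});

console.log(renderedFragment);
```

Throwing on a missing key is a choice made for this sketch so that an incomplete render context fails loudly instead of producing a compose file with an empty tag.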

Remaining Approach C work (next session)

See /home/bunker-admin/.claude/plans/insight-temporal-bachman.md for the full plan. Quick summary of what's next:

Phase 0 completion (next session):

  • Audit env.hbs against the new compose's expected env vars. Add missing.
  • Sync static config files in templates/: nginx/, configs/prometheus/, configs/alertmanager/, configs/grafana/. They may have drifted too.
  • Write a one-off render harness (api/scripts/render-for-instance.ts) that loads an instance row, builds context, renders templates to scratch dir.
  • Render against marcelle, linda, pia. Diff against their actual files. Iterate the template until diff is per-instance values only (COMPOSE_PROJECT_NAME, ports, secrets — not structure).
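One way to check for a "per-instance values only" diff is to mask the known per-instance values on both sides before comparing. The sketch below assumes that approach; maskPerInstance and structuralDiff are hypothetical names, the comparison is a naive line-by-line one rather than a real diff algorithm, and masking by plain string replacement can over-match short values such as port numbers.

```typescript
// Replace each known per-instance value with a <label> placeholder.
function maskPerInstance(text: string, values: Record<string, string>): string {
  let masked = text;
  for (const label of Object.keys(values)) {
    const value = values[label];
    if (value === undefined || value.length === 0) continue;
    masked = masked.split(value).join(`<${label}>`);
  }
  return masked;
}

// Compare rendered vs actual after masking; any remaining line is structural drift.
function structuralDiff(
  rendered: string,
  renderedValues: Record<string, string>,
  actual: string,
  actualValues: Record<string, string>
): string[] {
  const left = maskPerInstance(rendered, renderedValues).split("\n");
  const right = maskPerInstance(actual, actualValues).split("\n");
  const differences: string[] = [];
  const lineCount = Math.max(left.length, right.length);
  for (let i = 0; i < lineCount; i++) {
    const l = i < left.length ? left[i] : "<missing>";
    const r = i < right.length ? right[i] : "<missing>";
    if (l !== r) {
      differences.push(`line ${i + 1}: rendered=${JSON.stringify(l)} actual=${JSON.stringify(r)}`);
    }
  }
  return differences;
}

// Illustrative inputs: same structure, different project name.
const renderedCompose = "name: scratch\nservices:\n  postgres:\n    container_name: changemaker-v2-postgres";
const actualCompose = "name: marcelle\nservices:\n  postgres:\n    container_name: changemaker-v2-postgres";

console.log(
  structuralDiff(
    renderedCompose, { COMPOSE_PROJECT_NAME: "scratch" },
    actualCompose, { COMPOSE_PROJECT_NAME: "marcelle" }
  )
);
```

An empty result means the two files differ only in the masked values. Any entry that remains points at a structural difference the template still needs to absorb.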

Phase 1 (~30 min): Add Instance.imageTag Prisma column + migration. Modify template-engine.ts:211 to use instance.imageTag || env.IMAGE_TAG.
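The fallback in Phase 1 might look like the sketch below. InstanceRow and resolveImageTag are hypothetical names, not the actual template-engine.ts code, and the final "latest" fallback is an assumption carried over from the ${IMAGE_TAG:-latest} default the template used before the overlay.

```typescript
// Hypothetical shape; the real Prisma model will have more fields.
interface InstanceRow {
  slug: string;
  imageTag: string | null;
}

// Per-instance tag wins, then the environment-wide tag, then "latest" (assumed default).
function resolveImageTag(
  instance: InstanceRow,
  env: Record<string, string | undefined>
): string {
  return instance.imageTag || env["IMAGE_TAG"] || "latest";
}

console.log(resolveImageTag({ slug: "marcelle", imageTag: "v2.10.3" }, { IMAGE_TAG: "v2.10.2" }));
console.log(resolveImageTag({ slug: "linda", imageTag: null }, { IMAGE_TAG: "v2.10.2" }));
```

Using || rather than ?? means an empty-string imageTag is treated the same as null, which seems the safer reading for a column that may be cleared from the UI.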

Phase 2 (~3-4 hr): Pre-flight diff endpoint. New agent route POST /instance/:slug/files/diff + RemoteDriver.diffFiles() + LocalDriver.diffFiles() + previewReleaseUpgrade() in upgrade.service. Includes envCoverage check for registered tenants.
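The envCoverage check could work roughly as sketched here: collect the variables the compose file references without a default, then report those absent from the tenant's .env. All function names are hypothetical, REDIS_PASSWORD is an invented variable for the example, and the parser is deliberately narrow: it handles only the braced ${VAR}, ${VAR:-default}, and ${VAR-default} forms, and ignores unbraced $VAR, the ${VAR:?error} forms, and $$ escapes.

```typescript
// Names referenced as ${VAR} with no default; sorted and de-duplicated.
function findRequiredVars(composeText: string): string[] {
  const pattern = /\$\{([A-Za-z_][A-Za-z0-9_]*)(:?-[^}]*)?\}/g;
  const required: Record<string, true> = {};
  let match: RegExpExecArray | null = pattern.exec(composeText);
  while (match !== null) {
    const name = match[1];
    const hasDefault = match[2] !== undefined;
    if (name !== undefined && !hasDefault) required[name] = true;
    match = pattern.exec(composeText);
  }
  return Object.keys(required).sort();
}

// Keys defined in a .env file; comments and blank lines are skipped.
function parseEnvKeys(envText: string): string[] {
  const keys: string[] = [];
  for (const rawLine of envText.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    keys.push(line.slice(0, eq).trim());
  }
  return keys;
}

// Required compose variables that the tenant's .env does not define.
function envCoverage(composeText: string, envText: string): { missing: string[] } {
  const present = parseEnvKeys(envText);
  const missing = findRequiredVars(composeText).filter(
    (name: string) => present.indexOf(name) === -1
  );
  return { missing };
}

const sampleCompose = [
  "image: registry.example/changemaker-api:${IMAGE_TAG:-latest}",
  "POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}",
  "REDIS_PASSWORD: ${REDIS_PASSWORD}",
].join("\n");
const sampleEnv = "# tenant env\nPOSTGRES_PASSWORD=example\n";

console.log(envCoverage(sampleCompose, sampleEnv));
```

A key that is present but empty counts as covered in this sketch. A stricter check would also flag empty values for secrets.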

Phase 3 (~3-4 hr): startReleaseUpgrade() + runReleaseUpgrade() in upgrade.service. Split logic for isRegistered=true (skip env render) vs isRegistered=false (render env).
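The isRegistered split can be reduced to a small decision about which files a release upgrade re-renders. The sketch below assumes the rendered outputs are named docker-compose.yml and .env; UpgradeTarget and filesToRender are hypothetical names, not the upgrade.service API.

```typescript
// Hypothetical minimal view of an instance for upgrade planning.
interface UpgradeTarget {
  slug: string;
  isRegistered: boolean;
}

// Registered (install.sh) tenants keep their existing .env; only compose is re-rendered.
function filesToRender(target: UpgradeTarget): string[] {
  const files: string[] = ["docker-compose.yml"];
  if (!target.isRegistered) {
    files.push(".env");
  }
  return files;
}

console.log(filesToRender({ slug: "marcelle", isRegistered: true }));
console.log(filesToRender({ slug: "example-tenant", isRegistered: false }));
```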

Phase 4 (~30 min): CCP routes /upgrade-release + /upgrade-release/preview + Zod schema.

Phase 5 (~2-3 hr): "Upgrade to Release" UI button + preview modal + env-coverage warning.

Phase 6 (~1 hr): Tag v2.10.3 in changemaker.lite, push images with tag, trigger upgrade-release on marcelle via CCP UI, verify mkdocs untouched + containers on new tag.

Total remaining: 11-14 hours. Recommended split:

  • Session 2: complete Phase 0 (render harness + iterate template + env.hbs sync + static file syncs). ~half day.
  • Session 3: Phases 1-5. ~half day.
  • Session 4: Phase 6 E2E test. ~1 hour.

Critical files for Approach C

Already modified this session:

  • changemaker-control-panel/templates/docker-compose.yml.hbs — overlay from prod compose with minimal Handlebars markup.
  • changemaker-control-panel/templates/docker-compose.yml.hbs.OLD-style-pre-approach-c — preserved old template.

To be modified in next sessions (per plan):

  • changemaker-control-panel/templates/env.hbs (Phase 0 audit)
  • changemaker-control-panel/templates/configs/** (Phase 0 syncs)
  • changemaker-control-panel/api/prisma/schema.prisma (Phase 1)
  • changemaker-control-panel/api/prisma/migrations/<ts>_add_instance_image_tag/ (Phase 1)
  • changemaker-control-panel/api/src/services/template-engine.ts line 211 (Phase 1)
  • changemaker-control-panel/api/src/services/upgrade.service.ts (Phases 2-3)
  • changemaker-control-panel/api/src/services/remote-driver.ts + local-driver.ts + execution-driver.ts (Phase 2)
  • changemaker-control-panel/agent/src/routes/files.routes.ts + services/file.service.ts (Phase 2)
  • changemaker-control-panel/api/src/modules/instances/instances.routes.ts + instances.schemas.ts (Phase 4)
  • changemaker-control-panel/admin/src/pages/InstanceDetailPage.tsx (Phase 5)

Memory key gotchas (write to MEMORY.md next session)

  1. CCP template vs prod compose: were divergent, now aligned. As of this session, templates/docker-compose.yml.hbs is structurally a near-mirror of docker-compose.prod.yml. Going forward, any new service in prod compose must be ported into the template manually (or via a future CI drift check).

  2. bnkops's ccp-agent is locally built, not pulled from registry. Has a build: directive in compose. The other 6 tenants pull gitea.bnkops.com/admin/changemaker-ccp-agent:latest.

  3. install.sh tenants (isRegistered=true) lack encryptedSecrets in CCP DB. Approach C must skip env.hbs rendering for them — they keep their tarball-provisioned .env. The pre-flight envCoverage check is the safety net.

  4. n4 SSH lacks marcelle's host key by default — first ssh n4 → marcelle connection needs StrictHostKeyChecking=accept-new or interactive accept. Other tenants in the lab have the same pattern.

  5. docker save | ssh ... docker load is the registry-less image distribution path when n4 doesn't have docker login to gitea.bnkops.com. Worked well for the ccp-agent rollout this session.

  6. With set -o pipefail, piping into grep -q can fail the pipeline even when the pattern matches: grep exits on the first match and closes the pipe, the writer receives SIGPIPE (exit status 141), and pipefail reports that non-zero status as the pipeline's result. Solution: capture the upstream output into a variable, then grep against the variable. (Bug found + fixed in scripts/image-upgrade.sh during this session.)


CCP access (unchanged)

URL:       http://n4-bnkops.taile33572.ts.net:5100  (UI)
           http://n4-bnkops.taile33572.ts.net:5000  (API)
User:      admin@thebunkerops.ca
Password:  NRTgHdC7Zxxs2P2UmNwnEbn3jTwU8uJN  (seed)
Role:      SUPER_ADMIN

Where to start next session

Recommended:

  1. Read this doc + /home/bunker-admin/.claude/plans/insight-temporal-bachman.md (Approach C plan) first.
  2. Phase 0 completion: finish the template rewrite. Build a render harness (api/scripts/render-for-instance.ts), render against marcelle/linda/pia, iterate until structural-clean.
  3. Commit Phase 0 as standalone PR with rendered-vs-actual diffs in description.
  4. Move to Phases 1-5 in a second commit/PR.
  5. Phase 6 manual E2E.

Approach B is in production-ready state across the fleet. Approach C is the longer-term path for releases that change orchestration.