This session shipped:
- Approach B end-to-end (commit 4a3d9d7): full rollout to all 7 tenants;
marcelle E2E validated twice (121s + 100s).
- v2.10.2 surgical update applied to 6 remaining tenants.
This commit lands the kickoff for Approach C (template re-render path):
scripts/templates changes:
- docker-compose.yml.hbs.OLD-style-pre-approach-c: preserved old CCP
template (Handlebars-heavy, dynamic container names, secrets rendered
at template-time).
- docker-compose.yml.hbs: REWRITTEN as a near-mirror of canonical
docker-compose.prod.yml. Minimal Handlebars overlay:
- Header comment lists {{name}}, {{slug}}, {{composeProject}}.
- 5 image refs: ${IMAGE_TAG:-latest} -> {{imageTag}}, so CCP can
per-instance override once Phase 1 lands the Instance.imageTag column.
All other variation flows through env-var substitution from tenant's
.env. Container names are now hardcoded (matching prod), feature flags
are deferred to COMPOSE_PROFILES gating (matching prod).
Why a rewrite: the old CCP template and prod compose used fundamentally
different conventions (dynamic vs hardcoded names, render-time vs
substitute-time secrets, Handlebars vs profiles gating). Sync-by-addition
couldn't reconcile them. The rewrite makes Approach C re-render safe for
the install.sh-installed fleet (marcelle, linda, pia and future).
docs/SESSION_HANDOFF_2026-05-21.md: full session handoff covering fleet
state, Approach B rollout, Approach C plan, and where to start next
session. force-added because /docs is gitignored (same precedent as
docs/SESSION_HANDOFF_2026-05-20.md from prior session).
Phase 0 remaining work (next session):
- Audit env.hbs against new compose env-var expectations
- Sync static config files (nginx/, configs/prometheus/, etc.)
- Build api/scripts/render-for-instance.ts harness
- Iterate template until rendered output is per-instance-only diff
against marcelle/linda/pia actual compose.
Then Phases 1-6 per plan in subsequent sessions (~11-14 hours total).
Bunker Admin
Session Handoff: Approach B Rollout + Approach C Planning (2026-05-21)
Carries forward all context from a long working session. If you're a fresh agent: read this top-to-bottom before touching anything.
What landed in this session (commits on origin/main)
| Commit | Description |
|---|---|
| 4a3d9d7 | feat(upgrade): Approach B - image-only upgrade mode. 7 files, 666 insertions. `scripts/image-upgrade.sh` + CCP agent endpoint + CCP backend (driver/service/route/schema) + admin UI "Quick Upgrade" button. |
| <this commit> | docs: session handoff + Approach C Phase 0 initial template overlay |
Plus several non-tracked deploys:
- v2.10.2 surgical update applied to remaining 6 tenants (soroush, linda, marcelle, bnkops, trbh, pridecorner; pia was done previously). All verified mkdocs untouched, upgrade.sh sha matches `b9f37d59...`.
- Fleet rollout of Approach B: new `image-upgrade.sh` script delivered + new `ccp-agent` image (with `/upgrade/start-image-only` endpoint) deployed to all 7 tenants. Bnkops's ccp-agent was rebuilt from source (builds locally rather than pulled from registry).
Fleet state at session end
| Tenant | Surgical update v2.10.2 | image-upgrade.sh | New ccp-agent with image-only endpoint |
|---|---|---|---|
| pia | ✅ (prior session) | ✅ | ✅ |
| soroush | ✅ | ✅ | ✅ |
| linda | ✅ | ✅ | ✅ |
| marcelle | ✅ + tested both A and B E2E | ✅ | ✅ |
| bnkops | ✅ | ✅ | ✅ (rebuilt locally) |
| trbh | ✅ | ✅ | ✅ |
| pridecorner | ✅ | ✅ | ✅ |
Marcelle E2E test results:
- Approach A (full upgrade): v2.10.1 → v2.10.2 in 250s, COMPLETED, no SIGKILL on script. Phase 6 deferred ccp-agent restart fix worked end-to-end through CCP path.
- Approach B (Quick Upgrade) run 1: 121s, COMPLETED, mkdocs.yml md5 unchanged.
- Approach B (Quick Upgrade) run 2: 100s (cached pull), COMPLETED, mkdocs unchanged again — confirms idempotency.
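The "mkdocs.yml md5 unchanged" check in the runs above can be reproduced with a before/after checksum comparison. The following is only a sketch: it assumes GNU coreutils `md5sum` is available, and the helper names `file_md5` and `unchanged_after` are illustrative, not part of `scripts/image-upgrade.sh`.

```shell
#!/usr/bin/env bash
# Sketch: confirm a file is byte-identical before and after a command runs.
# Assumes GNU coreutils `md5sum`. Function names are hypothetical helpers,
# not taken from the real upgrade scripts.

file_md5() {
  # Print only the hash from md5sum's "<hash>  <path>" output.
  local sum
  sum="$(md5sum "$1")" || return 1
  printf '%s\n' "${sum%% *}"
}

unchanged_after() {
  # Usage: unchanged_after <file> <command> [args...]
  # Returns 0 if the file is unchanged, 1 if it was modified, 2 on error.
  local file="$1"; shift
  local before after
  [ -f "$file" ] || return 2
  before="$(file_md5 "$file")" || return 2
  "$@" || return 2                 # run the command under test
  after="$(file_md5 "$file")" || return 2
  [ "$before" = "$after" ]
}
```

A call would look like `unchanged_after /path/to/mkdocs.yml <upgrade command>`, where both the path and the command are placeholders for the tenant's real values.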
Fleet backup (Phase 0 work — defensive)
All 7 tenants backed up to /media/bunker-admin/BACKUP/fleet/<node>/2026-05-21-pre-v2.10.2/:
| Node | Tenant | Size |
|---|---|---|
| n1 | pridecorner | 182MB (includes 3 stash patches from March 9) |
| n2 | linda | 26MB |
| n3 | pia | 45MB (post-surgical state) |
| n4 | bnkops | 4.4GB (huge — 2277 mkdocs/docs files) |
| n5 | marcelle | 28MB |
| n6 | trbh | 336MB |
| n7 | soroush | 76MB |
Each tenant dir has mkdocs.tar.gz, configs-and-nginx.tar.gz, config-files.tar.gz, host-state.txt, git-state.txt (source installs only), and MANIFEST.txt.
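A quick completeness check for one tenant backup directory might look like the sketch below. The file names are the ones listed above; the function name `check_backup_dir` and the `yes|no` source-install flag are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: verify a tenant backup directory contains the expected artifacts.
# File names come from the handoff; the function name and the source-install
# flag are hypothetical.

check_backup_dir() {
  # Usage: check_backup_dir <dir> [yes|no]   (second arg: source install?)
  local dir="$1" source_install="${2:-no}"
  local missing=0 f
  local required="mkdocs.tar.gz configs-and-nginx.tar.gz config-files.tar.gz host-state.txt MANIFEST.txt"
  # git-state.txt is only expected for source installs.
  if [ "$source_install" = "yes" ]; then
    required="$required git-state.txt"
  fi
  for f in $required; do
    # -s: file exists and is non-empty
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running it over every `<node>` directory before an upgrade gives a cheap signal that a backup did not silently come up short.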
Approach C planning + initial overlay
Decision: rewrite docker-compose.yml.hbs in prod-compose style to make CCP-driven template re-render safe for the install.sh fleet.
Why a rewrite (not sync-by-addition)
Discovered the CCP template and docker-compose.prod.yml use fundamentally different conventions:
| | Old template (.hbs) | Canonical prod |
|---|---|---|
| Container names | `{{containerPrefix}}-postgres` (dynamic) | `changemaker-v2-postgres` (hardcoded) |
| Secrets | `{{secrets.postgresPassword}}` (Handlebars-rendered) | `${POSTGRES_PASSWORD}` (env-substituted) |
| Optional services | `{{#if enableX}}` blocks | Always-defined, gated via `COMPOSE_PROFILES` |
| Ports | `{{ports.api}}` | Hardcoded |
Sync-by-addition can't reconcile these conventions; a rewrite is cleaner long-term.
Initial overlay committed this session
changemaker-control-panel/templates/docker-compose.yml.hbs.OLD-style-pre-approach-c — preserved old template for reference.
changemaker-control-panel/templates/docker-compose.yml.hbs — now a near-mirror of changemaker.lite/docker-compose.prod.yml (1493 lines + Handlebars header):
- Header comment includes `{{name}}`, `{{slug}}`, `{{composeProject}}` for traceability.
- 5 image refs replaced `${IMAGE_TAG:-latest}` → `{{imageTag}}` so CCP can per-instance override via `Instance.imageTag` once Phase 1 lands.
- All other variation flows through env-var substitution from tenant's `.env`.
Remaining Approach C work (next session)
See /home/bunker-admin/.claude/plans/insight-temporal-bachman.md for the full plan. Quick summary of what's next:
Phase 0 completion (next session):
- Audit `env.hbs` against the new compose's expected env vars. Add any that are missing.
- Sync static config files in `templates/`: nginx/, configs/prometheus/, configs/alertmanager/, configs/grafana/. They may have drifted too.
- Write a one-off render harness (`api/scripts/render-for-instance.ts`) that loads an instance row, builds context, renders templates to scratch dir.
- Render against marcelle, linda, pia. Diff against their actual files. Iterate the template until diff is per-instance values only (`COMPOSE_PROJECT_NAME`, ports, secrets, not structure).
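While iterating on the template, a small helper that counts differing lines between a tenant's actual compose file and the rendered one gives a number to drive toward the per-instance-only goal. This is a sketch only: `changed_line_count` is a hypothetical name, and the file locations are whatever the render harness writes to its scratch dir.

```shell
#!/usr/bin/env bash
# Sketch: count lines that differ between an actual and a rendered file.
# The function name is illustrative; paths are supplied by the caller.

changed_line_count() {
  # Usage: changed_line_count <actual-file> <rendered-file>
  # Prints the number of '<' / '>' lines in diff's normal output format.
  local out status=0
  out="$(diff "$1" "$2")" || status=$?
  # diff exits 0 (identical) or 1 (different); 2 or more means it failed.
  [ "$status" -le 1 ] || return 2
  # grep -c prints 0 and exits 1 when nothing matches, so mask that status.
  printf '%s\n' "$out" | grep -c '^[<>]' || true
}
```

The remaining lines still need a manual read to confirm they are per-instance values rather than structural drift; the count only shows whether an iteration moved in the right direction.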
Phase 1 (~30 min): Add Instance.imageTag Prisma column + migration. Modify template-engine.ts:211 to use instance.imageTag || env.IMAGE_TAG.
Phase 2 (~3-4 hr): Pre-flight diff endpoint. New agent route POST /instance/:slug/files/diff + RemoteDriver.diffFiles() + LocalDriver.diffFiles() + previewReleaseUpgrade() in upgrade.service. Includes envCoverage check for registered tenants.
Phase 3 (~3-4 hr): startReleaseUpgrade() + runReleaseUpgrade() in upgrade.service. Split logic for isRegistered=true (skip env render) vs isRegistered=false (render env).
Phase 4 (~30 min): CCP routes /upgrade-release + /upgrade-release/preview + Zod schema.
Phase 5 (~2-3 hr): "Upgrade to Release" UI button + preview modal + env-coverage warning.
Phase 6 (~1 hr): Tag v2.10.3 in changemaker.lite, push images with tag, trigger upgrade-release on marcelle via CCP UI, verify mkdocs untouched + containers on new tag.
Total remaining: 11-14 hours. Recommended split:
- Session 2: complete Phase 0 (render harness + iterate template + env.hbs sync + static file syncs). ~half day.
- Session 3: Phases 1-5. ~half day.
- Session 4: Phase 6 E2E test. ~1 hour.
Critical files for Approach C
Already modified this session:
- `changemaker-control-panel/templates/docker-compose.yml.hbs`: overlay from prod compose with minimal Handlebars markup.
- `changemaker-control-panel/templates/docker-compose.yml.hbs.OLD-style-pre-approach-c`: preserved old template.
To be modified in next sessions (per plan):
- `changemaker-control-panel/templates/env.hbs` (Phase 0 audit)
- `changemaker-control-panel/templates/configs/**` (Phase 0 syncs)
- `changemaker-control-panel/api/prisma/schema.prisma` (Phase 1)
- `changemaker-control-panel/api/prisma/migrations/<ts>_add_instance_image_tag/` (Phase 1)
- `changemaker-control-panel/api/src/services/template-engine.ts` line 211 (Phase 1)
- `changemaker-control-panel/api/src/services/upgrade.service.ts` (Phases 2-3)
- `changemaker-control-panel/api/src/services/remote-driver.ts` + `local-driver.ts` + `execution-driver.ts` (Phase 2)
- `changemaker-control-panel/agent/src/routes/files.routes.ts` + `services/file.service.ts` (Phase 2)
- `changemaker-control-panel/api/src/modules/instances/instances.routes.ts` + `instances.schemas.ts` (Phase 4)
- `changemaker-control-panel/admin/src/pages/InstanceDetailPage.tsx` (Phase 5)
Memory key gotchas (write to MEMORY.md next session)
- CCP template vs prod compose: were divergent, now aligned. As of this session, `templates/docker-compose.yml.hbs` is structurally a near-mirror of `docker-compose.prod.yml`. Going forward, any new service in prod compose must be ported into the template manually (or via a future CI drift check).
- bnkops's ccp-agent is locally built, not pulled from registry. It has a `build:` directive in compose. The other 6 tenants pull `gitea.bnkops.com/admin/changemaker-ccp-agent:latest`.
- install.sh tenants (`isRegistered=true`) lack `encryptedSecrets` in CCP DB. Approach C must skip `env.hbs` rendering for them; they keep their tarball-provisioned `.env`. The pre-flight envCoverage check is the safety net.
- n4 SSH lacks marcelle's host key by default: the first `ssh n4 → marcelle` connection needs `StrictHostKeyChecking=accept-new` or an interactive accept. Other tenants in the lab have the same pattern.
- `docker save | ssh ... docker load` is the registry-less image distribution path when n4 doesn't have docker login to gitea.bnkops.com. Worked well for the ccp-agent rollout this session.
- `set -o pipefail` + `grep -q` makes a pipeline report failure even when the pattern matched: grep exits on the first match and closes the pipe, the writer is killed by SIGPIPE, and pipefail surfaces the writer's non-zero status. Solution: capture upstream output into a variable, then grep against the variable. (Bug found + fixed in `scripts/image-upgrade.sh` during this session.)
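The pipefail gotcha is easy to reproduce in isolation. The sketch below is not the code from `scripts/image-upgrade.sh`: `noisy_writer` is a stand-in for any command that keeps writing after the first matching line, and it assumes bash (for `pipefail` and the here-string).

```shell
#!/usr/bin/env bash
set -o pipefail

# Stand-in writer: the match is on the first line, followed by far more
# output than a pipe buffer holds, so it is still writing when grep exits.
noisy_writer() {
  echo "status: healthy"
  seq 1 200000
}

# Fragile: grep -q exits on the first match, the writer then fails on the
# closed pipe (typically killed by SIGPIPE), and pipefail makes the whole
# pipeline report that failure.
if noisy_writer | grep -q "healthy"; then
  fragile="matched"
else
  fragile="failed"
fi

# Robust: capture the output first, then grep the variable via a here-string,
# so no writer is still running when grep exits.
output="$(noisy_writer)"
if grep -q "healthy" <<<"$output"; then
  robust="matched"
else
  robust="failed"
fi

echo "fragile: $fragile"
echo "robust:  $robust"
```

Dropping `-q` would also avoid the early exit, at the cost of grep reading the whole stream. Capturing into a variable keeps the early-exit behaviour of `-q` while removing the race with the writer.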
CCP access (unchanged)
URL: http://n4-bnkops.taile33572.ts.net:5100 (UI)
http://n4-bnkops.taile33572.ts.net:5000 (API)
User: admin@thebunkerops.ca
Password: NRTgHdC7Zxxs2P2UmNwnEbn3jTwU8uJN (seed)
Role: SUPER_ADMIN
Where to start next session
Recommended:
- Read this doc + `/home/bunker-admin/.claude/plans/insight-temporal-bachman.md` (Approach C plan) first.
- Phase 0 completion: finish the template rewrite. Build a render harness (`api/scripts/render-for-instance.ts`), render against marcelle/linda/pia, iterate until structural-clean.
- Commit Phase 0 as standalone PR with rendered-vs-actual diffs in description.
- Move to Phases 1-5 in a second commit/PR.
- Phase 6 manual E2E.
Approach B is in production-ready state across the fleet. Approach C is the longer-term path for releases that change orchestration.