changemaker.lite/docs/SESSION_HANDOFF_2026-05-21.md
bunker-admin f34382ebdd chore(approach-c): Phase 0 initial template overlay + session handoff
This session shipped:
- Approach B end-to-end (commit 4a3d9d7): full rollout to all 7 tenants;
  marcelle E2E validated twice (121s + 100s).
- v2.10.2 surgical update applied to 6 remaining tenants.

This commit lands the kickoff for Approach C (template re-render path):

scripts/templates changes:
- docker-compose.yml.hbs.OLD-style-pre-approach-c: preserved old CCP
  template (Handlebars-heavy, dynamic container names, secrets rendered
  at template-time).
- docker-compose.yml.hbs: REWRITTEN as a near-mirror of canonical
  docker-compose.prod.yml. Minimal Handlebars overlay:
    - Header comment lists {{name}}, {{slug}}, {{composeProject}}.
    - 5 image refs: ${IMAGE_TAG:-latest} -> {{imageTag}}, so CCP can
      per-instance override once Phase 1 lands the Instance.imageTag column.
  All other variation flows through env-var substitution from tenant's
  .env. Container names are now hardcoded (matching prod), feature flags
  are deferred to COMPOSE_PROFILES gating (matching prod).

Why a rewrite: the old CCP template and prod compose used fundamentally
different conventions (dynamic vs hardcoded names, render-time vs
substitute-time secrets, Handlebars vs profiles gating). Sync-by-addition
couldn't reconcile them. The rewrite makes Approach C re-render safe for
the install.sh-installed fleet (marcelle, linda, pia and future).

docs/SESSION_HANDOFF_2026-05-21.md: full session handoff covering fleet
state, Approach B rollout, Approach C plan, and where to start next
session. force-added because /docs is gitignored (same precedent as
docs/SESSION_HANDOFF_2026-05-20.md from prior session).

Phase 0 remaining work (next session):
- Audit env.hbs against new compose env-var expectations
- Sync static config files (nginx/, configs/prometheus/, etc.)
- Build api/scripts/render-for-instance.ts harness
- Iterate template until rendered output is per-instance-only diff
  against marcelle/linda/pia actual compose.

Then Phases 1-6 per plan in subsequent sessions (~11-14 hours total).

Bunker Admin
2026-05-21 19:32:21 -06:00

170 lines
9.4 KiB
Markdown

# Session Handoff: Approach B Rollout + Approach C Planning (2026-05-21)
Carries forward all context from a long working session. If you're a fresh agent: read this top-to-bottom before touching anything.
---
## What landed in this session (commits on origin/main)
| Commit | Description |
|---|---|
| `4a3d9d7` | `feat(upgrade): Approach B - image-only upgrade mode` — 7 files, 666 insertions. scripts/image-upgrade.sh + CCP agent endpoint + CCP backend (driver/service/route/schema) + admin UI "Quick Upgrade" button. |
| `<this commit>` | docs: session handoff + Approach C Phase 0 initial template overlay |
Plus several non-tracked deploys:
- v2.10.2 surgical update applied to remaining 6 tenants (soroush, linda, marcelle, bnkops, trbh, pridecorner — pia was done previously). All verified mkdocs untouched, upgrade.sh sha matches `b9f37d59...`.
- Fleet rollout of Approach B: new `image-upgrade.sh` script delivered + new `ccp-agent` image (with `/upgrade/start-image-only` endpoint) deployed to all 7 tenants. Bnkops's ccp-agent was rebuilt from source (builds locally rather than pulled from registry).
---
## Fleet state at session end
| Tenant | Surgical update v2.10.2 | image-upgrade.sh | New ccp-agent with image-only endpoint |
|---|---|---|---|
| pia | ✅ (prior session) | ✅ | ✅ |
| soroush | ✅ | ✅ | ✅ |
| linda | ✅ | ✅ | ✅ |
| marcelle | ✅ + tested both A and B E2E | ✅ | ✅ |
| bnkops | ✅ | ✅ | ✅ (rebuilt locally) |
| trbh | ✅ | ✅ | ✅ |
| pridecorner | ✅ | ✅ | ✅ |
Marcelle E2E test results:
- **Approach A (full upgrade)**: v2.10.1 → v2.10.2 in 250s, COMPLETED, no SIGKILL on script. Phase 6 deferred ccp-agent restart fix worked end-to-end through CCP path.
- **Approach B (Quick Upgrade) run 1**: 121s, COMPLETED, mkdocs.yml md5 unchanged.
- **Approach B (Quick Upgrade) run 2**: 100s (cached pull), COMPLETED, mkdocs unchanged again — confirms idempotency.
---
## Fleet backup (Phase 0 work — defensive)
All 7 tenants backed up to `/media/bunker-admin/BACKUP/fleet/<node>/2026-05-21-pre-v2.10.2/`:
| Node | Tenant | Size |
|---|---|---|
| n1 | pridecorner | 182MB (includes 3 stash patches from March 9) |
| n2 | linda | 26MB |
| n3 | pia | 45MB (post-surgical state) |
| n4 | bnkops | 4.4GB (huge — 2277 mkdocs/docs files) |
| n5 | marcelle | 28MB |
| n6 | trbh | 336MB |
| n7 | soroush | 76MB |
Each tenant dir has `mkdocs.tar.gz`, `configs-and-nginx.tar.gz`, `config-files.tar.gz`, `host-state.txt`, `git-state.txt` (source installs only), and `MANIFEST.txt`.
---
## Approach C planning + initial overlay
**Decision: rewrite `docker-compose.yml.hbs` in prod-compose style** to make CCP-driven template re-render safe for the install.sh fleet.
### Why a rewrite (not sync-by-addition)
Discovered the CCP template and `docker-compose.prod.yml` use fundamentally different conventions:
| | Old template (`.hbs`) | Canonical prod |
|---|---|---|
| Container names | `{{containerPrefix}}-postgres` (dynamic) | `changemaker-v2-postgres` (hardcoded) |
| Secrets | `{{secrets.postgresPassword}}` (Handlebars-rendered) | `${POSTGRES_PASSWORD}` (env-substituted) |
| Optional services | `{{#if enableX}}` blocks | Always-defined, gated via `COMPOSE_PROFILES` |
| Ports | `{{ports.api}}` | Hardcoded |
Sync-by-additions can't reconcile these. Rewrite is cleaner long-term.
### Initial overlay committed this session
`changemaker-control-panel/templates/docker-compose.yml.hbs.OLD-style-pre-approach-c` — preserved old template for reference.
`changemaker-control-panel/templates/docker-compose.yml.hbs` — now a near-mirror of `changemaker.lite/docker-compose.prod.yml` (1493 lines + Handlebars header):
- Header comment includes `{{name}}`, `{{slug}}`, `{{composeProject}}` for traceability.
- 5 image refs replaced `${IMAGE_TAG:-latest}``{{imageTag}}` so CCP can per-instance override via `Instance.imageTag` once Phase 1 lands.
- All other variation flows through env-var substitution from tenant's `.env`.
### Remaining Approach C work (next session)
See `/home/bunker-admin/.claude/plans/insight-temporal-bachman.md` for the full plan. Quick summary of what's next:
**Phase 0 completion (next session):**
- Audit `env.hbs` against the new compose's expected env vars. Add missing.
- Sync static config files in `templates/`: nginx/, configs/prometheus/, configs/alertmanager/, configs/grafana/. They may have drifted too.
- Write a one-off render harness (`api/scripts/render-for-instance.ts`) that loads an instance row, builds context, renders templates to scratch dir.
- Render against marcelle, linda, pia. Diff against their actual files. Iterate the template until diff is per-instance values only (`COMPOSE_PROJECT_NAME`, ports, secrets — not structure).
**Phase 1 (~30 min):** Add `Instance.imageTag` Prisma column + migration. Modify `template-engine.ts:211` to use `instance.imageTag || env.IMAGE_TAG`.
**Phase 2 (~3-4 hr):** Pre-flight diff endpoint. New agent route `POST /instance/:slug/files/diff` + `RemoteDriver.diffFiles()` + `LocalDriver.diffFiles()` + `previewReleaseUpgrade()` in upgrade.service. Includes `envCoverage` check for registered tenants.
**Phase 3 (~3-4 hr):** `startReleaseUpgrade()` + `runReleaseUpgrade()` in upgrade.service. Split logic for `isRegistered=true` (skip env render) vs `isRegistered=false` (render env).
**Phase 4 (~30 min):** CCP routes `/upgrade-release` + `/upgrade-release/preview` + Zod schema.
**Phase 5 (~2-3 hr):** "Upgrade to Release" UI button + preview modal + env-coverage warning.
**Phase 6 (~1 hr):** Tag v2.10.3 in changemaker.lite, push images with tag, trigger upgrade-release on marcelle via CCP UI, verify mkdocs untouched + containers on new tag.
**Total remaining: 11-14 hours.** Recommended split:
- Session 2: complete Phase 0 (render harness + iterate template + env.hbs sync + static file syncs). ~half day.
- Session 3: Phases 1-5. ~half day.
- Session 4: Phase 6 E2E test. ~1 hour.
---
## Critical files for Approach C
**Already modified this session:**
- `changemaker-control-panel/templates/docker-compose.yml.hbs` — overlay from prod compose with minimal Handlebars markup.
- `changemaker-control-panel/templates/docker-compose.yml.hbs.OLD-style-pre-approach-c` — preserved old template.
**To be modified in next sessions (per plan):**
- `changemaker-control-panel/templates/env.hbs` (Phase 0 audit)
- `changemaker-control-panel/templates/configs/**` (Phase 0 syncs)
- `changemaker-control-panel/api/prisma/schema.prisma` (Phase 1)
- `changemaker-control-panel/api/prisma/migrations/<ts>_add_instance_image_tag/` (Phase 1)
- `changemaker-control-panel/api/src/services/template-engine.ts` line 211 (Phase 1)
- `changemaker-control-panel/api/src/services/upgrade.service.ts` (Phases 2-3)
- `changemaker-control-panel/api/src/services/remote-driver.ts` + `local-driver.ts` + `execution-driver.ts` (Phase 2)
- `changemaker-control-panel/agent/src/routes/files.routes.ts` + `services/file.service.ts` (Phase 2)
- `changemaker-control-panel/api/src/modules/instances/instances.routes.ts` + `instances.schemas.ts` (Phase 4)
- `changemaker-control-panel/admin/src/pages/InstanceDetailPage.tsx` (Phase 5)
---
## Memory key gotchas (write to MEMORY.md next session)
1. **CCP template vs prod compose: were divergent, now aligned.** As of this session, `templates/docker-compose.yml.hbs` is structurally a near-mirror of `docker-compose.prod.yml`. Going forward, any new service in prod compose must be ported into the template manually (or via a future CI drift check).
2. **bnkops's ccp-agent is locally built**, not pulled from registry. Has a `build:` directive in compose. The other 6 tenants pull `gitea.bnkops.com/admin/changemaker-ccp-agent:latest`.
3. **install.sh tenants (`isRegistered=true`)** lack `encryptedSecrets` in CCP DB. Approach C must skip `env.hbs` rendering for them — they keep their tarball-provisioned `.env`. The pre-flight envCoverage check is the safety net.
4. **n4 SSH lacks marcelle's host key by default** — first `ssh n4 → marcelle` connection needs `StrictHostKeyChecking=accept-new` or interactive accept. Other tenants in the lab have the same pattern.
5. **`docker save | ssh ... docker load` is the registry-less image distribution path** when n4 doesn't have docker login to gitea.bnkops.com. Worked well for the ccp-agent rollout this session.
6. **`set -o pipefail` + `grep -q` shorts the pipeline** because grep closes the pipe early on first match, sending SIGPIPE to the writer. Solution: capture upstream output into a variable, then grep against the variable. (Bug found + fixed in `scripts/image-upgrade.sh` during this session.)
---
## CCP access (unchanged)
```
URL: http://n4-bnkops.taile33572.ts.net:5100 (UI)
http://n4-bnkops.taile33572.ts.net:5000 (API)
User: admin@thebunkerops.ca
Password: NRTgHdC7Zxxs2P2UmNwnEbn3jTwU8uJN (seed)
Role: SUPER_ADMIN
```
---
## Where to start next session
Recommended:
1. **Read this doc + `/home/bunker-admin/.claude/plans/insight-temporal-bachman.md` (Approach C plan)** first.
2. **Phase 0 completion:** finish the template rewrite. Build a render harness (`api/scripts/render-for-instance.ts`), render against marcelle/linda/pia, iterate until structural-clean.
3. Commit Phase 0 as standalone PR with rendered-vs-actual diffs in description.
4. Move to Phases 1-5 in a second commit/PR.
5. Phase 6 manual E2E.
Approach B is in production-ready state across the fleet. Approach C is the longer-term path for releases that change orchestration.