Adds the third upgrade path alongside Approach A (full upgrade.sh) and B
(image-only). For releases that change orchestration (new services, new
nginx routes, new compose env vars) in addition to image versions, CCP
re-renders templates server-side, sends the rendered files to the tenant
via the existing mTLS agent, then runs composePull + composeUp. Tenant content
(mkdocs/, custom configs/) is never touched.
Pieces:
PHASE 1 — Schema + per-instance imageTag
- prisma/schema.prisma: new Instance.imageTag column (NULL = fall back
to env.IMAGE_TAG default).
- prisma/migrations/20260522093400_add_instance_image_tag/: SQL.
- services/template-engine.ts:
  - buildTemplateContext now uses instance.imageTag || env.IMAGE_TAG.
  - InstanceForTemplate interface gains imageTag: string | null.
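
A minimal sketch of the Phase 1 fallback, for illustration only. The interface shapes and the v2.11.0 tag are assumptions; only imageTag, IMAGE_TAG and the `||` expression come from the notes above.

```typescript
// Hypothetical shapes: the real interfaces carry more fields.
interface InstanceForTemplate {
  slug: string;
  imageTag: string | null;
}

interface Env {
  IMAGE_TAG: string;
}

// Mirrors `instance.imageTag || env.IMAGE_TAG`. Because `||` is used,
// both NULL and an empty string fall back to the fleet-wide default.
function resolveImageTag(instance: InstanceForTemplate, env: Env): string {
  return instance.imageTag || env.IMAGE_TAG;
}

const env: Env = { IMAGE_TAG: "v2.10.2" };
const pinned = resolveImageTag({ slug: "a", imageTag: "v2.11.0" }, env);
const unpinned = resolveImageTag({ slug: "b", imageTag: null }, env);
const blank = resolveImageTag({ slug: "c", imageTag: "" }, env);
```

If an empty string should ever be a meaningful value, `??` would be the operator to use instead; the notes above specify `||`.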
PHASE 2 — Pre-flight diff (read-only "what would change?")
- agent/services/file.service.ts: new diffFiles() helper with a small
inline LCS-based unified-diff (no new deps). Returns per-file status
('unchanged' | 'modified' | 'created') + truncated unified diff.
- agent/routes/files.routes.ts: POST /instance/:slug/files/diff.
- api/services/execution-driver.ts: diffFiles added to interface.
- api/services/local-driver.ts + remote-driver.ts: diffFiles methods
(local mirrors agent helper inline; remote POSTs to the agent endpoint).
- api/services/upgrade.service.ts: previewReleaseUpgrade() — renders
templates in-memory with the proposed imageTag, filters out .env for
isRegistered=true tenants, calls driver.diffFiles, computes envCoverage
(which env vars the new compose needs vs which the tenant's .env has).
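
The per-file status and LCS-based diff from Phase 2 could look roughly like the sketch below. This is not the actual diffFiles() helper: the function names, the maxLines truncation parameter, and the simplified output (prefixed lines without @@ hunk headers) are assumptions.

```typescript
type FileStatus = "unchanged" | "modified" | "created";

interface FileDiff {
  status: FileStatus;
  diff: string; // empty unless status is "modified"
}

// Line-level diff driven by a longest-common-subsequence table.
function lineDiff(a: string[], b: string[]): string[] {
  const n = a.length;
  const m = b.length;
  // lcs[i][j] = LCS length of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: n + 1 }, () =>
    new Array<number>(m + 1).fill(0),
  );
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      out.push(" " + a[i]); // context line
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push("-" + a[i]); // only in the tenant's current file
      i++;
    } else {
      out.push("+" + b[j]); // only in the newly rendered file
      j++;
    }
  }
  while (i < n) out.push("-" + a[i++]);
  while (j < m) out.push("+" + b[j++]);
  return out;
}

// `current` is null when the file does not exist on the tenant yet.
function diffFile(current: string | null, proposed: string, maxLines = 200): FileDiff {
  if (current === null) return { status: "created", diff: "" };
  if (current === proposed) return { status: "unchanged", diff: "" };
  const lines = lineDiff(current.split("\n"), proposed.split("\n"));
  return { status: "modified", diff: lines.slice(0, maxLines).join("\n") };
}
```

The table costs O(n·m) memory, which is acceptable for compose files and nginx configs but would need a different algorithm for large files.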
PHASE 3 — Apply path (the actual upgrade)
- api/services/upgrade.service.ts: startReleaseUpgrade() and the inner
runReleaseUpgrade() runner. Distinct from runRemoteUpgrade because CCP
does the work directly via the mTLS driver (no agent-side script).
Flow: persist imageTag in DB → render → writeFiles → composePull →
composeUp → composePs verify. Status reported via InstanceUpgrade
rows (same shape the existing CCP polling UI already uses).
- Failure handling: instance.imageTag stays at the new value on failure
so the operator can retry. Rollback is manual only.
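
The Phase 3 flow can be sketched as a list of named steps run in order. Everything here is an assumption about shape: the Driver and UpgradeDeps interfaces, the report() callback standing in for InstanceUpgrade row updates, and the function name. Only the step order and the "imageTag is not reverted on failure" rule come from the notes above.

```typescript
// Hypothetical driver surface; the real execution-driver interface is larger.
interface Driver {
  writeFiles(slug: string, files: Record<string, string>): Promise<void>;
  composePull(slug: string): Promise<void>;
  composeUp(slug: string): Promise<void>;
  composePs(slug: string): Promise<{ service: string; running: boolean }[]>;
}

interface UpgradeDeps {
  driver: Driver;
  persistImageTag(slug: string, tag: string): Promise<void>;
  render(slug: string, tag: string): Promise<Record<string, string>>;
  // Stand-in for writing an InstanceUpgrade status row.
  report(status: "running" | "completed" | "failed", step: string): Promise<void>;
}

async function runReleaseUpgradeSketch(
  slug: string,
  imageTag: string,
  deps: UpgradeDeps,
): Promise<boolean> {
  const { driver } = deps;
  let files: Record<string, string> = {};
  const steps: Array<[string, () => Promise<void>]> = [
    ["persist", () => deps.persistImageTag(slug, imageTag)],
    ["render", async () => { files = await deps.render(slug, imageTag); }],
    ["writeFiles", () => driver.writeFiles(slug, files)],
    ["composePull", () => driver.composePull(slug)],
    ["composeUp", () => driver.composeUp(slug)],
    ["verify", async () => {
      const services = await driver.composePs(slug);
      const down = services.filter((s) => !s.running).map((s) => s.service);
      if (down.length > 0) throw new Error(`not running: ${down.join(", ")}`);
    }],
  ];
  for (const [name, run] of steps) {
    await deps.report("running", name);
    try {
      await run();
    } catch {
      // The persisted imageTag is deliberately NOT reverted, so the
      // operator can retry the same release; rollback stays manual.
      await deps.report("failed", name);
      return false;
    }
  }
  await deps.report("completed", "verify");
  return true;
}
```

Persisting the tag first means a retry re-renders with the same value without the operator re-entering it.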
PHASE 4 — Routes + schemas
- instances.schemas.ts: startReleaseUpgradeSchema (imageTag regex).
- instances.routes.ts:
  - POST /:id/upgrade-release (apply)
  - POST /:id/upgrade-release/preview (read-only diff)
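
For illustration, an imageTag check in the spirit of startReleaseUpgradeSchema. The actual regex is not shown in this message; the pattern below is an assumption that follows Docker's tag grammar (one word character, then up to 127 word characters, dots or dashes), and the parser function is hypothetical and dependency-free.

```typescript
// Assumed pattern: Docker's tag grammar, anchored at both ends.
const IMAGE_TAG_RE = /^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$/;

// Hypothetical stand-in for the schema: validates an untyped request body.
function parseStartReleaseUpgrade(body: unknown): { imageTag: string } {
  if (typeof body !== "object" || body === null) {
    throw new Error("body must be an object");
  }
  const imageTag = (body as { imageTag?: unknown }).imageTag;
  if (typeof imageTag !== "string" || !IMAGE_TAG_RE.test(imageTag)) {
    throw new Error("imageTag must be a valid image tag");
  }
  return { imageTag };
}

// Returns true when parsing the body throws.
function rejects(body: unknown): boolean {
  try {
    parseStartReleaseUpgrade(body);
    return false;
  } catch {
    return true;
  }
}
```

Anchoring the pattern matters because the tag ends up in rendered compose files; anything with whitespace, quotes or shell metacharacters should be rejected at the route.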
PHASE 5 — CCP admin UI
- admin/pages/InstanceDetailPage.tsx: third "Upgrade to Release" button
next to Quick Upgrade + Upgrade Now. Opens a modal with imageTag input,
Preview button (calls /preview), and Apply button. Preview modal shows:
  - Red alert if envCoverage.missingInTenantEnv is non-empty (compose
    needs vars the tenant's .env doesn't define).
  - Per-file status tags (unchanged / modified / created) + truncated
    unified diff for modified files.
- admin/types/api.ts: Instance.imageTag added.
Constraints applied:
- Remote-only initial scope: throws "currently supported only for remote
instances" if instance.isRemote === false.
- isRegistered=true tenants (install.sh fleet): .env is filtered out
of the render set (CCP can't render env without secrets in DB), so the
tenant's existing .env stays as-is. envCoverage warns the operator
if the new compose references env vars their .env doesn't define.
- Shared in-progress guard with Approach A/B (one upgrade at a time).
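
The envCoverage check described above can be sketched as a set difference between the variables the rendered compose file references and the keys the tenant's .env defines. The helper names and the `required` field are assumptions; missingInTenantEnv is the field named in these notes. As a simplification, a reference with a default such as ${VAR:-x} is still counted as required.

```typescript
// Collects variable names referenced as ${VAR}, ${VAR:-default} or $VAR.
// The `\$\$` alternative consumes escaped dollars so $$VAR is ignored.
function referencedVars(compose: string): Set<string> {
  const vars = new Set<string>();
  const re = /\$\$|\$\{([A-Za-z_][A-Za-z0-9_]*)[^}]*\}|\$([A-Za-z_][A-Za-z0-9_]*)/g;
  for (const m of compose.matchAll(re)) {
    const name = m[1] ?? m[2];
    if (name) vars.add(name);
  }
  return vars;
}

// Collects keys from KEY=value lines, skipping blanks and comments.
function definedVars(envText: string): Set<string> {
  const vars = new Set<string>();
  for (const raw of envText.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const m = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    if (m) vars.add(m[1]);
  }
  return vars;
}

interface EnvCoverage {
  required: string[];
  missingInTenantEnv: string[];
}

function computeEnvCoverage(compose: string, tenantEnv: string): EnvCoverage {
  const required = [...referencedVars(compose)].sort();
  const defined = definedVars(tenantEnv);
  return {
    required,
    missingInTenantEnv: required.filter((name) => !defined.has(name)),
  };
}
```

A non-empty missingInTenantEnv is what would drive the red alert in the preview modal.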
Per the plan: see ~/.claude/plans/insight-temporal-bachman.md.
All three projects type-check cleanly (api, agent, admin).
Bunker Admin
Add a "Quick Upgrade" path that pulls the latest container images and recreates
only the core app services (api, admin, media-api, nginx) without touching
any tracked files. Tenant content (mkdocs/, configs/, scripts/) is implicitly
preserved because the script never writes outside Docker.
Faster (~2 min vs ~4-5 min for full upgrade) and structurally safer for
releases that don't change orchestration/templates.
Pieces:
- scripts/image-upgrade.sh: new ~350-line script. Phases: pre-flight +
mkdocs snapshot, image pull, targeted recreate (a broad `up -d` would
cascade on misconfigured infra containers, as observed on marcelle),
light health checks, deferred ccp-agent restart. Writes the same
progress.json + result.json schema as upgrade.sh so the CCP poll loop
is unchanged.
- agent/src/routes/upgrade.routes.ts: POST /instance/:slug/upgrade/start-image-only.
Same lock + staleness guards as the existing /upgrade/start endpoint.
- api/src/services/remote-driver.ts: RemoteDriver.startImageUpgrade().
- api/src/services/upgrade.service.ts: startImageUpgrade() entry point;
reuses runRemoteUpgrade with mode='image-only' (only the initial agent
call differs — result schema and polling are identical).
- api/src/modules/instances/instances.routes.ts: POST /:id/upgrade-images
+ startImageUpgradeSchema.
- admin/src/pages/InstanceDetailPage.tsx: secondary "Quick Upgrade" button
next to "Upgrade Now" on the Updates tab. Tooltip explains when to use it.
Tested locally on marcelle (v2.10.2 idempotent run): 1m 49s, mkdocs.yml md5
unchanged, file count unchanged, only api/admin/media-api/nginx touched.
Subtle bug found and fixed: under `set -o pipefail`, `grep -q` exits on the
first match and closes the pipe early, so the writer gets SIGPIPE and the
pipeline reports failure. Fixed by capturing the services list once instead.
Bunker Admin
Problem: the agent polled /poll every 30s while waiting for admin
approval. Under the shared limit of 10 req/15min, the 11th poll hit 429
after ~5 min and every subsequent one also failed; recovery required an
agent restart. A human-paced approval SLA is longer than 5 minutes.
CCP side (agents.routes.ts):
Split the one-size-fits-all agentRegistrationLimiter into two.
/register stays tight (10/15min — invite-code brute force is the
real attack surface). /poll gets a new agentPollLimiter at 180/15min
(an upper bound of one poll per ~5s), scoped to registrationId+slug so
the blast radius is bounded.
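
To make the arithmetic concrete, here is a dependency-free sketch of the two limits. The FixedWindowLimiter class is a hypothetical stand-in, not the middleware actually used in agents.routes.ts, and the key format is an assumption; only the 10/15min and 180/15min budgets and the registrationId+slug scoping come from this message.

```typescript
// Minimal in-memory fixed-window counter, keyed by an arbitrary string.
class FixedWindowLimiter {
  private readonly max: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, { windowStart: number; count: number }>();

  constructor(max: number, windowMs: number) {
    this.max = max;
    this.windowMs = windowMs;
  }

  // Returns false when the caller should receive HTTP 429.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    entry.count++;
    return entry.count <= this.max;
  }
}

const WINDOW_MS = 15 * 60 * 1000;

// /register stays tight; /poll gets its own, larger budget.
const registerLimiter = new FixedWindowLimiter(10, WINDOW_MS);
const pollLimiter = new FixedWindowLimiter(180, WINDOW_MS);

// One bucket per waiting agent, so a noisy agent only throttles itself.
const pollKey = (registrationId: string, slug: string): string =>
  `${registrationId}:${slug}`;

// Simulates polling every `intervalMs` for one window and returns the
// zero-based index of the first rejected poll, or -1 if none is rejected.
function firstRejectedPoll(limiter: FixedWindowLimiter, key: string, intervalMs: number): number {
  const polls = Math.floor(WINDOW_MS / intervalMs);
  for (let i = 0; i < polls; i++) {
    if (!limiter.allow(key, i * intervalMs)) return i;
  }
  return -1;
}
```

With a 30s interval, the 10-request budget rejects the poll at index 10 (the 11th, at the 5-minute mark), while the 180-request budget admits all 30 polls in the window.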
Agent side (server.ts):
Replaced the fixed 30s setInterval with a self-scheduling setTimeout
loop that backs off exponentially on HTTP 429 (30s → 60s → 120s →
300s cap) and resets to 30s on any 2xx. A stop flag guards against
re-entry after approval. Removes the need for the "agent wedged at
429, restart to recover" workaround.
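
A sketch of the self-scheduling loop, under stated assumptions: the delay table encodes the steps listed above as-is (strict doubling would pass through 240s before the cap), the PollResult shape and function names are hypothetical, and non-429 errors are assumed to keep the current delay.

```typescript
// Delay schedule taken from the steps listed in this message.
const BACKOFF_STEPS_MS = [30_000, 60_000, 120_000, 300_000];

// Pure transition: 429 moves one step up (capped), any 2xx resets.
function nextStep(current: number, status: number): number {
  if (status === 429) return Math.min(current + 1, BACKOFF_STEPS_MS.length - 1);
  if (status >= 200 && status < 300) return 0;
  return current; // assumption: other statuses keep the current delay
}

interface PollResult {
  status: number;
  approved: boolean;
}

// Starts polling and returns a function that stops the loop.
function startPolling(
  poll: () => Promise<PollResult>,
  onApproved: () => void,
): () => void {
  let stopped = false; // stop flag: blocks any reschedule after approval
  let step = 0;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    if (stopped) return;
    try {
      const res = await poll();
      if (stopped) return;
      if (res.approved) {
        stopped = true;
        onApproved();
        return;
      }
      step = nextStep(step, res.status);
    } catch {
      // Network error: retry at the current delay.
    }
    // Schedule the next tick only after this one finished, so polls
    // never overlap the way they can with setInterval.
    if (!stopped) timer = setTimeout(tick, BACKOFF_STEPS_MS[step]);
  };

  timer = setTimeout(tick, BACKOFF_STEPS_MS[0]);
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Because the next timer is armed from inside the previous tick, a slow or throttled request delays the following poll instead of stacking requests behind it.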
Bunker Admin