Adds the third upgrade path alongside Approach A (full upgrade.sh) and B
(image-only). For releases that change orchestration (new services, new
nginx routes, new compose env vars) in addition to image versions, CCP
re-renders templates server-side, sends the rendered files to the tenant
via the existing mTLS agent, then composePull + composeUp. Tenant content
(mkdocs/, custom configs/) is never touched.
Pieces:
PHASE 1 — Schema + per-instance imageTag
- prisma/schema.prisma: new Instance.imageTag column (NULL = fall back
to env.IMAGE_TAG default).
- prisma/migrations/20260522093400_add_instance_image_tag/: SQL.
- services/template-engine.ts:
- buildTemplateContext now uses instance.imageTag || env.IMAGE_TAG.
- InstanceForTemplate interface gains imageTag: string | null.
PHASE 2 — Pre-flight diff (read-only "what would change?")
- agent/services/file.service.ts: new diffFiles() helper with a small
inline LCS-based unified-diff (no new deps). Returns per-file status
('unchanged' | 'modified' | 'created') + truncated unified diff.
- agent/routes/files.routes.ts: POST /instance/:slug/files/diff.
- api/services/execution-driver.ts: diffFiles added to interface.
- api/services/local-driver.ts + remote-driver.ts: diffFiles methods
(local mirrors agent helper inline; remote POSTs to the agent endpoint).
- api/services/upgrade.service.ts: previewReleaseUpgrade() — renders
templates in-memory with the proposed imageTag, filters out .env for
isRegistered=true tenants, calls driver.diffFiles, computes envCoverage
(which env vars the new compose needs vs which the tenant's .env has).
PHASE 3 — Apply path (the actual upgrade)
- api/services/upgrade.service.ts: startReleaseUpgrade() and the inner
runReleaseUpgrade() runner. Distinct from runRemoteUpgrade because CCP
does the work directly via the mTLS driver (no agent-side script).
Flow: persist imageTag in DB → render → writeFiles → composePull →
composeUp → composePs verify. Status reported via InstanceUpgrade
rows (same shape the existing CCP polling UI already uses).
- Failure handling: instance.imageTag stays at the new value on failure
so operator can retry. Manual rollback only.
PHASE 4 — Routes + schemas
- instances.schemas.ts: startReleaseUpgradeSchema (imageTag regex).
- instances.routes.ts:
- POST /:id/upgrade-release (apply)
- POST /:id/upgrade-release/preview (read-only diff)
PHASE 5 — CCP admin UI
- admin/pages/InstanceDetailPage.tsx: third "Upgrade to Release" button
next to Quick Upgrade + Upgrade Now. Opens a modal with imageTag input,
Preview button (calls /preview), and Apply button. Preview modal shows:
- Red alert if envCoverage.missingInTenantEnv is non-empty (compose
needs vars the tenant's .env doesn't define).
- Per-file status tags (unchanged / modified / created) + truncated
unified diff for modified files.
- admin/types/api.ts: Instance.imageTag added.
Constraints applied:
- Remote-only initial scope: throws "currently supported only for remote
instances" if instance.isRemote === false.
- isRegistered=true tenants (install.sh fleet): .env is filtered out
of the render set (CCP can't render env without secrets in DB), the
tenant's existing .env stays as-is. envCoverage warns the operator
if the new compose references env vars their .env doesn't define.
- Shared in-progress guard with Approach A/B (one upgrade at a time).
Per the plan: see ~/.claude/plans/insight-temporal-bachman.md.
All three projects type-check cleanly (api, agent, admin).
Bunker Admin
This commit completes Phase 0 of Approach C: the CCP template/env/static
files now produce output structurally byte-identical to canonical
docker-compose.prod.yml + .env.example. Verified by rendering against
marcelle, linda, and pia and diffing against their actual files — all
three show only the 30-line CCP-tenant header comment differing,
zero service/env-var structural differences.
Changes:
- templates/docker-compose.yml.hbs: reverted {{imageTag}} substitutions
back to ${IMAGE_TAG:-latest} so the compose template is now byte-
equivalent to docker-compose.prod.yml (modulo header). CCP controls
per-instance image tag selection via the rendered .env's IMAGE_TAG,
which compose-up picks up at runtime. This single-source-of-truth
via env-substitution matches install.sh tenants exactly.
- templates/env.hbs: rewritten as a near-mirror of .env.example. Adds
27 missing keys (IMAGE_TAG, GITEA_REGISTRY, COMPOSE_PROFILES,
ENABLE_CCP_AGENT, GITEA_ADMIN_*, ENABLE_HLS_TRANSCODE, TZ, etc.)
plus 15 CCP-specific extras (embed ports, dev-mode helpers, etc.).
All 145 compose-template env-var references are now covered.
- templates/nginx/nginx.conf: synced from canonical. Includes recent
security additions: redacted access-log format for token/secret
query params, rate-limit zones (api_global, api_auth, upload),
conditional HSTS via X-Forwarded-Proto map.
- api/scripts/render-for-instance.ts (new): one-off CLI that loads
an Instance row, decrypts secrets if present (or uses empty object
for isRegistered=true tenants), and calls renderAllTemplates() to
a scratch dir. Used in Phase 0.4 to verify the template-vs-prod
contract per tenant.
Usage:
docker compose exec ccp-api npx tsx scripts/render-for-instance.ts \
--slug changemakerlite
Phase 0 acceptance gate met:
- marcelle (release v2.10.2 install): 30-line diff, header-only
- linda (release v2.9.14 install): 30-line diff, header-only
- pia (release v2.9.10 install): 30-line diff, header-only
- env.hbs key coverage: 0 missing vs marcelle's .env
Next phases unblocked:
- Phase 1: add Instance.imageTag column (Prisma migration)
- Phase 2: pre-flight diff endpoint
- Phase 3: startReleaseUpgrade runner
- Phase 4: routes + schemas
- Phase 5: CCP UI "Upgrade to Release" button
- Phase 6: E2E test on marcelle (v2.10.2 -> v2.10.3)
Bunker Admin
Add a "Quick Upgrade" path that pulls latest container images and recreates
only the core app services (api, admin, media-api, nginx) without touching
any tracked files. Tenant content (mkdocs/, configs/, scripts/) is implicitly
preserved because the script never writes outside docker.
Faster (~2 min vs ~4-5 min for full upgrade) and structurally safer for
releases that don't change orchestration/templates.
Pieces:
- scripts/image-upgrade.sh: new ~350-line script. Phases: pre-flight +
mkdocs snapshot, image pull, targeted recreate (broad up -d would cascade
on misconfigured infra containers — proven on marcelle), light health
checks, deferred ccp-agent restart. Writes the same progress.json +
result.json schema as upgrade.sh so the CCP poll loop is unchanged.
- agent/src/routes/upgrade.routes.ts: POST /instance/:slug/upgrade/start-image-only.
Same lock + staleness guards as the existing /upgrade/start endpoint.
- api/src/services/remote-driver.ts: RemoteDriver.startImageUpgrade().
- api/src/services/upgrade.service.ts: startImageUpgrade() entry point;
reuses runRemoteUpgrade with mode='image-only' (only the initial agent
call differs — result schema and polling are identical).
- api/src/modules/instances/instances.routes.ts: POST /:id/upgrade-images
+ startImageUpgradeSchema.
- admin/src/pages/InstanceDetailPage.tsx: secondary "Quick Upgrade" button
next to "Upgrade Now" on the Updates tab. Tooltip explains when to use it.
Tested locally on marcelle (v2.10.2 idempotent run): 1m 49s, mkdocs.yml md5
unchanged, file count unchanged, only api/admin/media-api/nginx touched.
Subtle bug found and fixed: `set -o pipefail` + `grep -q` shorts pipe and
SIGPIPEs the writer — captured services list once instead.
Bunker Admin
Problem: the agent polled /poll every 30s while waiting for admin
approval. At 10 req/15min, the 11th poll hit 429 after ~5 min and
every subsequent one also failed — recovery required an agent
restart. A human-paced approval SLA is longer than 5 minutes.
CCP side (agents.routes.ts):
Split the one-size-fits-all agentRegistrationLimiter into two.
/register stays tight (10/15min — invite-code brute force is the
real attack surface). /poll gets a new agentPollLimiter at 180/15min
(one poll per ~5s upper bound), scoped to registrationId+slug so
blast radius is bounded.
Agent side (server.ts):
Replaced fixed 30s setInterval with a self-scheduling setTimeout
loop that backs off exponentially on HTTP 429 (30s → 60s → 120s →
300s cap) and resets to 30s on any 2xx. Stop-flag protects against
re-entry after approval. Fixes the "agent wedged at 429, restart to
recover" workaround.
Bunker Admin
agents.routes.ts approve handler: wrap prisma.instance.create in
try/catch. When a PrismaClientKnownRequestError P2002 with target
including 'slug' is thrown, convert to a 409 AppError with a clear
message directing the admin at DELETE /api/instances/:id or the new
scripts/ccp-deregister.sh on the target host.
Before this, re-registering a previously-registered host (after the
operator tore down the underlying stack without cleaning CCP's DB
row) returned 500 with a raw Prisma error string — the operator had
to read a stack trace to understand the cause. Now the error is
self-describing and points at the fix.
Matches the pattern other CCP error paths use.
Bunker Admin
Fixes surfaced by three rounds of fresh-install testing on marcelle:
- config.sh: add host-port preflight check (ss -tln) to catch
cockpit-on-9090 style collisions before compose up; add
--skip-port-check escape hatch; add --install-watcher /
--no-install-watcher / --install-backup-timer /
--no-install-backup-timer flags; -y --enable-all now installs both
systemd units by default (previously silently skipped); print
resolved admin email in Configuration Complete block.
- scripts/validate-env.sh: new section 5b "Host Port Availability"
using ss-based detection, with process-name surfacing when run as
root.
- scripts/pangolin-teardown.sh: new wrapper. Reads credentials from
.env or takes --api-url/--api-key/--org-id flags. Dry-run by
default; --yes to execute. Deletes resources before sites (avoids
orphans). --keep-site-ids for safety.
- scripts/build-release.sh: include validate-env.sh and
pangolin-teardown.sh in release tarball whitelist.
- CCP instances.service.ts: deleteInstance() now calls
teardownTunnel() before composeDown when pangolinSiteId is set.
Previously an admin clicking "Delete Instance" orphaned the
Pangolin site + all its resources. Best-effort with try/catch
matching the existing Docker-cleanup tolerance pattern.
- CLAUDE.md: sync drift — 44 → 50 migrations, 186 → 192 models,
40 → 44 modules.
Bunker Admin
All 13 nginx embed proxy ports (8881-8895) are now driven by environment
variables instead of being hardcoded. This prevents port conflicts when
running multiple Changemaker instances on the same host.
Chain: .env → docker-compose port mappings → nginx container env →
entrypoint.sh envsubst → services.conf.template listen directives →
API /services/config endpoint → frontend buildServiceUrl().
Existing deployments are unaffected (all vars default to current values).
Bunker Admin
Closes 12 template drift gaps between the Control Panel templates and
production configs. New instances now provision with full monitoring
(alerts fire properly), correct Gitea DB type (postgres not mysql),
social sharing previews (OG meta bot routes), Excalidraw subdomain
routing, docker-socket-proxy for Homepage, and complete Grafana/
Alertmanager/Prometheus config copying.
Key changes:
- Rewrite Prometheus template: add alerting, rule_files, 5 scrape jobs
- Add cAdvisor, node-exporter, redis-exporter, gotify, docker-socket-proxy
- Fix Gitea env from mysql to postgres to match docker-compose
- Add OG bot detection + rewrite routes for campaigns/pages/gallery
- Add Excalidraw nginx server block + Pangolin draw subdomain
- Add embed port to discovery portConfig + emailTestMode to registration
- Copy alerts.yml, alertmanager.yml, Grafana dashboards to templates
- Add Listmonk proxy port and upgrade volume to API service
Bunker Admin