This commit completes Phase 0 of Approach C: the CCP template/env/static
files now produce output structurally byte-identical to canonical
docker-compose.prod.yml + .env.example. Verified by rendering against
marcelle, linda, and pia and diffing against their actual files — all
three show only the 30-line CCP-tenant header comment differing,
zero service/env-var structural differences.
Changes:
- templates/docker-compose.yml.hbs: reverted {{imageTag}} substitutions
back to ${IMAGE_TAG:-latest} so the compose template is now byte-
equivalent to docker-compose.prod.yml (modulo header). CCP controls
per-instance image tag selection via the rendered .env's IMAGE_TAG,
which compose-up picks up at runtime. This single-source-of-truth
via env-substitution matches install.sh tenants exactly.
- templates/env.hbs: rewritten as a near-mirror of .env.example. Adds
27 missing keys (IMAGE_TAG, GITEA_REGISTRY, COMPOSE_PROFILES,
ENABLE_CCP_AGENT, GITEA_ADMIN_*, ENABLE_HLS_TRANSCODE, TZ, etc.)
plus 15 CCP-specific extras (embed ports, dev-mode helpers, etc.).
All 145 compose-template env-var references are now covered.
- templates/nginx/nginx.conf: synced from canonical. Includes recent
security additions: redacted access-log format for token/secret
query params, rate-limit zones (api_global, api_auth, upload),
conditional HSTS via X-Forwarded-Proto map.
- api/scripts/render-for-instance.ts (new): one-off CLI that loads
an Instance row, decrypts secrets if present (or uses empty object
for isRegistered=true tenants), and calls renderAllTemplates() to
a scratch dir. Used in Phase 0.4 to verify the template-vs-prod
contract per tenant.
Usage:
docker compose exec ccp-api npx tsx scripts/render-for-instance.ts \
--slug changemakerlite
Phase 0 acceptance gate met:
- marcelle (release v2.10.2 install): 30-line diff, header-only
- linda (release v2.9.14 install): 30-line diff, header-only
- pia (release v2.9.10 install): 30-line diff, header-only
- env.hbs key coverage: 0 missing vs marcelle's .env
Next phases unblocked:
- Phase 1: add Instance.imageTag column (Prisma migration)
- Phase 2: pre-flight diff endpoint
- Phase 3: startReleaseUpgrade runner
- Phase 4: routes + schemas
- Phase 5: CCP UI "Upgrade to Release" button
- Phase 6: E2E test on marcelle (v2.10.2 -> v2.10.3)
Bunker Admin
This session shipped:
- Approach B end-to-end (commit 4a3d9d7): full rollout to all 7 tenants;
marcelle E2E validated twice (121s + 100s).
- v2.10.2 surgical update applied to 6 remaining tenants.
This commit lands the kickoff for Approach C (template re-render path):
scripts/templates changes:
- docker-compose.yml.hbs.OLD-style-pre-approach-c: preserved old CCP
template (Handlebars-heavy, dynamic container names, secrets rendered
at template-time).
- docker-compose.yml.hbs: REWRITTEN as a near-mirror of canonical
docker-compose.prod.yml. Minimal Handlebars overlay:
- Header comment lists {{name}}, {{slug}}, {{composeProject}}.
- 5 image refs: ${IMAGE_TAG:-latest} -> {{imageTag}}, so CCP can
per-instance override once Phase 1 lands the Instance.imageTag column.
All other variation flows through env-var substitution from tenant's
.env. Container names are now hardcoded (matching prod), feature flags
are deferred to COMPOSE_PROFILES gating (matching prod).
Why a rewrite: the old CCP template and prod compose used fundamentally
different conventions (dynamic vs hardcoded names, render-time vs
substitute-time secrets, Handlebars vs profiles gating). Sync-by-addition
couldn't reconcile them. The rewrite makes Approach C re-render safe for
the install.sh-installed fleet (marcelle, linda, pia and future).
docs/SESSION_HANDOFF_2026-05-21.md: full session handoff covering fleet
state, Approach B rollout, Approach C plan, and where to start next
session. force-added because /docs is gitignored (same precedent as
docs/SESSION_HANDOFF_2026-05-20.md from prior session).
Phase 0 remaining work (next session):
- Audit env.hbs against new compose env-var expectations
- Sync static config files (nginx/, configs/prometheus/, etc.)
- Build api/scripts/render-for-instance.ts harness
- Iterate template until rendered output is per-instance-only diff
against marcelle/linda/pia actual compose.
Then Phases 1-6 per plan in subsequent sessions (~11-14 hours total).
Bunker Admin
Add a "Quick Upgrade" path that pulls latest container images and recreates
only the core app services (api, admin, media-api, nginx) without touching
any tracked files. Tenant content (mkdocs/, configs/, scripts/) is implicitly
preserved because the script never writes outside docker.
Faster (~2 min vs ~4-5 min for full upgrade) and structurally safer for
releases that don't change orchestration/templates.
Pieces:
- scripts/image-upgrade.sh: new ~350-line script. Phases: pre-flight +
mkdocs snapshot, image pull, targeted recreate (broad up -d would cascade
on misconfigured infra containers — proven on marcelle), light health
checks, deferred ccp-agent restart. Writes the same progress.json +
result.json schema as upgrade.sh so the CCP poll loop is unchanged.
- agent/src/routes/upgrade.routes.ts: POST /instance/:slug/upgrade/start-image-only.
Same lock + staleness guards as the existing /upgrade/start endpoint.
- api/src/services/remote-driver.ts: RemoteDriver.startImageUpgrade().
- api/src/services/upgrade.service.ts: startImageUpgrade() entry point;
reuses runRemoteUpgrade with mode='image-only' (only the initial agent
call differs — result schema and polling are identical).
- api/src/modules/instances/instances.routes.ts: POST /:id/upgrade-images
+ startImageUpgradeSchema.
- admin/src/pages/InstanceDetailPage.tsx: secondary "Quick Upgrade" button
next to "Upgrade Now" on the Updates tab. Tooltip explains when to use it.
Tested locally on marcelle (v2.10.2 idempotent run): 1m 49s, mkdocs.yml md5
unchanged, file count unchanged, only api/admin/media-api/nginx touched.
Subtle bug found and fixed: `set -o pipefail` + `grep -q` shorts pipe and
SIGPIPEs the writer — captured services list once instead.
Bunker Admin
The agent container runs as root but the bind-mounted instance directory
is owned by the host user (UID 1000 = `node` in the container). Modern
git refuses to operate on such repos without an explicit safe.directory
entry, breaking upgrade-check.sh's `git fetch/log` calls on source-installed
tenants. Verified empirically on soroush after the previous fix landed.
Bunker Admin
Three related fixes uncovered during a marcelle CCP registration test:
1. ccp-agent image was missing bash + curl + jq + python3, so every
spawn('bash', ...) in upgrade.routes.ts and backup.routes.ts failed
silently with ENOENT. CCP kept reading stale status.json files from
disk, masking that no agent had successfully checked for updates in
weeks. apk-add the missing tools.
2. ccp-agent's /app/instance mount was :ro, blocking the agent from
writing data/upgrade/status.json (and result/progress/backups).
Agent already has docker.sock — removing :ro is not a security
escalation. Patched both docker-compose.yml and docker-compose.prod.yml.
3. Gitea 1.23.x only initializes Release.CreatedUnix inside its
createTag() helper, which is skipped if the tag already exists on
origin. The old DEV_WORKFLOW pattern (push tag, then run
build-release.sh --upload) was triggering this — releases got
created_unix=0 and lost /releases/latest sort order to v2.9.14.
build-release.sh now removes the remote tag first and POSTs with
target_commitish so Gitea creates the tag and release atomically.
After these fixes, CCP's "Check for Updates" path returns truthful
data end-to-end (verified on marcelle: v2.9.15 -> v2.10.1, 1 behind).
Bunker Admin
Problem: the agent polled /poll every 30s while waiting for admin
approval. At 10 req/15min, the 11th poll hit 429 after ~5 min and
every subsequent one also failed — recovery required an agent
restart. A human-paced approval SLA is longer than 5 minutes.
CCP side (agents.routes.ts):
Split the one-size-fits-all agentRegistrationLimiter into two.
/register stays tight (10/15min — invite-code brute force is the
real attack surface). /poll gets a new agentPollLimiter at 180/15min
(one poll per ~5s upper bound), scoped to registrationId+slug so
blast radius is bounded.
Agent side (server.ts):
Replaced fixed 30s setInterval with a self-scheduling setTimeout
loop that backs off exponentially on HTTP 429 (30s → 60s → 120s →
300s cap) and resets to 30s on any 2xx. Stop-flag protects against
re-entry after approval. Fixes the "agent wedged at 429, restart to
recover" workaround.
Bunker Admin
agents.routes.ts approve handler: wrap prisma.instance.create in
try/catch. When a PrismaClientKnownRequestError P2002 with target
including 'slug' is thrown, convert to a 409 AppError with a clear
message directing the admin at DELETE /api/instances/:id or the new
scripts/ccp-deregister.sh on the target host.
Before this, re-registering a previously-registered host (after the
operator tore down the underlying stack without cleaning CCP's DB
row) returned 500 with a raw Prisma error string — the operator had
to read a stack trace to understand the cause. Now the error is
self-describing and points at the fix.
Matches the pattern other CCP error paths use.
Bunker Admin
Fixes surfaced by three rounds of fresh-install testing on marcelle:
- config.sh: add host-port preflight check (ss -tln) to catch
cockpit-on-9090 style collisions before compose up; add
--skip-port-check escape hatch; add --install-watcher /
--no-install-watcher / --install-backup-timer /
--no-install-backup-timer flags; -y --enable-all now installs both
systemd units by default (previously silently skipped); print
resolved admin email in Configuration Complete block.
- scripts/validate-env.sh: new section 5b "Host Port Availability"
using ss-based detection, with process-name surfacing when run as
root.
- scripts/pangolin-teardown.sh: new wrapper. Reads credentials from
.env or takes --api-url/--api-key/--org-id flags. Dry-run by
default; --yes to execute. Deletes resources before sites (avoids
orphans). --keep-site-ids for safety.
- scripts/build-release.sh: include validate-env.sh and
pangolin-teardown.sh in release tarball whitelist.
- CCP instances.service.ts: deleteInstance() now calls
teardownTunnel() before composeDown when pangolinSiteId is set.
Previously an admin clicking "Delete Instance" orphaned the
Pangolin site + all its resources. Best-effort with try/catch
matching the existing Docker-cleanup tolerance pattern.
- CLAUDE.md: sync drift — 44 → 50 migrations, 186 → 192 models,
40 → 44 modules.
Bunker Admin
Gitea SSO: cookie-based single sign-on via nginx auth_request — sets
cml_session cookie on login/refresh, validates via /api/auth/gitea-sso-validate,
injects X-WEBAUTH-USER header for reverse proxy auth. Dedicated GITEA_SSO_SECRET
and SERVICE_PASSWORD_SALT env vars isolate secret rotation.
Security fixes from March 30 audit: IDOR on ticketed events (requireEventOwnership
middleware), IDOR on action items (admin/assignee/creator check), path traversal
on photos (resolve-based validation), CSV upload size limit (5MB), shared calendar
email exposure removed.
Gitea provisioner: auto-sync docs repo collaborator access based on role
(CONTENT_ROLES get write, SUPER_ADMIN gets admin). Gitea client extended
with collaborator management API methods.
Production hardening: NODE_ENV defaults to production in docker-compose.prod.yml,
Grafana anonymous auth disabled, install.sh branch ref updated to main.
Admin UI: moved docs reset from toolbar to MkDocs Settings danger zone,
improved collab Ctrl+S to explicitly save + cache-bust preview.
MkDocs site rebuild with updated repo data, upgrade screenshots, and content.
Bunker Admin
Deferred findings from the March 27 security audit, plus a bug fix:
MongoDB keyfile (bug fix):
- Generate replica.key on first boot via entrypoint script
- Fixes crash from --auth + --keyFile without an existing keyfile
- Applied to docker-compose.yml, docker-compose.prod.yml, CCP template
I7 — Ticket overselling prevention (reservation pattern):
- Add reservedCount field to TicketTier schema
- Atomically increment reservedCount inside transaction on checkout
- Release reservation on checkout.session.completed (webhook)
- Release reservation on checkout.session.expired (webhook)
- Include reservedCount in availability calculations
I17 — Move refresh token to httpOnly cookie:
- Server sets httpOnly SameSite=Strict cookie on login/register/refresh
- Cookie scoped to /api/auth path, secure in production
- Refresh/logout endpoints read from cookie (with body fallback for compat)
- Frontend no longer stores refreshToken in localStorage
- Auth store simplified: removed refreshToken from state + persistence
- API interceptor uses withCredentials:true for automatic cookie sending
- Updated media-api, media-public-api, QuickJoinPage, volunteer-invite
- Renamed getTokens → getAccessToken across all media components
- Install cookie-parser middleware
L2 — FeatureGate loading state:
- Show Skeleton instead of children while settings are loading
- Prevents briefly exposing disabled feature pages
Bunker Admin
Drop the custom Dockerfile.code-server that bundled Claude Code CLI,
Python/MkDocs tooling, and build-essential on top of codercom base.
Switch to the already-mirrored linuxserver/code-server image instead.
- Both compose files: use code-server:latest, LinuxServer env vars
(PUID/PGID/DEFAULT_WORKSPACE), port 8443, /config mount layout
- Nginx configs + templates: proxy to :8443 instead of :8080
- API env default: CODE_SERVER_URL updated to :8443
- build-and-push.sh: remove --include-code-server flag
- upgrade.sh: remove code-server conditional rebuild + registry fallback
- install.sh: add --ignore-pull-failures for optional missing images
- .env.example, CCP templates, bunker-ops template: updated
Bunker Admin
All 13 nginx embed proxy ports (8881-8895) are now driven by environment
variables instead of being hardcoded. This prevents port conflicts when
running multiple Changemaker instances on the same host.
Chain: .env → docker-compose port mappings → nginx container env →
entrypoint.sh envsubst → services.conf.template listen directives →
API /services/config endpoint → frontend buildServiceUrl().
Existing deployments are unaffected (all vars default to current values).
Bunker Admin
Closes 12 template drift gaps between the Control Panel templates and
production configs. New instances now provision with full monitoring
(alerts fire properly), correct Gitea DB type (postgres not mysql),
social sharing previews (OG meta bot routes), Excalidraw subdomain
routing, docker-socket-proxy for Homepage, and complete Grafana/
Alertmanager/Prometheus config copying.
Key changes:
- Rewrite Prometheus template: add alerting, rule_files, 5 scrape jobs
- Add cAdvisor, node-exporter, redis-exporter, gotify, docker-socket-proxy
- Fix Gitea env from mysql to postgres to match docker-compose
- Add OG bot detection + rewrite routes for campaigns/pages/gallery
- Add Excalidraw nginx server block + Pangolin draw subdomain
- Add embed port to discovery portConfig + emailTestMode to registration
- Copy alerts.yml, alertmanager.yml, Grafana dashboards to templates
- Add Listmonk proxy port and upgrade volume to API service
Bunker Admin
New instances provisioned via CCP were missing env vars for video analytics,
geocoding config, Listmonk SMTP, Gitea comments, Overpass/area import,
monitoring ports, Bunker Ops, and other features added since the template
was last updated.
Bunker Admin