docs: session handoff 2026-05-22 — Approach C complete

Captures Phase 0 + Phases 1-5 outcomes, Phase 6 preview-path end-to-end validation against marcelle, known env-patch gap for install.sh tenants, fleet rollout status, and the operator path.
Bunker Admin
feat(upgrade): Approach C - CCP-driven release upgrade (template re-render)
2026-05-22 09:50:14 -06:00 · 2026-05-22 09:45:37 -06:00 · 2026-05-22 09:35:30 -06:00 · 2026-05-21 19:32:21 -06:00 · 2026-05-21 15:20:35 -06:00 · 2026-05-21 13:42:08 -06:00
31 changed files with 5015 additions and 1129 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -64,6 +64,11 @@ core.*
 /backups/
 .upgrade.lock
 # Pre-upgrade mkdocs snapshots (created by scripts/lib/mkdocs-snapshot.sh).
 # These are the tenant-content rescue archives written before every upgrade;
 # discoverable in the install root via `ls`. Retention: last 5 (see helper).
 /mkdocs-backup-*.tar.gz
 # Release tarballs (generated by build-release.sh)
 /releases/
--- a/changemaker-control-panel/admin/src/pages/InstanceDetailPage.tsx
+++ b/changemaker-control-panel/admin/src/pages/InstanceDetailPage.tsx
@@ -39,6 +39,8 @@ import {
  CloudOutlined,
  DisconnectOutlined,
  UploadOutlined,
  ThunderboltOutlined,
  CloudUploadOutlined,
  BellOutlined,
  CheckCircleOutlined,
  WarningOutlined,
@@ -563,6 +565,71 @@ export default function InstanceDetailPage() {
    }
  };
  // Image-only upgrade (Approach B): pulls images + recreates core app services
  // without touching tracked files. Faster + safer than full upgrade for releases
  // that don't change compose/templates.
  const handleStartImageUpgrade = async () => {
    setUpgradingInstance(true);
    try {
      const { data } = await api.post(`/instances/${id}/upgrade-images`, {});
      setCurrentUpgrade(data.data);
      message.success('Image-only upgrade started');
    } catch (err: unknown) {
      const resp = (err as { response?: { data?: { error?: { message?: string } } } })?.response
        ?.data?.error;
      message.error(resp?.message || 'Failed to start image-only upgrade');
    } finally {
      setUpgradingInstance(false);
    }
  };
  // Release upgrade (Approach C): CCP re-renders templates with new image tag,
  // writes them to the tenant, then composePull + composeUp. For releases
  // that change orchestration (new services, compose config) in addition
  // to image versions. Tenant content (mkdocs/, customized configs/) is
  // never touched.
  const [releaseUpgradeModalOpen, setReleaseUpgradeModalOpen] = useState(false);
  const [releaseImageTag, setReleaseImageTag] = useState<string>('');
  const [releasePreview, setReleasePreview] = useState<{
    files: Array<{ path: string; status: string; diff: string | null; sizeBefore: number; sizeAfter: number }>;
    envCoverage?: { requiredVars: string[]; presentInTenantEnv: string[]; missingInTenantEnv: string[] };
  } | null>(null);
  const [releasePreviewLoading, setReleasePreviewLoading] = useState(false);
  const handlePreviewReleaseUpgrade = async () => {
    setReleasePreviewLoading(true);
    setReleasePreview(null);
    try {
      const body = releaseImageTag.trim() ? { imageTag: releaseImageTag.trim() } : {};
      const { data } = await api.post(`/instances/${id}/upgrade-release/preview`, body);
      setReleasePreview(data.data);
    } catch (err: unknown) {
      const resp = (err as { response?: { data?: { error?: { message?: string } } } })?.response
        ?.data?.error;
      message.error(resp?.message || 'Preview failed');
    } finally {
      setReleasePreviewLoading(false);
    }
  };
  const handleStartReleaseUpgrade = async () => {
    setUpgradingInstance(true);
    try {
      const body = releaseImageTag.trim() ? { imageTag: releaseImageTag.trim() } : {};
      const { data } = await api.post(`/instances/${id}/upgrade-release`, body);
      setCurrentUpgrade(data.data);
      setReleaseUpgradeModalOpen(false);
      setReleasePreview(null);
      message.success('Release upgrade started');
    } catch (err: unknown) {
      const resp = (err as { response?: { data?: { error?: { message?: string } } } })?.response
        ?.data?.error;
      message.error(resp?.message || 'Failed to start release upgrade');
    } finally {
      setUpgradingInstance(false);
    }
  };
  // Event handlers
  const handleAcknowledgeEvent = async (eventId: string) => {
    try {
@@ -1632,12 +1699,38 @@ export default function InstanceDetailPage() {
                  closable
                />
              )}
-              <div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
+              <div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center', gap: 16 }}>
-                <Typography.Text type="secondary">
+                <Typography.Text type="secondary" style={{ flex: 1 }}>
-                  Pulls latest code, runs migrations, and restarts services. CCP backup is recommended before upgrading.
+                  Full upgrade pulls the latest code, runs migrations, and restarts services. Quick upgrade only pulls images and recreates the core app — tenant content stays untouched and it&apos;s ~2 min faster. Use Quick when the release notes say no orchestration changes.
                </Typography.Text>
                <Space>
                  <Popconfirm
-                  title="Start upgrade?"
+                    title="Start quick (image-only) upgrade?"
                    description="Pulls new container images and recreates the API/Admin/Media/Nginx services. No filesystem changes — mkdocs and configs are not touched. Brief downtime is expected."
                    onConfirm={handleStartImageUpgrade}
                    disabled={instance.status !== 'RUNNING' && instance.status !== 'STOPPED'}
                  >
                    <Button
                      icon={<ThunderboltOutlined />}
                      loading={upgradingInstance}
                      disabled={instance.status !== 'RUNNING' && instance.status !== 'STOPPED'}
                    >
                      Quick Upgrade
                    </Button>
                  </Popconfirm>
                  <Button
                    icon={<CloudUploadOutlined />}
                    onClick={() => {
                      setReleaseImageTag(instance.imageTag || '');
                      setReleasePreview(null);
                      setReleaseUpgradeModalOpen(true);
                    }}
                    disabled={instance.status !== 'RUNNING' && instance.status !== 'STOPPED'}
                  >
                    Upgrade to Release
                  </Button>
                  <Popconfirm
                    title="Start full upgrade?"
                    description="This will pull the latest code, run database migrations, and restart all services. Brief downtime is expected."
                    onConfirm={handleStartUpgrade}
                    disabled={instance.status !== 'RUNNING' && instance.status !== 'STOPPED'}
@@ -1651,6 +1744,7 @@ export default function InstanceDetailPage() {
                      Upgrade Now
                    </Button>
                  </Popconfirm>
                </Space>
              </div>
            </Space>
          )}
@@ -2221,6 +2315,117 @@ export default function InstanceDetailPage() {
          </Space>
        </Modal>
      )}
      {/* Approach C: Release upgrade modal (CCP template re-render) */}
      <Modal
        title="Upgrade to Release"
        open={releaseUpgradeModalOpen}
        onCancel={() => setReleaseUpgradeModalOpen(false)}
        footer={[
          <Button key="cancel" onClick={() => setReleaseUpgradeModalOpen(false)}>
            Cancel
          </Button>,
          <Button
            key="preview"
            onClick={handlePreviewReleaseUpgrade}
            loading={releasePreviewLoading}
          >
            Preview Changes
          </Button>,
          <Button
            key="apply"
            type="primary"
            danger={
              !!releasePreview?.envCoverage?.missingInTenantEnv?.length
            }
            loading={upgradingInstance}
            onClick={handleStartReleaseUpgrade}
          >
            Apply Upgrade
          </Button>,
        ]}
        width={900}
      >
        <Space direction="vertical" style={{ width: '100%' }} size="middle">
          <div>
            <Typography.Text strong>Image Tag:</Typography.Text>
            <Input
              value={releaseImageTag}
              onChange={(e) => setReleaseImageTag(e.target.value)}
              placeholder="e.g. v2.10.3 (blank = use current env.IMAGE_TAG default)"
              style={{ marginTop: 4 }}
            />
            <Typography.Text type="secondary" style={{ fontSize: 12 }}>
              Re-renders docker-compose.yml + env + nginx configs with this tag. Tenant content
              (mkdocs/, custom configs/) is never touched. Click <em>Preview Changes</em> to see the
              per-file diff before applying.
            </Typography.Text>
          </div>
          {releasePreview && (
            <>
              {releasePreview.envCoverage?.missingInTenantEnv && releasePreview.envCoverage.missingInTenantEnv.length > 0 && (
                <Alert
                  type="error"
                  showIcon
                  message="Missing env vars in tenant .env"
                  description={
                    <div>
                      <div>The new docker-compose.yml references vars the tenant&apos;s .env does NOT define:</div>
                      <code style={{ display: 'block', marginTop: 8, fontSize: 11 }}>
                        {releasePreview.envCoverage.missingInTenantEnv.join(', ')}
                      </code>
                      <div style={{ marginTop: 8 }}>
                        Applying without these vars may break services. Add them to the tenant&apos;s .env
                        first, or reconcile the template.
                      </div>
                    </div>
                  }
                />
              )}
              <Typography.Text strong>
                Files: {releasePreview.files.length} total, {releasePreview.files.filter(f => f.status === 'modified').length} modified, {releasePreview.files.filter(f => f.status === 'created').length} created, {releasePreview.files.filter(f => f.status === 'unchanged').length} unchanged
              </Typography.Text>
              <div style={{ maxHeight: 500, overflow: 'auto', border: '1px solid #303030', borderRadius: 4 }}>
                {releasePreview.files.map((f) => (
                  <div key={f.path} style={{ padding: 8, borderBottom: '1px solid #303030' }}>
                    <div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
                      <code style={{ fontSize: 12 }}>{f.path}</code>
                      <Tag
                        color={
                          f.status === 'unchanged' ? 'green' :
                          f.status === 'modified' ? 'gold' :
                          'blue'
                        }
                      >
                        {f.status} {f.sizeBefore !== undefined && `(${f.sizeBefore} → ${f.sizeAfter} B)`}
                      </Tag>
                    </div>
                    {f.diff && (
                      <pre
                        style={{
                          background: '#1e1e1e',
                          color: '#d4d4d4',
                          padding: 8,
                          marginTop: 8,
                          maxHeight: 200,
                          overflow: 'auto',
                          fontSize: 11,
                          borderRadius: 4,
                        }}
                      >
                        {f.diff}
                      </pre>
                    )}
                  </div>
                ))}
              </div>
            </>
          )}
        </Space>
      </Modal>
    </div>
  );
 }
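Reviewer note: the modal above reads `envCoverage.missingInTenantEnv` from the preview response, but the code that produces it is not part of this excerpt. The following is only a minimal sketch of how such a coverage report could be computed, assuming the rendered compose text and the tenant's parsed .env are available. `computeEnvCoverage` is a hypothetical name, and treating `${VAR:-default}` references as not required is an assumption, not confirmed behavior.

```typescript
// Hypothetical helper; the real CCP implementation is not shown in this diff.
interface EnvCoverage {
  requiredVars: string[];
  presentInTenantEnv: string[];
  missingInTenantEnv: string[];
}

function computeEnvCoverage(
  composeText: string,
  tenantEnv: Record<string, string>
): EnvCoverage {
  const requiredVars: string[] = [];
  // Matches ${NAME} plus any trailing modifier such as ":-default".
  // Escaped "$${NAME}" references are not special-cased in this sketch.
  const pattern = /\$\{([A-Za-z_][A-Za-z0-9_]*)([^}]*)\}/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(composeText)) !== null) {
    const name = match[1];
    const modifier = match[2];
    // Assumption: a reference that carries a default is not "required".
    const hasDefault = modifier.indexOf(':-') === 0 || modifier.indexOf('-') === 0;
    if (!hasDefault && requiredVars.indexOf(name) === -1) {
      requiredVars.push(name);
    }
  }
  requiredVars.sort();
  const isDefined = (name: string): boolean =>
    Object.prototype.hasOwnProperty.call(tenantEnv, name);
  return {
    requiredVars,
    presentInTenantEnv: requiredVars.filter(isDefined),
    missingInTenantEnv: requiredVars.filter((name) => !isDefined(name)),
  };
}
```

With a report shaped like this, the "Apply Upgrade" button can switch to its danger style whenever `missingInTenantEnv` is non-empty, which is what the modal does.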
--- a/changemaker-control-panel/admin/src/types/api.ts
+++ b/changemaker-control-panel/admin/src/types/api.ts
@@ -9,6 +9,7 @@ export interface Instance {
  composeProject: string;
  gitBranch: string;
  gitCommit?: string;
  imageTag?: string;
  portConfig: Record<string, number>;
  enableMedia: boolean;
  enableChat: boolean;
--- a/changemaker-control-panel/agent/Dockerfile
+++ b/changemaker-control-panel/agent/Dockerfile
@@ -8,7 +8,16 @@ COPY src/ ./src/
 RUN npx tsc
 FROM node:20-alpine
-RUN apk add --no-cache docker-cli docker-cli-compose git rsync
+# bash + curl + jq + python3 are required by the changemaker scripts the agent
 # shells out to (upgrade-check.sh, upgrade.sh, backup.sh). Without them, every
 # /upgrade/* and /backup/* call returns "command not found" failures.
 RUN apk add --no-cache docker-cli docker-cli-compose git rsync bash curl jq python3
 # Agent runs as root, but the bind-mounted /app/instance is owned by the host
 # user (UID 1000 = `node` inside the container). Modern git refuses to operate
 # on repos with mismatched ownership without an explicit safe.directory entry.
 # The '*' wildcard trusts every path, which is acceptable here because the agent
 # only mounts a single host directory (the instance's project root).
 RUN git config --system --add safe.directory '*'
 WORKDIR /app
 COPY package*.json ./
 RUN npm ci --production
--- a/changemaker-control-panel/agent/src/routes/files.routes.ts
+++ b/changemaker-control-panel/agent/src/routes/files.routes.ts
@@ -24,6 +24,19 @@ router.post('/instance/:slug/files', async (req: Request, res: Response) => {
  res.json({ written: files.length });
 });
 // POST /instance/:slug/files/diff — Approach C pre-flight: diff proposed
 // rendered files against on-disk current content. Read-only.
 router.post('/instance/:slug/files/diff', async (req: Request, res: Response) => {
  const entry = await getSlugEntry(param(req, 'slug'));
  const { files } = req.body;
  if (!Array.isArray(files)) {
    res.status(400).json({ error: 'VALIDATION', message: 'files array required' });
    return;
  }
  const results = await fileService.diffFiles(entry.basePath, files);
  res.json({ files: results });
 });
 // POST /instance/:slug/mkdir — Create directory
 router.post('/instance/:slug/mkdir', async (req: Request, res: Response) => {
  const entry = await getSlugEntry(param(req, 'slug'));
--- a/changemaker-control-panel/agent/src/routes/upgrade.routes.ts
+++ b/changemaker-control-panel/agent/src/routes/upgrade.routes.ts
@@ -188,6 +188,85 @@ router.post('/instance/:slug/upgrade/start', async (req: Request, res: Response)
  res.status(202).json({ started: true });
 });
 // POST /instance/:slug/upgrade/start-image-only — Run image-upgrade.sh in background
 //
 // Image-only upgrade: pulls latest images + recreates services without touching
 // tracked files (no git pull, no tarball extract, no VERSION mutation). Tenant
 // content is implicitly safe because the script never writes outside data/upgrade.
 // See scripts/image-upgrade.sh for full rationale.
 //
 // Schema-compatible with /upgrade/start: writes the same progress.json + result.json
 // so the CCP poll loop in runRemoteUpgrade() works unchanged.
 router.post('/instance/:slug/upgrade/start-image-only', async (req: Request, res: Response) => {
  const slug = param(req, 'slug');
  const entry = await getSlugEntry(slug);
  const { imageTag } = req.body || {};
  // SECURITY: imageTag flows into bash via --image-tag. Constrain to a safe
  // subset of docker tag chars (semver, SHA, named tags). Reject anything
  // that could shell-escape.
  if (imageTag && !/^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$/.test(String(imageTag))) {
    res.status(400).json({ error: 'VALIDATION', message: 'Invalid imageTag' });
    return;
  }
  const scriptPath = path.join(entry.basePath, 'scripts', 'image-upgrade.sh');
  try {
    await fs.access(scriptPath);
  } catch {
    res.status(404).json({ error: 'NOT_FOUND', message: 'image-upgrade.sh not found' });
    return;
  }
  // Same concurrency guards as the full /upgrade/start endpoint — uses the
  // same lock + on-disk staleness check + backup/restore mutex.
  if (isSlugLocked(slug, 'upgrade') || await isUpgradeRunningOnDisk(entry.basePath)) {
    res.status(409).json({ error: 'SLUG_BUSY', message: 'An upgrade is already in progress' });
    return;
  }
  if (isSlugLocked(slug, 'backup') || isSlugLocked(slug, 'restore')) {
    res.status(409).json({ error: 'SLUG_BUSY', message: 'A backup or restore is currently running' });
    return;
  }
  // Clear stale progress/result files (same convention as /upgrade/start)
  const progressPath = path.join(entry.basePath, 'data', 'upgrade', 'progress.json');
  const resultPath = path.join(entry.basePath, 'data', 'upgrade', 'result.json');
  await fs.mkdir(path.dirname(progressPath), { recursive: true });
  await fs.rm(progressPath, { force: true });
  await fs.rm(resultPath, { force: true });
  const args: string[] = [scriptPath, '--api-mode'];
  if (imageTag) args.push('--image-tag', String(imageTag));
  void withSlugLock(slug, 'upgrade', async () => {
    logger.info(`[image-upgrade] ${slug}: spawning ${args.join(' ')} (cwd=${entry.basePath})`);
    try {
      await new Promise<void>((resolve, reject) => {
        const proc = spawn('bash', args, {
          cwd: entry.basePath,
          env: { ...process.env, COMPOSE_ANSI: 'never' },
          stdio: ['ignore', 'ignore', 'ignore'],
        });
        proc.on('error', reject);
        proc.on('close', (code) => {
          if (code === 0) resolve();
          else reject(new Error(`image-upgrade.sh exited with code ${code}`));
        });
      });
      logger.info(`[image-upgrade] ${slug}: image-upgrade.sh completed`);
    } catch (err) {
      logger.error(`[image-upgrade] ${slug}: ${(err as Error).message}`);
    }
  }).catch((err) => {
    if (!(err instanceof SlugBusyError)) {
      logger.error(`[image-upgrade] ${slug}: lock or background error: ${(err as Error).message}`);
    }
  });
  res.status(202).json({ started: true, mode: 'image-only' });
 });
 // GET /instance/:slug/upgrade/progress — Read progress.json
 router.get('/instance/:slug/upgrade/progress', async (req: Request, res: Response) => {
  const entry = await getSlugEntry(param(req, 'slug'));
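Reviewer note: the `imageTag` guard in `/upgrade/start-image-only` uses the same pattern as the zod schemas on the CCP side. A standalone sketch of that check makes the accepted and rejected inputs easy to see. `isValidImageTag` and `IMAGE_TAG_PATTERN` are illustrative names, not identifiers from the codebase; only the regular expression is taken from the diff.

```typescript
// Pattern copied from the route: first char alphanumeric, then up to 127
// characters from [a-zA-Z0-9_.-], so at most 128 characters in total.
const IMAGE_TAG_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$/;

// Illustrative helper: returns true only for strings that match the pattern.
// Unlike the route, it does not coerce non-strings with String(...).
function isValidImageTag(tag: unknown): boolean {
  return typeof tag === 'string' && IMAGE_TAG_PATTERN.test(tag);
}

const samples = ['v2.10.3', 'latest', 'sha-abc123', '-rc1', 'v1; rm -rf /', ''];
for (const sample of samples) {
  console.log(JSON.stringify(sample), isValidImageTag(sample));
}
```

Because the route passes `args` to `spawn('bash', args, ...)` as an argument array, the tag is never parsed by a shell; the pattern is a second layer that keeps spaces, quotes, and `;` out of the value entirely.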
--- a/changemaker-control-panel/agent/src/services/file.service.ts
+++ b/changemaker-control-panel/agent/src/services/file.service.ts
@@ -35,6 +35,113 @@ export async function writeFiles(
  }
 }
 /**
 * Diff proposed files against current on-disk contents at basePath.
 * For Approach C pre-flight preview: operator sees per-file change summary
 * before applying re-rendered templates. Returns one DiffResult per proposed
 * file. Uses a small inline LCS-based unified diff to avoid new deps.
 */
 export interface DiffResult {
  path: string;
  status: 'unchanged' | 'modified' | 'created';
  diff: string | null;
  sizeBefore: number;
  sizeAfter: number;
 }
 const DIFF_MAX_LINES = 500;
 function unifiedDiff(oldText: string, newText: string, relativePath: string): string {
  // Compact unified-diff: line-level LCS, emit context + changed lines.
  // Not a full GNU diff — adequate for compose/env/conf inspection in the UI.
  const oldLines = oldText.split('\n');
  const newLines = newText.split('\n');
  // Build LCS table (line-level). For files up to ~1500 lines this is O(N*M)
  // which is fine; we truncate output length not algorithm runtime.
  const m = oldLines.length, n = newLines.length;
  const dp: number[][] = Array.from({ length: m + 1 }, () => new Array<number>(n + 1).fill(0));
  for (let i = m - 1; i >= 0; i--) {
    for (let j = n - 1; j >= 0; j--) {
      dp[i][j] = oldLines[i] === newLines[j]
        ? dp[i + 1][j + 1] + 1
        : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  // Walk the LCS table from the start, emitting every line with a ' ', '+' or '-'
  // prefix. No @@ hunk headers are produced; unchanged lines are kept as context.
  const out: string[] = [`--- a/${relativePath}`, `+++ b/${relativePath}`];
  let i = 0, j = 0;
  const hunk: string[] = [];
  let emittedLines = 0;
  while ((i < m || j < n) && emittedLines < DIFF_MAX_LINES) {
    if (i < m && j < n && oldLines[i] === newLines[j]) {
      hunk.push(` ${oldLines[i]}`);
      i++; j++;
    } else if (j < n && (i === m || dp[i][j + 1] >= dp[i + 1][j])) {
      hunk.push(`+${newLines[j]}`);
      j++;
    } else {
      hunk.push(`-${oldLines[i]}`);
      i++;
    }
    emittedLines++;
  }
  // Only report truncation when input lines actually remain unprocessed.
  if (i < m || j < n) hunk.push(`... (diff truncated at ${DIFF_MAX_LINES} lines)`);
  out.push(...hunk);
  return out.join('\n');
 }
 export async function diffFiles(
  basePath: string,
  files: Array<{ relativePath: string; content: string }>
 ): Promise<DiffResult[]> {
  const results: DiffResult[] = [];
  for (const file of files) {
    const filePath = path.join(basePath, file.relativePath);
    assertWithin(filePath, basePath);
    const sizeAfter = Buffer.byteLength(file.content, 'utf-8');
    let current: string | null = null;
    try {
      current = await fs.readFile(filePath, 'utf-8');
    } catch {
      current = null;
    }
    if (current === null) {
      results.push({
        path: file.relativePath,
        status: 'created',
        diff: null,
        sizeBefore: 0,
        sizeAfter,
      });
      continue;
    }
    const sizeBefore = Buffer.byteLength(current, 'utf-8');
    if (current === file.content) {
      results.push({
        path: file.relativePath,
        status: 'unchanged',
        diff: null,
        sizeBefore,
        sizeAfter,
      });
      continue;
    }
    results.push({
      path: file.relativePath,
      status: 'modified',
      diff: unifiedDiff(current, file.content, file.relativePath),
      sizeBefore,
      sizeAfter,
    });
  }
  return results;
 }
 export async function mkdirp(basePath: string, relativePath: string): Promise<void> {
  const dirPath = path.join(basePath, relativePath);
  assertWithin(dirPath, basePath);
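Reviewer note: `unifiedDiff` above relies on a line-level LCS table. As a small, self-contained illustration of the same recurrence, the sketch below only counts added and removed lines instead of emitting them. `lcsLength` and `countLineChanges` are hypothetical helpers written for this note, not functions from file.service.ts.

```typescript
// Length of the longest common subsequence of two line arrays, using the
// same bottom-up recurrence as the dp table in unifiedDiff.
function lcsLength(a: string[], b: string[]): number {
  const dp: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    const row: number[] = [];
    for (let j = 0; j <= b.length; j++) row.push(0);
    dp.push(row);
  }
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      dp[i][j] = a[i] === b[j]
        ? dp[i + 1][j + 1] + 1
        : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  return dp[0][0];
}

// Lines outside the LCS are removals (old side) or additions (new side).
function countLineChanges(
  oldText: string,
  newText: string
): { added: number; removed: number } {
  const oldLines = oldText.split('\n');
  const newLines = newText.split('\n');
  const common = lcsLength(oldLines, newLines);
  return { added: newLines.length - common, removed: oldLines.length - common };
}

console.log(countLineChanges('IMAGE_TAG=v1\nPORT=80', 'IMAGE_TAG=v2\nPORT=80\nNEW=1'));
```

The table costs O(N*M) time and memory, which matches the comment in `unifiedDiff` that this is acceptable for compose, env, and nginx files of modest size.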
--- a/changemaker-control-panel/api/prisma/migrations/20260522093400_add_instance_image_tag/migration.sql
+++ b/changemaker-control-panel/api/prisma/migrations/20260522093400_add_instance_image_tag/migration.sql
@@ -0,0 +1,4 @@
 -- Approach C: per-instance image tag override.
 -- NULL means "use env.IMAGE_TAG default". Set via CCP "Upgrade to Release"
 -- flow when operator chooses a tag for a specific tenant.
 ALTER TABLE "instances" ADD COLUMN "image_tag" TEXT;
--- a/changemaker-control-panel/api/prisma/schema.prisma
+++ b/changemaker-control-panel/api/prisma/schema.prisma
@@ -70,6 +70,13 @@ model Instance {
  gitBranch       String         @default("v2") @map("git_branch")
  gitCommit       String?        @map("git_commit")
  // Per-instance image tag override (Approach C release upgrades).
  // NULL = fall back to env.IMAGE_TAG (the CCP-wide default). When set,
  // CCP renders this value into the tenant's .env IMAGE_TAG, and the
  // compose template's ${IMAGE_TAG:-latest} substitution picks it up at
  // compose-up time. Each tenant rolls forward on its own cadence.
  imageTag        String?        @map("image_tag")
  // Allocated host ports (JSON: { api: 14001, admin: 13001, postgres: 15401, nginx: 10001 })
  portConfig      Json           @map("port_config")
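Reviewer note: the schema comment describes a three-step fallback for the image tag: the per-instance override, then the CCP-wide `env.IMAGE_TAG`, then compose's own `${IMAGE_TAG:-latest}` default. The sketch below spells that order out. `resolveImageTag` is a hypothetical function, and treating blank or whitespace-only values as unset is an assumption based on how the modal trims its input.

```typescript
// Hypothetical resolver for the tag that ends up in the tenant's containers.
function resolveImageTag(
  instanceTag: string | null | undefined,
  ccpDefaultTag: string | undefined
): string {
  // 1. Per-instance override (Instance.imageTag); NULL or blank means "not set".
  const override = (instanceTag ?? '').trim();
  if (override !== '') return override;
  // 2. CCP-wide default (env.IMAGE_TAG).
  const fallback = (ccpDefaultTag ?? '').trim();
  // 3. Mirrors the compose template's ${IMAGE_TAG:-latest} substitution.
  return fallback !== '' ? fallback : 'latest';
}

console.log(resolveImageTag('v2.10.3', 'v2.9.0'));
console.log(resolveImageTag(null, 'v2.9.0'));
console.log(resolveImageTag(null, undefined));
```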
--- a/changemaker-control-panel/api/scripts/render-for-instance.ts
+++ b/changemaker-control-panel/api/scripts/render-for-instance.ts
@@ -0,0 +1,115 @@
 #!/usr/bin/env tsx
 /**
 * render-for-instance.ts — Approach C Phase 0 verification harness.
 *
 * Loads a CCP-tracked Instance row, builds its template context, and renders
 * all templates to a scratch directory under /tmp/render-<slug>/. Operator
 * then diffs the rendered output against the tenant's actual on-disk files
 * to verify the template-vs-prod-compose equivalence contract.
 *
 * Usage (run inside ccp-api container):
 *   docker compose exec ccp-api npx tsx scripts/render-for-instance.ts --slug changemakerlite
 *   docker compose exec ccp-api npx tsx scripts/render-for-instance.ts --id <uuid>
 *
 * Output: prints scratch dir path; exits 0 on success, 1 on failure.
 *
 * This script does NOT touch any tenant. It only reads from the CCP database
 * and writes to /tmp on the CCP api container.
 */
 import { prisma } from '../src/lib/prisma';
 import { decryptJson } from '../src/utils/encryption';
 import {
  buildTemplateContext,
  renderAllTemplates,
 } from '../src/services/template-engine';
 import path from 'node:path';
 import fs from 'node:fs/promises';
 interface Args {
  slug?: string;
  id?: string;
  outDir?: string;
 }
 function parseArgs(argv: string[]): Args {
  const args: Args = {};
  for (let i = 0; i < argv.length; i++) {
    const a = argv[i];
    if (a === '--slug' && argv[i + 1]) { args.slug = argv[++i]; continue; }
    if (a === '--id' && argv[i + 1])   { args.id   = argv[++i]; continue; }
    if (a === '--out' && argv[i + 1])  { args.outDir = argv[++i]; continue; }
    if (a === '-h' || a === '--help') {
      console.log('usage: render-for-instance.ts (--slug X | --id Y) [--out /tmp/render-X]');
      process.exit(0);
    }
  }
  return args;
 }
 async function main() {
  const args = parseArgs(process.argv.slice(2));
  if (!args.slug && !args.id) {
    console.error('error: --slug or --id is required');
    process.exit(1);
  }
  const instance = await prisma.instance.findUnique({
    where: args.id ? { id: args.id } : { slug: args.slug! },
  });
  if (!instance) {
    console.error(`error: instance not found (slug=${args.slug ?? '?'}, id=${args.id ?? '?'})`);
    process.exit(1);
  }
  // For isRegistered tenants there are no encrypted secrets. Use empty stubs
  // so buildTemplateContext doesn't crash; env.hbs values that read from
  // {{secrets.*}} will render as blank, which is fine for diff purposes
  // because the tenant's own .env still has the real values via install.sh.
  let secrets: Record<string, string> = {};
  if (instance.encryptedSecrets) {
    try {
      secrets = decryptJson<Record<string, string>>(instance.encryptedSecrets);
    } catch (err) {
      console.warn(`warn: decryptJson failed (${(err as Error).message}); using empty secrets`);
    }
  } else {
    console.log(`(isRegistered=true tenant; using empty secrets for compose/nginx render — env.hbs values will be blank)`);
  }
  const outDir = args.outDir ?? path.join('/tmp', `render-${instance.slug}`);
  await fs.rm(outDir, { recursive: true, force: true });
  await fs.mkdir(outDir, { recursive: true });
  const context = buildTemplateContext(instance, secrets);
  await renderAllTemplates(context, outDir);
  // Summarize what we rendered
  const entries: string[] = [];
  async function walk(dir: string, rel = '') {
    const items = await fs.readdir(dir, { withFileTypes: true });
    for (const item of items) {
      const full = path.join(dir, item.name);
      const r = path.join(rel, item.name);
      if (item.isDirectory()) await walk(full, r);
      else entries.push(r);
    }
  }
  await walk(outDir);
  console.log(`\n=== rendered ${entries.length} files to: ${outDir} ===`);
  for (const e of entries.sort()) {
    const stat = await fs.stat(path.join(outDir, e));
    console.log(`  ${e}  (${stat.size} bytes)`);
  }
  console.log(`\nTo diff against the live tenant:`);
  console.log(`  ssh <tenant> 'cat <basePath>/docker-compose.yml' | diff -u - ${outDir}/docker-compose.yml`);
  console.log(``);
  await prisma.$disconnect();
 }
 main().catch((err) => {
  console.error('render-for-instance.ts failed:', err);
  process.exit(1);
 });
--- a/changemaker-control-panel/api/src/modules/instances/instances.routes.ts
+++ b/changemaker-control-panel/api/src/modules/instances/instances.routes.ts
@@ -4,7 +4,7 @@ import rateLimit from 'express-rate-limit';
 import { prisma } from '../../lib/prisma';
 import { authenticate, requireRole } from '../../middleware/auth';
 import { validate } from '../../middleware/validate';
-import { createInstanceSchema, updateInstanceSchema, registerInstanceSchema, reconfigureInstanceSchema, configureTunnelSchema, importInstancesSchema, startUpgradeSchema, setupRemoteTunnelSchema } from './instances.schemas';
+import { createInstanceSchema, updateInstanceSchema, registerInstanceSchema, reconfigureInstanceSchema, configureTunnelSchema, importInstancesSchema, startUpgradeSchema, startImageUpgradeSchema, startReleaseUpgradeSchema, setupRemoteTunnelSchema } from './instances.schemas';
 import * as instancesService from './instances.service';
 import * as healthService from '../../services/health.service';
 import * as backupService from '../../services/backup.service';
@@ -362,6 +362,60 @@ router.post(
  }
 );
 // Image-only upgrade (Approach B). Faster + safer than full upgrade for
 // releases that don't change orchestration/templates. See upgrade.service.ts
 // startImageUpgrade for full rationale.
 router.post(
  '/:id/upgrade-images',
  requireRole('SUPER_ADMIN', 'OPERATOR'),
  validate(startImageUpgradeSchema),
  async (req: Request, res: Response) => {
    const { imageTag } = req.body || {};
    const upgrade = await upgradeService.startImageUpgrade(
      req.params.id as string,
      req.user!.id,
      req.ip,
      { imageTag }
    );
    res.status(201).json({ data: upgrade });
  }
 );
 // Release upgrade (Approach C). Re-renders templates via CCP and applies
 // them to the tenant, then composePull + composeUp. Used when a release
 // changes orchestration in addition to image versions.
 router.post(
  '/:id/upgrade-release',
  requireRole('SUPER_ADMIN', 'OPERATOR'),
  validate(startReleaseUpgradeSchema),
  async (req: Request, res: Response) => {
    const { imageTag } = req.body || {};
    const upgrade = await upgradeService.startReleaseUpgrade(
      req.params.id as string,
      req.user!.id,
      req.ip,
      { imageTag }
    );
    res.status(201).json({ data: upgrade });
  }
 );
 // Approach C pre-flight: preview what re-render would change before applying.
 // READ-ONLY — tenant disk is not touched.
 router.post(
  '/:id/upgrade-release/preview',
  requireRole('SUPER_ADMIN', 'OPERATOR'),
  validate(startReleaseUpgradeSchema),
  async (req: Request, res: Response) => {
    const { imageTag } = req.body || {};
    const preview = await upgradeService.previewReleaseUpgrade(
      req.params.id as string,
      { imageTag }
    );
    res.json({ data: preview });
  }
 );
 router.get(
  '/:id/upgrade-status',
  requireRole('SUPER_ADMIN', 'OPERATOR'),
--- a/changemaker-control-panel/api/src/modules/instances/instances.schemas.ts
+++ b/changemaker-control-panel/api/src/modules/instances/instances.schemas.ts
@@ -121,6 +121,30 @@ export const startUpgradeSchema = z.object({
    .optional(),
 });
 // Approach B: image-only upgrade. Pulls images + recreates core app services
 // without touching tracked files. imageTag is optional — if omitted, the
 // agent uses whatever IMAGE_TAG the install's .env / compose env defines
 // (typically `latest`). Tag must be a valid Docker tag.
 export const startImageUpgradeSchema = z.object({
  imageTag: z
    .string()
    .regex(/^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$/, 'Invalid imageTag')
    .optional(),
 });
 // Approach C: release upgrade via CCP template re-render. CCP renders the
 // docker-compose.yml + nginx confs + pangolin resources etc. against the
 // tenant's context (with the proposed imageTag), writes them to the tenant,
 // then composePull + composeUp. Used when a release changes orchestration
 // in addition to image versions. imageTag is the new value for the
 // per-instance Instance.imageTag column (NULL falls back to env default).
 export const startReleaseUpgradeSchema = z.object({
  imageTag: z
    .string()
    .regex(/^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$/, 'Invalid imageTag')
    .optional(),
 });
 export const setupRemoteTunnelSchema = z.object({
  // Empty string or omitted → resources use standard subdomains (app., api., etc.)
  // A value like "ck" → creates ck-app., ck-api., etc. for multi-tenant domains
--- a/changemaker-control-panel/api/src/services/execution-driver.ts
+++ b/changemaker-control-panel/api/src/services/execution-driver.ts
@ -24,6 +24,13 @@ export interface ExecutionDriver {
  // ─── Filesystem Operations ──────────────────────────────────
  readEnvFile(basePath: string): Promise<Record<string, string> | null>;
  writeFiles(basePath: string, files: Array<{ relativePath: string; content: string }>): Promise<void>;
  // Approach C pre-flight: diff proposed file contents against on-disk current.
  // Returns per-file status (unchanged | modified | created) + unified diff for modified.
  // Read-only.
  diffFiles(
    basePath: string,
    files: Array<{ relativePath: string; content: string }>
  ): Promise<Array<{ path: string; status: 'unchanged' | 'modified' | 'created'; diff: string | null; sizeBefore: number; sizeAfter: number }>>;
  mkdir(basePath: string, relativePath: string): Promise<void>;
  fileExists(basePath: string, relativePath: string): Promise<boolean>;
  deleteDirectory(dirPath: string): Promise<void>;
--- a/changemaker-control-panel/api/src/services/local-driver.ts
+++ b/changemaker-control-panel/api/src/services/local-driver.ts
@ -80,6 +80,35 @@ export class LocalDriver implements ExecutionDriver {
    }
  }
  // Approach C pre-flight diff. Reads current file contents at basePath +
  // relativePath, returns per-file status + diff. Local implementation
  // mirrors the agent-side diffFiles helper.
  async diffFiles(basePath: string, files: Array<{ relativePath: string; content: string }>) {
    const results: Array<{ path: string; status: 'unchanged' | 'modified' | 'created'; diff: string | null; sizeBefore: number; sizeAfter: number }> = [];
    for (const file of files) {
      const filePath = path.join(basePath, file.relativePath);
      const sizeAfter = Buffer.byteLength(file.content, 'utf-8');
      let current: string | null = null;
      try { current = await fs.readFile(filePath, 'utf-8'); } catch { current = null; }
      if (current === null) {
        results.push({ path: file.relativePath, status: 'created', diff: null, sizeBefore: 0, sizeAfter });
      } else if (current === file.content) {
        results.push({ path: file.relativePath, status: 'unchanged', diff: null, sizeBefore: Buffer.byteLength(current), sizeAfter });
      } else {
        // Minimal diff for local: full new content. Local mode is dev-only;
        // detailed diffs come from the agent-side implementation.
        results.push({
          path: file.relativePath,
          status: 'modified',
          diff: `--- a/${file.relativePath}\n+++ b/${file.relativePath}\n(local-driver: showing new content only)\n${file.content}`,
          sizeBefore: Buffer.byteLength(current),
          sizeAfter,
        });
      }
    }
    return results;
  }
  async mkdir(basePath: string, relativePath: string) {
    await fs.mkdir(path.join(basePath, relativePath), { recursive: true });
  }
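The per-file classification that `diffFiles` performs (created when the file is absent, unchanged when contents match exactly, modified otherwise) can be sketched as a pure function, which makes the rule easy to test without touching the filesystem. This is a minimal illustration, not the agent-side implementation: `classifyFile`, `FileDiffSummary`, and `byteLength` are hypothetical names, and the unified diff text is deliberately left out.

```typescript
// Illustrative sketch only; names below are hypothetical, not from the diff.
type FileStatus = 'unchanged' | 'modified' | 'created';

interface FileDiffSummary {
  path: string;
  status: FileStatus;
  sizeBefore: number;
  sizeAfter: number;
}

// UTF-8 byte length, so sizes match what would be written to disk.
const byteLength = (s: string): number => new TextEncoder().encode(s).length;

// `current` is null when the file does not exist on the tenant yet.
function classifyFile(
  path: string,
  current: string | null,
  proposed: string
): FileDiffSummary {
  const sizeAfter = byteLength(proposed);
  if (current === null) {
    return { path, status: 'created', sizeBefore: 0, sizeAfter };
  }
  const status: FileStatus = current === proposed ? 'unchanged' : 'modified';
  return { path, status, sizeBefore: byteLength(current), sizeAfter };
}
```

Comparison is an exact string match, so a change in trailing whitespace or line endings alone is reported as `modified`.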
--- a/changemaker-control-panel/api/src/services/remote-driver.ts
+++ b/changemaker-control-panel/api/src/services/remote-driver.ts
@ -82,6 +82,10 @@ export interface StartAgentUpgradeOptions {
  branch?: string;
 }
 export interface StartAgentImageUpgradeOptions {
  imageTag?: string;
 }
 interface AgentRequestOptions {
  method: 'GET' | 'POST' | 'DELETE';
  path: string;
@ -305,6 +309,17 @@ export class RemoteDriver implements ExecutionDriver {
    });
  }
  // Approach C pre-flight diff via agent.
  async diffFiles(_basePath: string, files: Array<{ relativePath: string; content: string }>) {
    const resp = await this.request<{ files: Array<{ path: string; status: 'unchanged' | 'modified' | 'created'; diff: string | null; sizeBefore: number; sizeAfter: number }> }>({
      method: 'POST',
      path: `/instance/${this.slug}/files/diff`,
      body: { files },
      timeoutMs: env.AGENT_LONG_OP_TIMEOUT_MS,
    });
    return resp.files;
  }
  async mkdir(_basePath: string, relativePath: string): Promise<void> {
    await this.request({
      method: 'POST',
@ -574,6 +589,21 @@ export class RemoteDriver implements ExecutionDriver {
    });
  }
  /**
   * Trigger image-upgrade.sh --api-mode on the remote (Approach B: image-only
   * upgrade — pulls images + recreates core app services without touching
   * the install tree). Fire-and-forget; returns 202 immediately. Uses the
   * same progress/result polling endpoints as startUpgrade.
   */
  async startImageUpgrade(options: StartAgentImageUpgradeOptions = {}): Promise<void> {
    await this.request({
      method: 'POST',
      path: `/instance/${this.slug}/upgrade/start-image-only`,
      body: options,
      timeoutMs: 30_000,
    });
  }
  /**
   * Read the agent's data/upgrade/progress.json. Returns the default zero-state
   * if no progress has been written yet.
--- a/changemaker-control-panel/api/src/services/template-engine.ts
+++ b/changemaker-control-panel/api/src/services/template-engine.ts
@ -135,6 +135,8 @@ export interface InstanceForTemplate {
  smtpFrom: string | null;
  emailTestMode: boolean;
  gitBranch: string;
  // Per-instance image tag override (Approach C). NULL falls back to env.IMAGE_TAG.
  imageTag: string | null;
 }
 /**
@ -208,7 +210,9 @@ export function buildTemplateContext(
    gitBranch: instance.gitBranch,
    registryUrl: env.GITEA_REGISTRY,
    useRegistry: env.USE_REGISTRY_IMAGES,
-    imageTag: env.IMAGE_TAG,
+    // Approach C: per-instance imageTag overrides the CCP-wide env default.
    // NULL on the Instance row falls back to env.IMAGE_TAG (typically 'latest').
    imageTag: instance.imageTag || env.IMAGE_TAG,
  };
 }
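The imageTag handling in this change combines two rules: a proposed tag is accepted only if it matches the tag pattern, and a NULL per-instance tag falls back to the CCP-wide default. A minimal standalone sketch of both rules follows. `isSafeImageTag` and `resolveImageTag` are hypothetical names used for illustration, and the `envDefault` parameter stands in for `env.IMAGE_TAG`.

```typescript
// Illustrative sketch only; helper names are hypothetical, not from the diff.
const SAFE_IMAGE_TAG = /^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$/;

// True when the tag is 1-128 chars, starts with an alphanumeric character,
// and otherwise contains only letters, digits, '_', '.', or '-'.
function isSafeImageTag(tag: string): boolean {
  return SAFE_IMAGE_TAG.test(tag);
}

// Per-instance tag wins; null (or an empty string) falls back to the default.
function resolveImageTag(instanceTag: string | null, envDefault: string): string {
  return instanceTag || envDefault;
}
```

Because the fallback uses `||`, an empty-string tag behaves the same as NULL. The pattern itself rejects empty strings, so a validated tag never hits that case.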
--- a/changemaker-control-panel/api/src/services/upgrade.service.ts
+++ b/changemaker-control-panel/api/src/services/upgrade.service.ts
@ -8,6 +8,8 @@ import { logger } from '../utils/logger';
 import { createEvent } from './event.service';
 import { getRemoteDriverForInstance } from './execution-driver';
 import type { AgentUpdateStatus } from './remote-driver';
 import { buildTemplateContext, clearTemplateCache, renderAllTemplatesInMemory } from './template-engine';
 import { decryptJson } from '../utils/encryption';
 /**
 * Shell-injection guards. Any user- or DB-controlled value that flows into
@ -205,6 +207,10 @@ export interface StartUpgradeOptions {
  branch?: string;
 }
 export interface StartImageUpgradeOptions {
  imageTag?: string;
 }
 /**
 * Start an upgrade for an instance. Returns the created InstanceUpgrade record.
 * The actual upgrade runs asynchronously (fire-and-forget).
@ -298,6 +304,393 @@ export async function startUpgrade(
  return upgrade;
 }
 /**
 * Start an IMAGE-ONLY upgrade (Approach B). Pulls latest images + recreates
 * core app services without touching tracked files. Faster (~2 min vs ~4-5
 * min for full upgrade) and safer because no filesystem mutation outside
 * docker — tenant content (mkdocs/, configs/) is implicitly preserved.
 *
 * Use this for releases that only bump container code or schema. For
 * releases that change compose orchestration, nginx config, or other
 * tracked files, use startUpgrade() instead.
 *
 * Remote-only for now: local mode would need a `runImageUpgrade` runner
 * which we haven't built (all our instances are remote via mTLS agent).
 */
 export async function startImageUpgrade(
  instanceId: string,
  userId: string,
  ipAddress?: string,
  options?: StartImageUpgradeOptions
 ) {
  const instance = await prisma.instance.findUnique({ where: { id: instanceId } });
  if (!instance) throw new Error('Instance not found');
  if (!instance.isRemote) {
    throw new Error('Image-only upgrade is currently supported only for remote instances');
  }
  if (instance.status !== InstanceStatus.RUNNING && instance.status !== InstanceStatus.STOPPED) {
    throw new Error(`Cannot upgrade instance in ${instance.status} state`);
  }
  // Reuse the same in-progress guard as startUpgrade: only one upgrade
  // (of either type) at a time per instance.
  const active = await prisma.instanceUpgrade.findFirst({
    where: {
      instanceId,
      status: { in: [UpgradeStatus.PENDING, UpgradeStatus.IN_PROGRESS] },
    },
  });
  if (active) {
    throw new Error('An upgrade is already in progress for this instance');
  }
  // Create upgrade record. branch is unused for image-only but keep it
  // populated with current branch for audit trail consistency.
  const upgrade = await prisma.instanceUpgrade.create({
    data: {
      instanceId,
      status: UpgradeStatus.PENDING,
      previousCommit: instance.gitCommit,
      branch: instance.gitBranch,
      triggeredById: userId,
    },
  });
  // Audit log
  await prisma.auditLog.create({
    data: {
      userId,
      instanceId,
      action: AuditAction.INSTANCE_UPGRADE,
      details: {
        upgradeId: upgrade.id,
        previousCommit: instance.gitCommit,
        source: 'remote',
        mode: 'image-only',
        options: options || {},
      } as unknown as Prisma.InputJsonValue,
      ipAddress,
    },
  });
  // Fire-and-forget: reuse runRemoteUpgrade with mode='image-only'. Same
  // poll loop and result handling — only the initial agent call differs.
  runRemoteUpgrade(upgrade.id, instance, undefined, 'image-only', options).catch((err) => {
    logger.error(`[image-upgrade] Remote image upgrade orchestration failed for ${instance.slug}: ${err}`);
  });
  return upgrade;
 }
 // ─── Approach C: Release upgrade (template re-render) ────────────────────────
 export interface StartReleaseUpgradeOptions {
  imageTag?: string;
 }
 const SAFE_IMAGE_TAG = /^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$/;
 /**
 * Files that should NOT be re-rendered for tenants without encryptedSecrets
 * (install.sh-registered tenants). Their .env was provisioned at install
 * time and contains real secrets we can't reproduce.
 */
 const REGISTERED_TENANT_SKIP_FILES = new Set(['.env']);
 /**
 * Filter rendered file list for tenants without secrets. For install.sh
 * tenants we keep the existing .env on disk (CCP can't render env without
 * secrets in DB). Compose, nginx, pangolin etc. still render correctly
 * because they only reference instance fields, not secrets directly.
 */
 function filterRenderedFilesForRegisteredTenant(
  files: Array<{ relativePath: string; content: string }>
 ): Array<{ relativePath: string; content: string }> {
  return files.filter(f => !REGISTERED_TENANT_SKIP_FILES.has(f.relativePath));
 }
 /**
 * Extract env var names referenced by the rendered docker-compose.yml.
 * Used to compute envCoverage for install.sh tenants — operator needs to
 * know if any ${VAR} references won't have a value in the tenant's .env.
 */
 function extractComposeEnvVars(composeYaml: string): string[] {
  const vars = new Set<string>();
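  // Match ${VAR}, ${VAR:-default}, ${VAR-default}, ${VAR:?required}, ${VAR?required}.
  // The hyphen is placed last in the class so it is a literal '-', not a range.
  const re = /\$\{([A-Z_][A-Z0-9_]*)(?:[:?+-][^}]*)?\}/g;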
  let m: RegExpExecArray | null;
  while ((m = re.exec(composeYaml)) !== null) {
    vars.add(m[1]);
  }
  return Array.from(vars).sort();
 }
 /**
 * Approach C pre-flight preview. Renders templates with the proposed
 * imageTag override and diffs against the tenant's current files. Also
 * computes envCoverage for install.sh tenants so the operator can see
 * if the new compose needs any env vars their .env doesn't have.
 * READ-ONLY — touches nothing on the tenant.
 */
 export async function previewReleaseUpgrade(
  instanceId: string,
  options?: StartReleaseUpgradeOptions
 ): Promise<{
  files: Array<{ path: string; status: 'unchanged' | 'modified' | 'created'; diff: string | null; sizeBefore: number; sizeAfter: number }>;
  envCoverage?: {
    requiredVars: string[];
    presentInTenantEnv: string[];
    missingInTenantEnv: string[];
  };
 }> {
  const instance = await prisma.instance.findUnique({ where: { id: instanceId } });
  if (!instance) throw new Error('Instance not found');
  if (!instance.isRemote) {
    throw new Error('Release upgrade preview is currently supported only for remote instances');
  }
  if (options?.imageTag && !SAFE_IMAGE_TAG.test(options.imageTag)) {
    throw new Error('Invalid imageTag');
  }
  // Build context with proposed imageTag override (not persisted)
  const previewInstance = { ...instance, imageTag: options?.imageTag ?? instance.imageTag };
  const secrets = instance.encryptedSecrets
    ? decryptJson<Record<string, string>>(instance.encryptedSecrets)
    : {};
  clearTemplateCache();
  const context = buildTemplateContext(previewInstance, secrets);
  let files = await renderAllTemplatesInMemory(context);
  // Skip .env for registered tenants (no secrets to render against)
  if (!instance.encryptedSecrets) {
    files = filterRenderedFilesForRegisteredTenant(files);
  }
  const driver = await getRemoteDriverForInstance({
    id: instance.id,
    slug: instance.slug,
    isRemote: instance.isRemote,
    agentUrl: instance.agentUrl,
  });
  const diffResults = await driver.diffFiles(instance.basePath, files);
  // For registered tenants: report envCoverage so operator knows if any
  // ${VAR} from the new compose isn't in their tenant .env. Required check
  // because CCP isn't rendering their env file.
  let envCoverage: { requiredVars: string[]; presentInTenantEnv: string[]; missingInTenantEnv: string[] } | undefined;
  if (!instance.encryptedSecrets) {
    const composeFile = files.find(f => f.relativePath === 'docker-compose.yml');
    if (composeFile) {
      const requiredVars = extractComposeEnvVars(composeFile.content);
      // Read tenant's current .env via the agent's readEnvFile
      const tenantEnv = await driver.readEnvFile(instance.basePath);
      const presentKeys = new Set(Object.keys(tenantEnv || {}));
      const presentInTenantEnv = requiredVars.filter(v => presentKeys.has(v));
      const missingInTenantEnv = requiredVars.filter(v => !presentKeys.has(v));
      envCoverage = { requiredVars, presentInTenantEnv, missingInTenantEnv };
    }
  }
  return { files: diffResults, envCoverage };
 }
 /**
 * Approach C apply path. Persists imageTag, re-renders templates, writes
 * them to the tenant, then composePull + composeUp --remove-orphans.
 * Fire-and-forget; status visible via the existing getUpgradeStatus() poll.
 */
 export async function startReleaseUpgrade(
  instanceId: string,
  userId: string,
  ipAddress?: string,
  options?: StartReleaseUpgradeOptions
 ) {
  const instance = await prisma.instance.findUnique({ where: { id: instanceId } });
  if (!instance) throw new Error('Instance not found');
  if (!instance.isRemote) {
    throw new Error('Release upgrade is currently supported only for remote instances');
  }
  if (instance.status !== InstanceStatus.RUNNING && instance.status !== InstanceStatus.STOPPED) {
    throw new Error(`Cannot upgrade instance in ${instance.status} state`);
  }
  if (options?.imageTag && !SAFE_IMAGE_TAG.test(options.imageTag)) {
    throw new Error('Invalid imageTag');
  }
  // Shared in-progress guard across all upgrade types.
  const active = await prisma.instanceUpgrade.findFirst({
    where: { instanceId, status: { in: [UpgradeStatus.PENDING, UpgradeStatus.IN_PROGRESS] } },
  });
  if (active) throw new Error('An upgrade is already in progress for this instance');
  const upgrade = await prisma.instanceUpgrade.create({
    data: {
      instanceId,
      status: UpgradeStatus.PENDING,
      previousCommit: instance.imageTag ?? instance.gitCommit,
      branch: instance.gitBranch,
      triggeredById: userId,
    },
  });
  await prisma.auditLog.create({
    data: {
      userId,
      instanceId,
      action: AuditAction.INSTANCE_UPGRADE,
      details: {
        upgradeId: upgrade.id,
        previousImageTag: instance.imageTag,
        newImageTag: options?.imageTag,
        source: 'remote',
        mode: 'release-template',
      } as unknown as Prisma.InputJsonValue,
      ipAddress,
    },
  });
  // Fire-and-forget runner. Distinct from runRemoteUpgrade because we don't
  // shell out to upgrade.sh — CCP does the render + compose orchestration
  // directly through the mTLS driver. No agent-side script involved.
  runReleaseUpgrade(upgrade.id, instance, options).catch((err) => {
    logger.error(`[release-upgrade] Orchestration failed for ${instance.slug}: ${err}`);
  });
  return upgrade;
 }
 /**
 * Internal: do the actual Approach C work. Updates DB, renders, writes,
 * pulls, recreates, verifies. Progress is reported solely via DB status
 * updates on the InstanceUpgrade row.
 */
 async function runReleaseUpgrade(
  upgradeId: string,
  instance: Instance,
  options?: StartReleaseUpgradeOptions
 ) {
  const slug = instance.slug;
  const newImageTag = options?.imageTag;
  const updateStatus = async (data: Prisma.InstanceUpgradeUpdateInput) => {
    await prisma.instanceUpgrade.update({ where: { id: upgradeId }, data });
  };
  try {
    await updateStatus({
      status: UpgradeStatus.IN_PROGRESS,
      currentPhase: 1,
      phaseName: 'Render',
      percentage: 10,
      progressMessage: 'Rendering templates with new image tag...',
    });
    // Persist new imageTag before render so buildTemplateContext picks it up.
    if (newImageTag) {
      await prisma.instance.update({ where: { id: instance.id }, data: { imageTag: newImageTag } });
    }
    const refreshed = await prisma.instance.findUniqueOrThrow({ where: { id: instance.id } });
    const secrets = refreshed.encryptedSecrets
      ? decryptJson<Record<string, string>>(refreshed.encryptedSecrets)
      : {};
    clearTemplateCache();
    const context = buildTemplateContext(refreshed, secrets);
    let files = await renderAllTemplatesInMemory(context);
    if (!refreshed.encryptedSecrets) {
      files = filterRenderedFilesForRegisteredTenant(files);
    }
    const driver = await getRemoteDriverForInstance({
      id: instance.id,
      slug: instance.slug,
      isRemote: instance.isRemote,
      agentUrl: instance.agentUrl,
    });
    // Phase 2: write rendered files
    await updateStatus({
      currentPhase: 2,
      phaseName: 'Write Files',
      percentage: 30,
      progressMessage: `Writing ${files.length} rendered file(s)...`,
    });
    await driver.writeFiles(instance.basePath, files);
    // Phase 3: pull images per new compose
    await updateStatus({
      currentPhase: 3,
      phaseName: 'Pull Images',
      percentage: 55,
      progressMessage: 'Pulling images from registry...',
    });
    await driver.composePull(instance.basePath, instance.composeProject);
    // Phase 4: recreate services
    await updateStatus({
      currentPhase: 4,
      phaseName: 'Recreate Services',
      percentage: 80,
      progressMessage: 'Recreating services with new orchestration...',
    });
    await driver.composeUp(instance.basePath, instance.composeProject);
    // Phase 5: verify (best-effort; soft warnings only)
    await updateStatus({
      currentPhase: 5,
      phaseName: 'Verify',
      percentage: 95,
      progressMessage: 'Verifying container health...',
    });
    const warnings: string[] = [];
    try {
      const containers = await driver.composePs(instance.basePath, instance.composeProject);
      const unhealthy = containers.filter(c => c.status && /restarting|exited/i.test(c.status));
      if (unhealthy.length > 0) {
        warnings.push(`${unhealthy.length} container(s) not healthy after upgrade: ${unhealthy.map(c => c.name).join(', ')}`);
      }
    } catch {
      warnings.push('composePs verification failed (services may still be starting)');
    }
    await updateStatus({
      status: UpgradeStatus.COMPLETED,
      currentPhase: 5,
      phaseName: 'Complete',
      percentage: 100,
      progressMessage: `Release upgrade complete${newImageTag ? ` (imageTag: ${newImageTag})` : ''}`,
      newCommit: newImageTag ?? refreshed.imageTag,
      commitCount: 0,
      warnings: warnings.length ? (warnings as unknown as Prisma.InputJsonValue) : undefined,
      completedAt: new Date(),
    });
    logger.info(`[release-upgrade] ${slug}: completed${newImageTag ? ` → ${newImageTag}` : ''}`);
  } catch (err) {
    const message = (err as Error).message || 'Release upgrade failed';
    await updateStatus({
      status: UpgradeStatus.FAILED,
      errorMessage: message,
      progressMessage: `Failed: ${message}`,
      completedAt: new Date(),
    });
    await createEvent(
      instance.id,
      'ERROR',
      'upgrade',
      'Release upgrade failed',
      message,
      { upgradeId, source: 'remote', mode: 'release-template' }
    );
    logger.error(`[release-upgrade] ${slug}: failed: ${message}`);
  }
 }
 /**
 * Async REMOTE upgrade runner.
 *
@ -316,7 +709,9 @@ export async function startUpgrade(
 async function runRemoteUpgrade(
  upgradeId: string,
  instance: Instance,
-  options?: StartUpgradeOptions
+  options?: StartUpgradeOptions,
  mode: 'full' | 'image-only' = 'full',
  imageOnlyOptions?: StartImageUpgradeOptions
 ) {
  const slug = instance.slug;
@ -333,18 +728,27 @@ async function runRemoteUpgrade(
      where: { id: upgradeId },
      data: {
        status: UpgradeStatus.IN_PROGRESS,
-        progressMessage: 'Starting remote upgrade...',
+        progressMessage: mode === 'image-only'
          ? 'Starting image-only upgrade...'
          : 'Starting remote upgrade...',
      },
    });
    // Tell the agent to start. The agent has its own mutex + stale-progress
    // check, so this can return 409 if a previous upgrade is still running.
    if (mode === 'image-only') {
      logger.info(`[upgrade] ${slug}: triggering remote image-upgrade.sh start`);
      await driver.startImageUpgrade({
        imageTag: imageOnlyOptions?.imageTag,
      });
    } else {
      logger.info(`[upgrade] ${slug}: triggering remote upgrade.sh start`);
      await driver.startUpgrade({
        skipBackup: options?.skipBackup,
        useRegistry: options?.useRegistry,
        branch: options?.branch,
      });
    }
    // Poll progress + result. We treat /result returning 200 as the signal
    // that upgrade.sh exited (successfully or with code != 0 — the script
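The envCoverage check that `previewReleaseUpgrade` runs for install.sh tenants can be sketched as a standalone function: collect the `${VAR}` names from the rendered compose file, then split them by whether the tenant's `.env` defines them. This is a hedged illustration under stated assumptions. `computeEnvCoverage` is a hypothetical helper name, and the tenant env is passed in as a plain object rather than read through the driver. The character class puts `-` last so that it is a literal hyphen, which lets the pattern accept `${VAR-default}` as well as the colon forms.

```typescript
// Illustrative sketch only; computeEnvCoverage is a hypothetical helper.
interface EnvCoverage {
  requiredVars: string[];
  presentInTenantEnv: string[];
  missingInTenantEnv: string[];
}

// Collect variable names from ${VAR}, ${VAR:-default}, ${VAR-default},
// ${VAR:?msg}, ${VAR?msg}, and ${VAR:+alt} references.
function extractComposeEnvVars(composeYaml: string): string[] {
  const vars = new Set<string>();
  const re = /\$\{([A-Z_][A-Z0-9_]*)(?:[:?+-][^}]*)?\}/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(composeYaml)) !== null) {
    const name = m[1];
    if (name) vars.add(name);
  }
  return Array.from(vars).sort();
}

// Split the referenced names by whether the tenant .env has a key for them.
function computeEnvCoverage(
  composeYaml: string,
  tenantEnv: Record<string, string> | null
): EnvCoverage {
  const requiredVars = extractComposeEnvVars(composeYaml);
  const presentKeys = new Set(Object.keys(tenantEnv || {}));
  return {
    requiredVars,
    presentInTenantEnv: requiredVars.filter(v => presentKeys.has(v)),
    missingInTenantEnv: requiredVars.filter(v => !presentKeys.has(v)),
  };
}
```

A variable referenced with an inline default, such as `${IMAGE_TAG:-latest}`, is still listed as missing when the tenant `.env` lacks it. That errs on the side of showing the operator too much rather than too little.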
--- a/changemaker-control-panel/templates/docker-compose.yml.hbs
+++ b/changemaker-control-panel/templates/docker-compose.yml.hbs
--- a/changemaker-control-panel/templates/docker-compose.yml.hbs.OLD-style-pre-approach-c
+++ b/changemaker-control-panel/templates/docker-compose.yml.hbs.OLD-style-pre-approach-c
--- a/changemaker-control-panel/templates/env.hbs
+++ b/changemaker-control-panel/templates/env.hbs
@ -1,65 +1,95 @@
-# ============================================================
+# ==============================================================================
-# Changemaker Lite — Instance: {{name}}
+# Changemaker Lite v2 — Tenant .env (CCP-rendered)
 # Instance: {{name}} ({{slug}})
 # Generated by CCP on {{now}}
-# ============================================================
+# ==============================================================================
 # This file is a near-mirror of changemaker.lite/.env.example with Handlebars
 # overlay for tenant-specific values (DOMAIN, secrets, COMPOSE_PROJECT_NAME).
 # Static defaults match .env.example so docker-compose.yml.hbs (a mirror of
 # docker-compose.prod.yml) has every ${VAR} it references.
 #
 # Keeping this in sync with .env.example after upstream additions: copy the
 # new key + default, replace any tenant-specific value with the matching
 # Handlebars expression. Most additions need no Handlebars.
 # ==============================================================================
-# Core
+# --- General ---
 NODE_ENV=production
 DOMAIN={{domain}}
 COMPOSE_PROJECT_NAME={{composeProject}}
 TZ=UTC
 USER_ID=1000
 GROUP_ID=1000
 DOCKER_GROUP_ID=984
-# V2 PostgreSQL
+# --- V2 PostgreSQL ---
 V2_POSTGRES_USER=changemaker
 V2_POSTGRES_PASSWORD={{secrets.postgresPassword}}
 V2_POSTGRES_DB=changemaker_v2
 V2_POSTGRES_PORT={{ports.postgres}}
 DATABASE_URL=postgresql://changemaker:{{secrets.postgresPassword}}@{{containerPrefix}}-postgres:5432/changemaker_v2
-# Redis
+# --- JWT Auth ---
 REDIS_PASSWORD={{secrets.redisPassword}}
 REDIS_URL=redis://:{{secrets.redisPassword}}@{{containerPrefix}}-redis:6379
 # JWT Auth
 JWT_ACCESS_SECRET={{secrets.jwtAccessSecret}}
 JWT_REFRESH_SECRET={{secrets.jwtRefreshSecret}}
 JWT_INVITE_SECRET={{secrets.jwtInviteSecret}}
 JWT_ACCESS_EXPIRY=15m
-# Reduced 2026-04-12 from 7d → 24h (P2-3). Combined with device-fingerprint
+# Reduced from 7d → 24h on 2026-04-12 (P2-3 hardening). Combined with
-# binding in the refresh JWT payload, this tightens the exploitation window
+# device-fingerprint binding in the JWT payload, this tightens the
-# for stolen refresh tokens.
+# exploitation window for stolen refresh tokens.
 JWT_REFRESH_EXPIRY=24h
-# Gitea SSO cookie signing + service password salt — REQUIRED 2026-04-12 (P2-2).
+# Encryption key for DB-stored secrets (SMTP password, etc.)
-# Distinct from JWT secrets; empty values will now fail Zod validation on boot.
+ENCRYPTION_KEY={{secrets.encryptionKey}}
 # Gitea SSO cookie signing secret + service password salt — REQUIRED 2026-04-12
 # (P2-2). Distinct from JWT secrets; empty values will fail Zod validation on
 # boot. Both ≥32 chars, distinct from each other and from JWT_* secrets.
 GITEA_SSO_SECRET={{secrets.giteaSsoSecret}}
 SERVICE_PASSWORD_SALT={{secrets.servicePasswordSalt}}
-# Encryption
+# --- Initial Super Admin User ---
 ENCRYPTION_KEY={{secrets.encryptionKey}}
 # Initial Admin
 INITIAL_ADMIN_EMAIL={{secrets.adminEmail}}
 INITIAL_ADMIN_PASSWORD={{secrets.initialAdminPassword}}
-# API
+# --- API ---
 API_PORT=4000
 PORT=4000
 API_URL=https://api.{{domain}}
 CORS_ORIGINS=https://app.{{domain}},http://localhost:{{ports.admin}},http://localhost
 # --- Admin GUI ---
 ADMIN_PORT=3000
 ADMIN_URL=https://app.{{domain}}
-# Admin GUI
+# --- Nginx ---
 ADMIN_PORT=3000
 # Nginx
 NGINX_HTTP_PORT={{ports.nginx}}
 NGINX_HTTPS_PORT=443
-# SMTP / Email
+# --- Embed Proxy Ports ---
 # Dedicated nginx ports for iframe embedding without DNS/subdomain.
 # CCP allocates these per-instance via {{ports.embed}} base + offset.
 NOCODB_EMBED_PORT={{math ports.embed "+" 0}}
 N8N_EMBED_PORT={{math ports.embed "+" 1}}
 GITEA_EMBED_PORT={{math ports.embed "+" 2}}
 MAILHOG_EMBED_PORT={{math ports.embed "+" 3}}
 MINI_QR_EMBED_PORT={{math ports.embed "+" 4}}
 EXCALIDRAW_EMBED_PORT={{math ports.embed "+" 5}}
 HOMEPAGE_EMBED_PORT={{math ports.embed "+" 6}}
 VAULTWARDEN_EMBED_PORT={{math ports.embed "+" 9}}
 ROCKETCHAT_EMBED_PORT={{math ports.embed "+" 10}}
 GANCIO_EMBED_PORT={{math ports.embed "+" 11}}
 JITSI_EMBED_PORT={{math ports.embed "+" 15}}
 GRAFANA_EMBED_PORT={{math ports.embed "+" 12}}
 ALERTMANAGER_EMBED_PORT={{math ports.embed "+" 16}}
 # --- Docker / Container Management ---
 DOCKER_NETWORK_NAME=changemaker-lite
 DOCKER_PROXY_URL=http://docker-socket-proxy:2375
 NEWT_CONTAINER_NAME=newt-changemaker
 NEWT_COMPOSE_SERVICE=newt
 # --- SMTP / Email ---
 {{#if emailTestMode}}
-SMTP_HOST={{containerPrefix}}-mailhog
+SMTP_HOST=mailhog-changemaker
 SMTP_PORT=1025
 SMTP_USER=
 SMTP_PASS=
@ -75,21 +105,9 @@ SMTP_FROM={{smtpFrom}}
 SMTP_FROM_NAME={{name}}
 TEST_EMAIL_RECIPIENT={{secrets.adminEmail}}
-# NocoDB
+# --- Listmonk ---
-NOCODB_V2_PORT=8080
+LISTMONK_PORT=9001
-NOCODB_URL=http://{{containerPrefix}}-nocodb:8080
+LISTMONK_DB_PORT=5434
 NC_ADMIN_EMAIL={{secrets.adminEmail}}
 NC_ADMIN_PASSWORD={{secrets.nocodbAdminPassword}}
 # Listmonk
 {{#if enableListmonk}}
 LISTMONK_SYNC_ENABLED=true
 LISTMONK_URL=http://{{containerPrefix}}-listmonk:9000
 {{else}}
 LISTMONK_SYNC_ENABLED=false
 LISTMONK_URL=
 {{/if}}
 LISTMONK_PORT=9000
 LISTMONK_DB_USER=listmonk
 LISTMONK_DB_PASSWORD={{secrets.listmonkAdminPassword}}
 LISTMONK_DB_NAME=listmonk
@ -99,26 +117,41 @@ LISTMONK_API_USER=v2-api
 LISTMONK_API_TOKEN={{secrets.listmonkApiToken}}
 LISTMONK_ADMIN_USER=v2-api
 LISTMONK_ADMIN_PASSWORD={{secrets.listmonkApiToken}}
-LISTMONK_PROXY_PORT=9002
+LISTMONK_SYNC_ENABLED={{#if enableListmonk}}true{{else}}false{{/if}}
 LISTMONK_WEBHOOK_SECRET=
-LISTMONK_DB_PORT=5434
+LISTMONK_PROXY_PORT=9002
-LISTMONK_SMTP_HOST={{containerPrefix}}-mailhog
+LISTMONK_SMTP_HOST=mailhog-changemaker
 LISTMONK_SMTP_PORT=1025
 LISTMONK_SMTP_USER=
 LISTMONK_SMTP_PASSWORD=
 LISTMONK_SMTP_TLS_TYPE=none
 LISTMONK_SMTP_FROM={{name}} <noreply@{{domain}}>
-# Media
+# --- Represent API (Canadian electoral data) ---
-{{#if enableMedia}}
+REPRESENT_API_URL=https://represent.opennorth.ca
-ENABLE_MEDIA_FEATURES=true
+
-MEDIA_API_PUBLIC_URL=https://media.{{domain}}
+# --- NocoDB v2 (read-only data browser) ---
-{{else}}
+NOCODB_V2_PORT=8091
-ENABLE_MEDIA_FEATURES=false
+NOCODB_URL=http://changemaker-v2-nocodb:8080
-MEDIA_API_PUBLIC_URL=
+NOCODB_PORT=8091
-{{/if}}
+NC_ADMIN_EMAIL={{secrets.adminEmail}}
 NC_ADMIN_PASSWORD={{secrets.nocodbAdminPassword}}
 NC_PUBLIC_URL=https://db.{{domain}}
 # --- Redis ---
 REDIS_PASSWORD={{secrets.redisPassword}}
 REDIS_URL=redis://:${REDIS_PASSWORD}@redis-changemaker:6379
 # --- Payments (Stripe) ---
 ENABLE_PAYMENTS={{#if enablePayments}}true{{else}}false{{/if}}
 # --- Media Management ---
 ENABLE_MEDIA_FEATURES={{#if enableMedia}}true{{else}}false{{/if}}
 MEDIA_API_PORT=4100
-MEDIA_ROOT=/media/local
+MEDIA_API_PUBLIC_URL=https://media.{{domain}}
 VITE_MEDIA_API_URL=http://changemaker-media-api:4100
 ENABLE_HLS_TRANSCODE=false
 MEDIA_ROOT=/media/library
 MEDIA_UPLOADS=/media/uploads
 MAX_UPLOAD_SIZE_GB=10
 PUBLIC_MEDIA_PORT=3100
@ -129,43 +162,111 @@ VIDEO_SCHEDULE_DEFAULT_TIMEZONE=UTC
 VIDEO_SCHEDULE_NOTIFICATION_ENABLED=true
 VIDEO_PREVIEW_LINK_EXPIRY_HOURS=24
-# NAR Data
+# --- Container Registry ---
-NAR_DATA_DIR=/data
+GITEA_REGISTRY=gitea.bnkops.com/admin
 IMAGE_TAG={{imageTag}}
 COMPOSE_PROFILES={{#if enableMonitoring}}monitoring{{/if}}{{#if enableCcpAgent}}{{#if enableMonitoring}},{{/if}}ccp-agent{{/if}}
 GITEA_REGISTRY_USER=admin
 GITEA_REGISTRY_PASS=
 GITEA_REGISTRY_API_TOKEN=
-# Platform Service URLs (used for health checks)
+# --- Gitea (Local Platform Instance) ---
-MINI_QR_URL=http://{{containerPrefix}}-mini-qr:8080
+GITEA_URL=http://gitea-changemaker:3000
-EXCALIDRAW_URL=http://{{containerPrefix}}-excalidraw:80
+GITEA_PORT=3030
 GITEA_WEB_PORT=3030
 GITEA_SSH_PORT=2222
 GITEA_ADMIN_USER=admin
 GITEA_ADMIN_PASSWORD={{secrets.giteaAdminPassword}}
 GITEA_DB_TYPE=mysql
 GITEA_DB_HOST=gitea-db:3306
 GITEA_DB_NAME=gitea
 GITEA_DB_USER=gitea
 GITEA_DB_PASSWD={{secrets.giteaAdminPassword}}
 GITEA_DB_ROOT_PASSWORD={{secrets.giteaAdminPassword}}
 GITEA_ROOT_URL=https://git.{{domain}}
 GITEA_DOMAIN=git.{{domain}}
 # --- Gitea Docs Comments ---
 GITEA_COMMENTS_ENABLED=false
 GITEA_API_TOKEN=
 GITEA_COMMENTS_REPO_OWNER=
 GITEA_COMMENTS_REPO_NAME=docs-comments
 GITEA_OAUTH_CLIENT_ID=
 GITEA_OAUTH_CLIENT_SECRET=
 # Docs source (Gitea repo containing the mkdocs/ tree)
 GITEA_DOCS_REPO=admin/changemaker.lite
 GITEA_DOCS_PREFIX=mkdocs/docs
 GITEA_DOCS_BRANCH=v2
 # --- n8n ---
 N8N_URL=http://n8n-changemaker:5678
 N8N_PORT=5678
 N8N_HOST=n8n.{{domain}}
 N8N_ENCRYPTION_KEY={{secrets.n8nEncryptionKey}}
 N8N_USER_EMAIL={{secrets.adminEmail}}
 N8N_USER_PASSWORD={{secrets.nocodbAdminPassword}}
 GENERIC_TIMEZONE=UTC
 # --- MkDocs ---
 MKDOCS_PORT=4003
 MKDOCS_SITE_SERVER_PORT=4004
 BASE_DOMAIN=https://{{domain}}
 MKDOCS_PREVIEW_URL=http://mkdocs:8000
 MKDOCS_DOCS_PATH=/mkdocs/docs
 # --- Code Server ---
 CODE_SERVER_PORT=8888
 CODE_SERVER_URL=http://code-server-changemaker:8443
 USER_NAME=coder
 # --- Homepage ---
 HOMEPAGE_PORT=3010
 HOMEPAGE_VAR_BASE_URL=http://localhost
 # --- Mini QR ---
 MINI_QR_PORT=8089
 MINI_QR_URL=http://mini-qr:8080
 # --- Excalidraw (Collaborative Whiteboard) ---
 EXCALIDRAW_PORT=8090
 EXCALIDRAW_URL=http://excalidraw-changemaker:80
 EXCALIDRAW_WS_URL=wss://draw.{{domain}}
-HOMEPAGE_URL=http://{{containerPrefix}}-homepage:3000
+
-VAULTWARDEN_URL=http://{{containerPrefix}}-vaultwarden:80
+# --- Vaultwarden (Password Manager) ---
 VAULTWARDEN_PORT=8445
 VAULTWARDEN_URL=http://vaultwarden-changemaker:80
 VAULTWARDEN_ADMIN_TOKEN={{secrets.vaultwardenAdminToken}}
 VAULTWARDEN_DOMAIN=https://vault.{{domain}}
 VAULTWARDEN_SIGNUPS_ALLOWED=false
 VAULTWARDEN_WEBSOCKET_ENABLED=true
 VAULTWARDEN_SMTP_SECURITY=off
-# Geocoding
+# --- MailHog ---
 MAILHOG_SMTP_PORT=1025
 MAILHOG_WEB_PORT=8025
 # --- NAR (National Address Register) ---
 NAR_DATA_DIR=/data
 # --- Overpass / Area Import ---
 OVERPASS_API_URL=https://overpass-api.de/api/interpreter
 OVERPASS_MIN_DELAY_MS=30000
 AREA_IMPORT_MAX_GRID_POINTS=500
 # --- Geocoding ---
 MAPBOX_API_KEY=
 GOOGLE_MAPS_API_KEY=
 GOOGLE_MAPS_ENABLED=false
 GEOCODING_RATE_LIMIT_MS=1100
 GEOCODING_CACHE_ENABLED=true
 GEOCODING_CACHE_TTL_HOURS=24
 GOOGLE_MAPS_API_KEY=
 GOOGLE_MAPS_ENABLED=false
 GEOCODING_PARALLEL_ENABLED=true
 GEOCODING_BATCH_SIZE=10
 BULK_GEOCODE_ENABLED=true
 BULK_GEOCODE_MAX_BATCH=5000
-# Represent API
+# --- Pangolin Tunnel ---
-REPRESENT_API_URL=https://represent.opennorth.ca
+PANGOLIN_API_URL=https://api.bnkserve.org/v1
 # Overpass / Area Import
 OVERPASS_API_URL=https://overpass-api.de/api/interpreter
 OVERPASS_MIN_DELAY_MS=30000
 AREA_IMPORT_MAX_GRID_POINTS=500
 # Pangolin Tunnel
 PANGOLIN_API_URL=
 PANGOLIN_API_KEY=
 PANGOLIN_ORG_ID=
 PANGOLIN_SITE_ID=
@ -174,178 +275,95 @@ PANGOLIN_ENDPOINT={{pangolin.endpoint}}
 PANGOLIN_NEWT_ID={{pangolin.newtId}}
 PANGOLIN_NEWT_SECRET={{pangolin.newtSecret}}
 {{else}}
-PANGOLIN_ENDPOINT=
+PANGOLIN_ENDPOINT=https://pangolin.bnkserve.org
 PANGOLIN_NEWT_ID=
 PANGOLIN_NEWT_SECRET=
 {{/if}}
-# Gancio
+# --- Prisma CLI (host-side only, NOT used by Docker containers) ---
-{{#if enableGancio}}
+DATABASE_URL=postgresql://changemaker:{{secrets.postgresPassword}}@localhost:{{ports.postgres}}/changemaker_v2
-GANCIO_SYNC_ENABLED=true
+
-GANCIO_URL=http://{{containerPrefix}}-gancio:13120
+# --- Rocket.Chat (Team Chat) ---
-{{else}}
+ENABLE_CHAT={{#if enableChat}}true{{else}}false{{/if}}
-GANCIO_SYNC_ENABLED=false
+ROCKETCHAT_ADMIN_USER=rcadmin
-GANCIO_URL=
+ROCKETCHAT_ADMIN_PASSWORD={{secrets.rocketchatAdminPassword}}
-{{/if}}
+ROCKETCHAT_URL=http://rocketchat-changemaker:3000
 MONGO_ROOT_USER=rocketchat
 MONGO_ROOT_PASSWORD={{secrets.mongoRootPassword}}
 # --- Gancio (Event Management) ---
 GANCIO_PORT=8092
 GANCIO_URL=http://gancio-changemaker:13120
 GANCIO_BASE_URL=https://events.{{domain}}
 GANCIO_ADMIN_USER=admin
 GANCIO_ADMIN_PASSWORD={{secrets.gancioAdminPassword}}
-GANCIO_PORT=8092
+GANCIO_SYNC_ENABLED={{#if enableGancio}}true{{else}}false{{/if}}
-# Chat (Rocket.Chat)
+# --- Jitsi Meet (Video Conferencing) ---
 {{#if enableChat}}
 ENABLE_CHAT=true
 ROCKETCHAT_URL=http://{{containerPrefix}}-rocketchat:3000
 ROCKETCHAT_ADMIN_USER=rcadmin
 ROCKETCHAT_ADMIN_PASSWORD={{secrets.rocketchatAdminPassword}}
 MONGO_ROOT_USER=rocketchat
 MONGO_ROOT_PASSWORD={{secrets.mongoRootPassword}}
 {{else}}
 ENABLE_CHAT=false
 ROCKETCHAT_URL=
 ROCKETCHAT_ADMIN_USER=
 ROCKETCHAT_ADMIN_PASSWORD=
 MONGO_ROOT_USER=
 MONGO_ROOT_PASSWORD=
 {{/if}}
 # Jitsi Meet (Video Conferencing)
 ENABLE_MEET={{#if enableMeet}}true{{else}}false{{/if}}
 {{#if enableMeet}}
 JITSI_APP_ID=changemaker
 JITSI_APP_SECRET={{secrets.jitsiAppSecret}}
 JITSI_JICOFO_AUTH_PASSWORD={{secrets.jitsiJicofoAuthPassword}}
 JITSI_JVB_AUTH_PASSWORD={{secrets.jitsiJvbAuthPassword}}
-JITSI_URL=http://{{containerPrefix}}-jitsi-web:80
+JITSI_URL=http://jitsi-web-changemaker:80
 JVB_ADVERTISE_IP={{jvbAdvertiseIp}}
 JVB_PORT=10000
 {{else}}
 JITSI_APP_ID=
 JITSI_APP_SECRET=
 JITSI_JICOFO_AUTH_PASSWORD=
 JITSI_JVB_AUTH_PASSWORD=
 JITSI_URL=
 JVB_ADVERTISE_IP=
 JVB_PORT=10000
 {{/if}}
-# SMS Campaigns
+# --- SMS Campaigns (Termux Android Bridge) ---
 ENABLE_SMS={{#if enableSms}}true{{else}}false{{/if}}
 TERMUX_API_URL=
 TERMUX_API_KEY=
 SMS_DELAY_BETWEEN_MS=3000
 SMS_MAX_RETRIES=3
-SMS_RESPONSE_SYNC_INTERVAL_MS=30000
+SMS_RESPONSE_SYNC_INTERVAL_MS=120000
-SMS_DEVICE_MONITOR_INTERVAL_MS=30000
+SMS_DEVICE_MONITOR_INTERVAL_MS=300000
-# Social Connections
+# --- Social, People & Analytics ---
 ENABLE_SOCIAL={{#if enableSocial}}true{{else}}false{{/if}}
 # People CRM
 ENABLE_PEOPLE={{#if enablePeople}}true{{else}}false{{/if}}
 # Analytics & GeoIP
 ENABLE_ANALYTICS={{#if enableAnalytics}}true{{else}}false{{/if}}
 MAXMIND_ACCOUNT_ID=
 MAXMIND_LICENSE_KEY=
-# Monitoring
+# --- Control Panel Agent ---
 # Tenants registered with CCP have these populated; CCP-provisioned tenants
 # get them set by the provisioner. Leave blank if neither applies.
 ENABLE_CCP_AGENT=true
 CCP_URL=
 CCP_INVITE_CODE=
 CCP_AGENT_URL=
 CCP_AGENT_PORT=7443
 # --- Monitoring (only used with --profile monitoring) ---
 PROMETHEUS_PORT=9090
 GRAFANA_PORT=3005
 GRAFANA_ADMIN_PASSWORD={{secrets.grafanaAdminPassword}}
 GRAFANA_ROOT_URL=https://grafana.{{domain}}
 PROMETHEUS_PORT=9090
 GRAFANA_PORT=3000
 CADVISOR_PORT=8086
 NODE_EXPORTER_PORT=9100
 REDIS_EXPORTER_PORT=9121
 ALERTMANAGER_PORT=9093
 ALERTMANAGER_EMBED_PORT={{math ports.embed "+" 16}}
 GOTIFY_PORT=8889
 GOTIFY_ADMIN_USER=admin
 GOTIFY_ADMIN_PASSWORD=admin
-# MkDocs
+# --- Bunker Ops (Fleet Management) ---
 MKDOCS_PORT={{math ports.embed "+" 8}}
 MKDOCS_SITE_SERVER_PORT={{math ports.embed "+" 14}}
 MKDOCS_PREVIEW_URL=http://{{containerPrefix}}-mkdocs:8000
 MKDOCS_DOCS_PATH=/mkdocs/docs
 CODE_SERVER_PORT={{math ports.embed "+" 7}}
 CODE_SERVER_URL=http://{{containerPrefix}}-code-server:8443
 BASE_DOMAIN=https://{{domain}}
 # Gitea
 GITEA_URL=http://{{containerPrefix}}-gitea:3000
 GITEA_SSH_PORT=2222
 GITEA_DB_TYPE=postgres
 GITEA_DB_HOST={{containerPrefix}}-postgres:5432
 GITEA_DB_NAME=gitea
 GITEA_DB_USER=changemaker
 GITEA_DB_PASSWD={{secrets.postgresPassword}}
 GITEA_ROOT_URL=https://git.{{domain}}
 GITEA_DOMAIN=git.{{domain}}
 GITEA_COMMENTS_ENABLED=false
 GITEA_API_TOKEN=
 GITEA_COMMENTS_REPO_OWNER=
 GITEA_COMMENTS_REPO_NAME=docs-comments
 GITEA_OAUTH_CLIENT_ID=
 GITEA_OAUTH_CLIENT_SECRET=
 # n8n
 N8N_HOST=n8n.{{domain}}
 N8N_URL=http://{{containerPrefix}}-n8n:5678
 N8N_ENCRYPTION_KEY={{secrets.n8nEncryptionKey}}
 N8N_USER_EMAIL={{secrets.adminEmail}}
 N8N_USER_PASSWORD={{secrets.nocodbAdminPassword}}
 GENERIC_TIMEZONE=UTC
 # MailHog
 MAILHOG_URL=http://{{containerPrefix}}-mailhog:8025
 MAILHOG_SMTP_PORT=1025
 MAILHOG_WEB_PORT=8025
 # Homepage
 HOMEPAGE_PORT=3010
 HOMEPAGE_VAR_BASE_URL=http://localhost
 # Dev Tools
 {{#if enableDevTools}}
 ENABLE_DEV_TOOLS=true
 {{else}}
 ENABLE_DEV_TOOLS=false
 {{/if}}
 # Payments
 {{#if enablePayments}}
 ENABLE_PAYMENTS=true
 {{else}}
 ENABLE_PAYMENTS=false
 {{/if}}
 # Vite (admin build)
 VITE_API_URL=http://{{containerPrefix}}-api:4000
 VITE_MKDOCS_URL=http://{{containerPrefix}}-mkdocs:8000
 {{#if enableMedia}}
 VITE_MEDIA_API_URL=http://{{containerPrefix}}-media-api:4100
 {{/if}}
 # Bunker Ops (Fleet Management)
 INSTANCE_LABEL={{slug}}
 BUNKER_OPS_ENABLED=false
 BUNKER_OPS_REMOTE_WRITE_URL=
-# Embed proxy ports (nginx proxy for iframe embedding in admin GUI)
+# --- GeoIP (MaxMind GeoLite2) ---
-NOCODB_EMBED_PORT={{math ports.embed "+" 0}}
+MAXMIND_ACCOUNT_ID=
-N8N_EMBED_PORT={{math ports.embed "+" 1}}
+MAXMIND_LICENSE_KEY=
-GITEA_EMBED_PORT={{math ports.embed "+" 2}}
+
-MAILHOG_EMBED_PORT={{math ports.embed "+" 3}}
+# --- CCP-specific (admin GUI iframe embeds + dev-mode helpers) ---
-MINI_QR_EMBED_PORT={{math ports.embed "+" 4}}
+# These are CCP-only — not in canonical .env.example. Kept here because
-EXCALIDRAW_EMBED_PORT={{math ports.embed "+" 5}}
+# admin/vite uses them at build time and the embed proxies reference them.
-HOMEPAGE_EMBED_PORT={{math ports.embed "+" 6}}
+PORT=4000
 VITE_API_URL=http://changemaker-v2-api:4000
 HOMEPAGE_URL=http://homepage-changemaker:3000
 MAILHOG_URL=http://mailhog-changemaker:8025
 LISTMONK_URL=http://listmonk-app:9000
 CODE_SERVER_EMBED_PORT={{math ports.embed "+" 7}}
 MKDOCS_EMBED_PORT={{math ports.embed "+" 8}}
 VAULTWARDEN_EMBED_PORT={{math ports.embed "+" 9}}
 ROCKETCHAT_EMBED_PORT={{math ports.embed "+" 10}}
 GANCIO_EMBED_PORT={{math ports.embed "+" 11}}
 GRAFANA_EMBED_PORT={{math ports.embed "+" 12}}
 LISTMONK_EMBED_PORT={{math ports.embed "+" 13}}
 MKDOCS_SITE_EMBED_PORT={{math ports.embed "+" 14}}
-JITSI_EMBED_PORT={{math ports.embed "+" 15}}
+LISTMONK_EMBED_PORT={{math ports.embed "+" 13}}
 ENABLE_DEV_TOOLS={{#if enableDevTools}}true{{else}}false{{/if}}
--- a/changemaker-control-panel/templates/nginx/nginx.conf
+++ b/changemaker-control-panel/templates/nginx/nginx.conf
@ -10,7 +10,14 @@ http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
-    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
+    # Redact sensitive query parameters (token, secret) from access logs
    map $request_uri $redacted_request {
        ~^(?P<path>[^?]*)\?(?P<args>.*token=[^&]*)  "$path?<token-redacted>";
        ~^(?P<path>[^?]*)\?(?P<args>.*secret=[^&]*) "$path?<secret-redacted>";
        default                                       $request_uri;
    }
    log_format main '$remote_addr - $remote_user [$time_local] "$request_method $redacted_request $server_protocol" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
@ -25,6 +32,12 @@ http {
    types_hash_max_size 2048;
    client_max_body_size 50m;
    # Rate limiting zones (defense-in-depth alongside app-level Redis rate limits)
    limit_req_zone $binary_remote_addr zone=api_global:10m rate=30r/s;
    limit_req_zone $binary_remote_addr zone=api_auth:10m rate=5r/s;
    limit_req_zone $binary_remote_addr zone=upload:10m rate=2r/s;
    limit_req_status 429;
    # Gzip compression
    gzip on;
    gzip_vary on;
@ -32,11 +45,17 @@ http {
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;
    # Only send HSTS when the request arrived over HTTPS (via Pangolin tunnel)
    map $http_x_forwarded_proto $hsts_header {
        https  "max-age=31536000; includeSubDomains";
        default "";
    }
    # Security headers (applied globally — X-Frame-Options set per server block)
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;
-    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
+    add_header Strict-Transport-Security $hsts_header always;
    add_header Permissions-Policy "geolocation=(self), microphone=(), camera=()" always;
    # Docker internal DNS — enables runtime resolution so nginx starts
--- a/docker-compose.prod.yml
+++ b/docker-compose.prod.yml
@ -1427,9 +1427,10 @@ services:
      - /var/run/docker.sock:/var/run/docker.sock
      - ccp-agent-data:/var/lib/ccp-agent
      - ccp-agent-certs:/etc/ccp-agent
-      # Mount the instance directory so the agent can read compose files and run
+      # Mount the instance directory so the agent can read compose files and
-      # `docker compose -p <project>` commands against the real project on disk.
+      # write status.json + backups (writable; agent already has docker.sock,
-      - .:/app/instance:ro
+      # so file write access is not an additional security escalation).
      - .:/app/instance
    environment:
      - AGENT_PORT=7443
      - AGENT_DATA_DIR=/var/lib/ccp-agent
@ -1441,7 +1442,12 @@ services:
      - INSTANCE_BASE_PATH=/app/instance
      # Pass the host's compose project name so the agent runs `docker compose -p <project>`
      # against the right project (not basename of INSTANCE_BASE_PATH, which is "instance").
      # COMPOSE_PROJECT is read by the agent's TypeScript for slug derivation;
      # COMPOSE_PROJECT_NAME is what Docker Compose itself reads when upgrade.sh
      # shells out to `docker compose ...` — without it, compose defaults to
      # basename(cwd)="instance" and collides with the host's existing containers.
      - COMPOSE_PROJECT=${COMPOSE_PROJECT_NAME:-changemaker-lite}
      - COMPOSE_PROJECT_NAME=${COMPOSE_PROJECT_NAME:-changemaker-lite}
    logging: *default-logging
    networks:
      - changemaker-lite
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -1450,7 +1450,10 @@ services:
      - /var/run/docker.sock:/var/run/docker.sock
      - ccp-agent-data:/var/lib/ccp-agent
      - ccp-agent-certs:/etc/ccp-agent
-      - .:/app/instance:ro
+      # Writable: agent must write data/upgrade/{status,progress,result}.json
      # and data/backups/*.tar.gz. Agent already has docker.sock — file write
      # access is not an additional security escalation.
      - .:/app/instance
    environment:
      - AGENT_PORT=7443
      - AGENT_DATA_DIR=/var/lib/ccp-agent
@ -1462,7 +1465,12 @@ services:
      - INSTANCE_BASE_PATH=/app/instance
      # Pass the host's compose project name so the agent runs `docker compose -p <project>`
      # against the right project (not basename of INSTANCE_BASE_PATH, which is "instance").
      # COMPOSE_PROJECT is read by the agent's TypeScript for slug derivation;
      # COMPOSE_PROJECT_NAME is what Docker Compose itself reads when upgrade.sh
      # shells out to `docker compose ...` — without it, compose defaults to
      # basename(cwd)="instance" and collides with the host's existing containers.
      - COMPOSE_PROJECT=${COMPOSE_PROJECT_NAME:-changemaker-lite}
      - COMPOSE_PROJECT_NAME=${COMPOSE_PROJECT_NAME:-changemaker-lite}
    logging: *default-logging
    networks:
      - changemaker-lite
--- a/docs/SESSION_HANDOFF_2026-05-20.md
+++ b/docs/SESSION_HANDOFF_2026-05-20.md
@ -0,0 +1,266 @@
 # Session Handoff: Upgrade Flow Redesign (2026-05-20 → 2026-05-21)
 > Carries forward all context from a long working session into the next conversation. If you're a fresh agent: read this top-to-bottom before touching anything.
 ---
 ## Quick state of the fleet
 | Tenant | Type | Version | Agent patched | Surgical script update | Notes |
 |---|---|---|---|---|---|
 | bnkops (n4) | source | main @ 1b80e82 | ✅ | ⏳ pending | Management node; CCP backend runs here in parallel |
 | marcelle (n5, cursedknowledge.org) | release | v2.9.15 | ✅ | ⏳ pending | Test bench; first end-to-end CCP upgrade test ran here (succeeded after manual Phase 6 recovery) |
 | trbh (n6) | source | main @ 1b80e82 | ✅ | ⏳ pending | mkdocs content RESTORED from `stash@{0}` — site serves "That Really Blonde Human" correctly |
 | pia (n3, pia-bnkops) | release | v2.9.10 | ✅ | ✅ **completed 2026-05-21** | First successful surgical update — proof the procedure works |
 | pridecorner (n1) | source | main @ 1b80e82 | ✅ | ⏳ pending | Has 3 March 9 upgrade-* stashes still on disk (audit done; recovery deferred to another agent) |
 | soroush (n7) | source | main @ 1b80e82 | ✅ | ⏳ pending | Was earliest-fixed tonight |
 | linda (n2, lindalindsay.org) | release-converted | v2.9.14 | ✅ | ⏳ pending | Was source-install with broken `.git`; converted to release mode (VERSION file written) |
 **Public sites verified working at session end**: trbh.org, docs.trbh.org, bnkops.com, pridecorner.ca, soroushsamavat.org, publicinterestalberta.org, lindalindsay.org, cursedknowledge.org.
 **Known caveat**: docs.bnkops.com returns HTTP 000 externally (Pangolin tunnel routing issue, pre-existing, NOT caused by this session). bnkops mkdocs container serves correct content locally.
 ---
 ## What landed in source (committed + pushed to origin/main)
 | Commit | Description |
 |---|---|
 | `1b80e82` | `fix(ccp-agent): whitelist /app/instance for git safe.directory` — ccp-agent Dockerfile |
 | `e88ac79` | `fix(ccp-agent): export COMPOSE_PROJECT_NAME so upgrade.sh sees correct project` — docker-compose.yml + .prod.yml |
 | `9613c3e` | `fix(upgrade): Phase 1 of upgrade-flow redesign (Approach A)` — upgrade.sh + scripts/lib/mkdocs-snapshot.sh + scripts/upgrade-stash-cleanup.sh + .gitignore |
 | `a7d3dd7` | `chore(release): ship scripts/lib/ + classify upgrade-stash-cleanup.sh` — build-release.sh |
 **Release**: v2.10.2 tagged on `a7d3dd7`, uploaded to Gitea Releases as the new "latest" (`/releases/latest` returns v2.10.2 — the timestamp issue from earlier in session is fixed via build-release.sh's `target_commitish` workaround).
 **Earlier in session**: tonight also produced commit `a531f9b` (ccp-agent missing bash/curl/jq/python3 + writable mount) and v2.10.1 release. v2.10.2 supersedes v2.10.1.
 ---
 ## The plan — Approach A (DONE) + B + C (pending)
 Full design lives at `/home/bunker-admin/.claude/plans/okay-so-we-can-enumerated-hejlsberg.md`.
 ### Approach A — ✅ Done
 Three fixes to existing `scripts/upgrade.sh` shipping in v2.10.2:
 1. **Phase 6 self-destruct fix** — Phase 6's broad `docker compose up -d` no longer recreates ccp-agent (which would SIGKILL the running script). Instead, ccp-agent restart is deferred to AFTER `write_result` writes the final `result.json`, via a detached `nohup ... & disown` subshell.
 2. **mkdocs/ snapshot fallback** (`scripts/lib/mkdocs-snapshot.sh`, sourced by upgrade.sh's Phase 2). Before any other backup or pull operation, it tarballs the entire `mkdocs/` directory into `mkdocs-backup-<timestamp>.tar.gz` in the install root. It retains the last 5 archives, which are discoverable via `ls`. Restoration is a one-liner:
   ```bash
   tar xzf "$(ls -t mkdocs-backup-*.tar.gz | head -1)" -C . && \
   docker compose restart mkdocs mkdocs-site-server
   ```
 3. **`upgrade-stash-cleanup.sh`** — interactive utility to drop accumulated `upgrade-*` git stashes. Warns LOUDLY if any stash contains `mkdocs/mkdocs.yml` so operators verify recovery before dropping.
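As a rough illustration of the snapshot-and-retain behaviour described in item 2, the sketch below tarballs `mkdocs/` into the install root and keeps only the newest archives. It is not the shipped `scripts/lib/mkdocs-snapshot.sh`: the function name `snapshot_mkdocs_sketch`, the keep-count argument, and the timestamp format are assumptions made for illustration.

```bash
# Hypothetical sketch, NOT the real scripts/lib/mkdocs-snapshot.sh.
# Usage: snapshot_mkdocs_sketch <install-root> [keep-count]
# Runs in a subshell, so the caller's working directory is untouched.
snapshot_mkdocs_sketch() (
  root="$1"
  keep="${2:-5}"
  cd "$root" || exit 1
  [ -d mkdocs ] || { echo "no mkdocs/ directory in $root" >&2; exit 1; }

  # Timestamped archive name (the exact format is an assumption).
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="mkdocs-backup-${stamp}.tar.gz"
  tar -czf "$archive" mkdocs || exit 1

  # Prune: list newest first, skip the first $keep, delete the rest.
  ls -1t mkdocs-backup-*.tar.gz | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
      rm -f -- "$old"
    done

  # Report the archive that was just written.
  echo "$archive"
)
```

Two calls within the same second would reuse the same archive name in this sketch; a real helper would need a finer-grained or collision-checked name.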
 ### Approach B — ⏳ Pending (1-2 days)
 Add `--image-only` upgrade mode. Production images are hermetic: they bake in compiled code and Prisma migrations, and the entrypoint runs migrations on container start. `docker compose pull && docker compose up -d` is therefore a complete code+schema upgrade. **No filesystem mutation happens outside Docker**, so tenant content is implicitly safe.
 New files to create:
 - `scripts/image-upgrade.sh` (~150 lines; sources `scripts/lib/mkdocs-snapshot.sh` for the fallback)
 - `changemaker-control-panel/agent/src/routes/upgrade.routes.ts` → new endpoint `POST /instance/:slug/upgrade/start-image-only`
 - `changemaker-control-panel/api/src/services/upgrade.service.ts` → `startImageUpgrade(instanceId, userId, { imageTag })`
 - `changemaker-control-panel/api/src/services/remote-driver.ts` → `startImageUpgrade()`
 - `changemaker-control-panel/api/src/modules/instances/instances.routes.ts` → `POST /:id/upgrade-images`
 - CCP admin UI: "Quick Upgrade (image-only)" button on `InstanceDetailPage.tsx`
 ### Approach C — ⏳ Pending (3-5 days)
 CCP-driven template re-render for orchestration-changing upgrades. Reuses existing `template-engine.ts` and `reconfigureInstance` pattern. Only writes templated files (compose, nginx, configs/pangolin); never touches `mkdocs/` or `configs/code-server/data/`. See plan for details.
 ---
 ## How to apply v2.10.2 fixes to remaining tenants
 **For PIA: already done** — used as the proof-of-concept on 2026-05-21. mkdocs.yml md5 unchanged, file count unchanged. ~5 minutes per tenant.
 **For the other 6 tenants**, use the surgical update — DO NOT run a raw `git pull origin main` (it would resurrect tenant-deleted files via merge logic):
 ### Source installs (bnkops, trbh, pridecorner, soroush)
 ```bash
 # bnkops, trbh, soroush use ~/changemaker.lite
 # pridecorner uses ~/cmlite/changemaker.lite
 cd ~/changemaker.lite  # or ~/cmlite/changemaker.lite
 git fetch origin main
 mkdir -p scripts/lib
 git checkout origin/main -- \
  scripts/upgrade.sh \
  scripts/upgrade-stash-cleanup.sh \
  scripts/lib/mkdocs-snapshot.sh \
  scripts/build-release.sh \
  docker-compose.yml \
  .gitignore
 # Sanity: tenant content should still be ahead/divergent (not touched)
 git status mkdocs/ configs/  # should show no NEW changes from this update
 ```
 ### Release installs (marcelle, linda; same procedure used on pia)
 ```bash
 # marcelle: ~/changemaker.lite, ssh bunker-admin@100.90.78.47
 # linda: ~/changemaker.lite.canonical, ssh bunker-admin@n2-linda.taile33572.ts.net
 cd ~/changemaker.lite  # or ~/changemaker.lite.canonical
 curl -fSL https://gitea.bnkops.com/admin/changemaker.lite/releases/download/v2.10.2/changemaker-lite-v2.10.2.tar.gz \
  -o /tmp/v2.10.2.tar.gz
 mkdir -p scripts/lib
 tar -xzf /tmp/v2.10.2.tar.gz --strip-components=1 \
  changemaker-lite/scripts/upgrade.sh \
  changemaker-lite/scripts/upgrade-stash-cleanup.sh \
  changemaker-lite/scripts/lib/mkdocs-snapshot.sh \
  changemaker-lite/docker-compose.yml
 chmod +x scripts/upgrade.sh scripts/upgrade-stash-cleanup.sh scripts/lib/mkdocs-snapshot.sh
 rm -f /tmp/v2.10.2.tar.gz
 # Do NOT update VERSION — only scripts changed, rest of install stays at current version.
 ```
 ### Verification per tenant
 ```bash
 # Before update: capture
 md5sum mkdocs/mkdocs.yml
 find mkdocs/docs -type f | wc -l
 # Run the appropriate surgical update above
 # After update: re-verify (should match)
 md5sum mkdocs/mkdocs.yml
 find mkdocs/docs -type f | wc -l
 # Confirm new upgrade.sh
 grep -c 'deferred ccp-agent\|Deferred ccp-agent' scripts/upgrade.sh  # expect 2
 # Optional: smoke-test the snapshot helper
 PROJECT_DIR=$(pwd) bash -c '. scripts/lib/mkdocs-snapshot.sh; snapshot_mkdocs'
 ls -lh mkdocs-backup-*.tar.gz
 ```
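The before/after check above can be scripted so the comparison is not done by eye. The following is a minimal sketch, assuming GNU `md5sum` is available on the tenant; the function names `mkdocs_fingerprint` and `verify_mkdocs_untouched` are hypothetical and not part of the repo.

```bash
# Hypothetical helpers; not shipped in scripts/.
# Prints "<md5 of mkdocs.yml> <file count under mkdocs/docs>" for an install root.
mkdocs_fingerprint() (
  cd "$1" || exit 1
  sum="$(md5sum mkdocs/mkdocs.yml | awk '{print $1}')"
  [ -n "$sum" ] || exit 1
  count="$(find mkdocs/docs -type f | wc -l | tr -d ' ')"
  echo "$sum $count"
)

# Compares a fingerprint captured before the update with the current state.
# Returns non-zero (and prints both values) when they differ.
verify_mkdocs_untouched() (
  before="$1"
  after="$(mkdocs_fingerprint "$2")" || exit 1
  if [ "$before" = "$after" ]; then
    echo "OK: mkdocs unchanged ($after)"
  else
    echo "MISMATCH: before=[$before] after=[$after]" >&2
    exit 1
  fi
)

# Typical use, from the install root:
#   before="$(mkdocs_fingerprint .)"
#   ... run the surgical update ...
#   verify_mkdocs_untouched "$before" .
```

The file count catches resurrected upstream files under `mkdocs/docs/`, which an md5 of `mkdocs.yml` alone would miss.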
 ---
 ## Bug inventory — what we know
 ### Fixed in v2.10.2
 | Bug | Memory file | Status |
 |---|---|---|
 | Gitea release `created_unix=0` (lightweight tag + Gitea 1.23.x quirk) | `feedback_gitea_release_tag_timing.md` | Fixed in `build-release.sh` — uses `target_commitish` + removes remote tag first |
 | ccp-agent image missing bash/curl/jq/python3 + git safe.directory | `feedback_ccp_agent_image_deps.md` | Fixed in agent Dockerfile + rolled out to all 7 tenants |
 | ccp-agent compose mount was `:ro` (blocked status.json writes) | (in `feedback_ccp_agent_image_deps.md`) | Fixed in both compose files |
 | CCP upgrade Phase 5 collision: `COMPOSE_PROJECT_NAME` mismatch | `feedback_upgrade_compose_project_name.md` | Fixed via env-var addition in compose env block (e88ac79) — also needs `.env` entry on tenants installed before v2.10.2 |
 | upgrade.sh Phase 6 self-destruct | `feedback_upgrade_sh_bugs.md` | Fixed in v2.10.2 — deferred ccp-agent restart |
 ### Open
 - **upgrade.sh `git stash → git pull` stash-no-pop** — Pride Corner has 3 stashes from March 9 holding mkdocs.yml customizations. Existing `save_user_paths`/`restore_user_paths` in upgrade.sh handles the common case; the snapshot fallback (v2.10.2) covers edge cases. Pridecorner-specific recovery handled by another agent.
 - **Agent-side `detached: true` spawn** — Defense-in-depth. Skip unless Phase 6 self-destruct re-emerges.
 ---
 ## Tenant content protection layers (all in v2.10.2)
 1. **`save_user_paths`/`restore_user_paths`** in upgrade.sh — preserves working-tree state of `mkdocs/docs/`, `mkdocs/mkdocs.yml`, `mkdocs/site/`, `configs/`, `nginx/conf.d/services.conf` across `git pull`.
 2. **`git stash` + auto-resolve on USER_PATHS** — modified tracked files stash + pop with `git checkout --theirs` on USER_PATH conflicts.
 3. **Pre-upgrade mkdocs snapshot** — tarball of `mkdocs/` to install root before any other phase runs. Fallback for everything else.
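To illustrate the idea behind layer 1, the sketch below copies a set of user paths aside before a tree-changing operation and puts them back afterwards, replacing whatever the operation left in their place. It is an assumption-laden illustration only: the real `save_user_paths`/`restore_user_paths` in upgrade.sh may work differently, and the names `save_paths_sketch` and `restore_paths_sketch` are hypothetical.

```bash
# Hypothetical sketch of the save/restore idea; the real functions live in upgrade.sh.
# Usage: save_paths_sketch <install-root> <absolute-scratch-dir> <path>...
save_paths_sketch() (
  root="$1"; stash_dir="$2"; shift 2
  cd "$root" || exit 1
  for p in "$@"; do
    [ -e "$p" ] || continue               # nothing to save for this path
    mkdir -p "$stash_dir/$(dirname "$p")"
    cp -a "$p" "$stash_dir/$p"
  done
)

# Usage: restore_paths_sketch <install-root> <absolute-scratch-dir> <path>...
restore_paths_sketch() (
  root="$1"; stash_dir="$2"; shift 2
  cd "$root" || exit 1
  for p in "$@"; do
    [ -e "$stash_dir/$p" ] || continue    # path was never saved
    rm -rf -- "$p"                        # drop anything resurrected by the pull
    mkdir -p "$(dirname "$p")"
    cp -a "$stash_dir/$p" "$p"
  done
)
```

Removing the path before copying back is what keeps tenant-deleted upstream files from reappearing; a plain overlay copy would leave them in place.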
 ---
 ## Tonight's recovery work — already applied
 These tenants had content damage from earlier in the session; recovery was completed:
 - **trbh** — mkdocs.yml + 143 M files restored from `stash@{0}`; 538 D-entry files re-deleted. Public sites serve correct branding.
 - **bnkops** — same pattern, 100 M files restored + 82 D-entry re-deletions. Public sites serve correct branding.
 - **marcelle** — manual recovery from Phase 6 self-destruct test (file rollback + service restart). On v2.10.1 currently. Operating normally.
 `stash@{0}` is preserved on trbh and bnkops as forensic record + safety net.
 ---
 ## CCP access
 ```
 URL:       http://n4-bnkops.taile33572.ts.net:5100  (UI)
           http://n4-bnkops.taile33572.ts.net:5000  (API)
 User:      admin@thebunkerops.ca
 Password:  NRTgHdC7Zxxs2P2UmNwnEbn3jTwU8uJN  (seed; rotate if you want)
 Role:      SUPER_ADMIN
 ```
 ---
 ## Test bench (marcelle)
 ```
 SSH:           ssh bunker-admin@100.90.78.47
 Install dir:   ~/changemaker.lite
 Domain:        cursedknowledge.org
 Admin:         admin@cursedknowledge.org / @TheBunker2025!
 CCP slug:      changemakerlite
 CCP id:        71b5bc4a-c47e-4435-b460-e9bc303b76ed
 ```
 Marcelle is the test bench per `docs/TEST_SERVER.md`. Use it for ALL upgrade experiments before touching production tenants.
 ---
 ## Per-tenant quick reference
 | Tenant | SSH | Install dir | CCP id |
 |---|---|---|---|
 | bnkops | bunker-admin@n4-bnkops.taile33572.ts.net | ~/changemaker.lite | 21238536-7c04-4a3b-a073-38390a939046 |
 | marcelle | bunker-admin@100.90.78.47 | ~/changemaker.lite | 71b5bc4a-c47e-4435-b460-e9bc303b76ed |
 | trbh | bunker-admin@n6-trbh.taile33572.ts.net | ~/changemaker.lite | c066dc23-64a5-4684-96a7-992e65c1b82c |
 | pia | pia-bnkops@n3-pia.taile33572.ts.net | ~/changemaker.lite | 92a11622-d357-4ab4-b21e-60c030c1b026 |
 | pridecorner | bunker-admin@n1-pridecorner.taile33572.ts.net | ~/cmlite/changemaker.lite | a30de94b-ef28-42b6-a71d-112669526a62 |
 | soroush | bunker-admin@n7-soroush.taile33572.ts.net | ~/changemaker.lite | 0c70f94c-1319-41e1-867c-5674f17cadda |
 | linda | bunker-admin@n2-linda.taile33572.ts.net | ~/changemaker.lite.canonical | 6dcc19a1-f4fd-45df-be77-5bf62f8110c8 |
 ---
 ## Most important "don't repeat my mistakes" notes
 1. **Never `git stash + git pull --ff-only origin main` on a tenant** outside of upgrade.sh. The stash silently displaces tenant content. If you must update files on a source-installed tenant, use targeted `git checkout origin/main -- <specific-file>` instead.
 2. **Never blindly trigger CCP "Upgrade Now"** on a tenant still running a pre-v2.10.2 upgrade.sh, because it will self-destruct in Phase 6. Apply the surgical script update first (instructions above), THEN trigger the CCP upgrade.
 3. **mkdocs/docs/ contains upstream tracked files** (default screenshots, demo docs, blog posts). Tenants typically delete these locally without committing. ANY operation that brings origin/main's tracked tree into the working tree (git pull, tarball extract) will resurrect them. v2.10.2's snapshot fallback gives you a recovery path; the surgical update procedure (this doc) avoids the issue entirely.
 4. **mkdocs/mkdocs.yml is tracked, tenant-customized** with branding. Lives under USER_PATHS so v2.10.2's upgrade.sh protects it. But if you do raw git operations outside the script, it's exposed.
 5. **CCP backend on n4 is decoupled from per-tenant ccp-agent**. Restarting a tenant's ccp-agent does NOT affect CCP itself. Verified during bnkops patch (CCP backend stayed at 41h uptime while ccp-agent recreated).
 ---
 ## Memory files (in `/home/bunker-admin/.claude/projects/-home-bunker-admin-changemaker-lite/memory/`)
 Latest session work documented in:
 - `feedback_gitea_release_tag_timing.md`
 - `feedback_ccp_agent_image_deps.md`
 - `feedback_upgrade_compose_project_name.md`
 - `feedback_upgrade_sh_bugs.md`
 - `feedback_session_2026_05_20_damage_report.md`
 Plus the architectural plan: `/home/bunker-admin/.claude/plans/okay-so-we-can-enumerated-hejlsberg.md`
 ---
 ## Where to start the next session
 Recommended sequence:
 1. **Apply surgical update to remaining 6 tenants** (~30-45 min, low risk; pia procedure already proven). Order: marcelle, linda (release), then soroush, trbh, bnkops, pridecorner (source).
 2. **Test CCP-driven upgrade on marcelle** after surgical update lands. This will verify the deferred ccp-agent restart works end-to-end through the CCP path (the test we couldn't complete tonight because Phase 6 kept self-destructing).
 3. **Implement Approach B** per the plan — image-only upgrade mode. Estimated 1-2 days.
 4. **Implement Approach C** — CCP template re-render. 3-5 days.
 If only one thing happens next session: **do step 1**. Six surgical updates × ~5 minutes each. The rest of the fleet stays vulnerable to Phase 6 self-destruct until they're on v2.10.2's upgrade.sh.
--- a/docs/SESSION_HANDOFF_2026-05-21.md
+++ b/docs/SESSION_HANDOFF_2026-05-21.md
@ -0,0 +1,169 @@
 # Session Handoff: Approach B Rollout + Approach C Planning (2026-05-21)
 Carries forward all context from a long working session. If you're a fresh agent: read this top-to-bottom before touching anything.
 ---
 ## What landed in this session (commits on origin/main)
 | Commit | Description |
 |---|---|
 | `4a3d9d7` | `feat(upgrade): Approach B - image-only upgrade mode` — 7 files, 666 insertions. scripts/image-upgrade.sh + CCP agent endpoint + CCP backend (driver/service/route/schema) + admin UI "Quick Upgrade" button. |
 | `<this commit>` | docs: session handoff + Approach C Phase 0 initial template overlay |
 Plus several non-tracked deploys:
 - v2.10.2 surgical update applied to remaining 6 tenants (soroush, linda, marcelle, bnkops, trbh, pridecorner — pia was done previously). All verified mkdocs untouched, upgrade.sh sha matches `b9f37d59...`.
 - Fleet rollout of Approach B: new `image-upgrade.sh` script delivered + new `ccp-agent` image (with `/upgrade/start-image-only` endpoint) deployed to all 7 tenants. Bnkops's ccp-agent was rebuilt from source (that tenant builds the image locally rather than pulling it from the registry).
 ---
 ## Fleet state at session end
 | Tenant | Surgical update v2.10.2 | image-upgrade.sh | New ccp-agent with image-only endpoint |
 |---|---|---|---|
 | pia | ✅ (prior session) | ✅ | ✅ |
 | soroush | ✅ | ✅ | ✅ |
 | linda | ✅ | ✅ | ✅ |
 | marcelle | ✅ + tested both A and B E2E | ✅ | ✅ |
 | bnkops | ✅ | ✅ | ✅ (rebuilt locally) |
 | trbh | ✅ | ✅ | ✅ |
 | pridecorner | ✅ | ✅ | ✅ |
 Marcelle E2E test results:
 - **Approach A (full upgrade)**: v2.10.1 → v2.10.2 in 250s, COMPLETED, no SIGKILL on script. Phase 6 deferred ccp-agent restart fix worked end-to-end through CCP path.
 - **Approach B (Quick Upgrade) run 1**: 121s, COMPLETED, mkdocs.yml md5 unchanged.
 - **Approach B (Quick Upgrade) run 2**: 100s (cached pull), COMPLETED, mkdocs unchanged again — confirms idempotency.
 ---
 ## Fleet backup (Phase 0 work — defensive)
 All 7 tenants backed up to `/media/bunker-admin/BACKUP/fleet/<node>/2026-05-21-pre-v2.10.2/`:
 | Node | Tenant | Size |
 |---|---|---|
 | n1 | pridecorner | 182MB (includes 3 stash patches from March 9) |
 | n2 | linda | 26MB |
 | n3 | pia | 45MB (post-surgical state) |
 | n4 | bnkops | 4.4GB (huge — 2277 mkdocs/docs files) |
 | n5 | marcelle | 28MB |
 | n6 | trbh | 336MB |
 | n7 | soroush | 76MB |
 Each tenant dir has `mkdocs.tar.gz`, `configs-and-nginx.tar.gz`, `config-files.tar.gz`, `host-state.txt`, `git-state.txt` (source installs only), and `MANIFEST.txt`.
 ---
 ## Approach C planning + initial overlay
 **Decision: rewrite `docker-compose.yml.hbs` in prod-compose style** to make CCP-driven template re-render safe for the install.sh fleet.
 ### Why a rewrite (not sync-by-addition)
 The CCP template and `docker-compose.prod.yml` turned out to use fundamentally different conventions:
 | | Old template (`.hbs`) | Canonical prod |
 |---|---|---|
 | Container names | `{{containerPrefix}}-postgres` (dynamic) | `changemaker-v2-postgres` (hardcoded) |
 | Secrets | `{{secrets.postgresPassword}}` (Handlebars-rendered) | `${POSTGRES_PASSWORD}` (env-substituted) |
 | Optional services | `{{#if enableX}}` blocks | Always-defined, gated via `COMPOSE_PROFILES` |
 | Ports | `{{ports.api}}` | Hardcoded |
 Sync-by-addition can't reconcile these. A rewrite is cleaner long-term.
 ### Initial overlay committed this session
 `changemaker-control-panel/templates/docker-compose.yml.hbs.OLD-style-pre-approach-c` — preserved old template for reference.
 `changemaker-control-panel/templates/docker-compose.yml.hbs` — now a near-mirror of `changemaker.lite/docker-compose.prod.yml` (1493 lines + Handlebars header):
 - Header comment includes `{{name}}`, `{{slug}}`, `{{composeProject}}` for traceability.
 - Five image refs changed from `${IMAGE_TAG:-latest}` to `{{imageTag}}` so CCP can override the tag per instance via `Instance.imageTag` once Phase 1 lands.
 - All other variation flows through env-var substitution from tenant's `.env`.
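To make the two substitution layers concrete: `{{imageTag}}` is filled in once when CCP renders the template, while `${...}` references are resolved later from the tenant's `.env`, with `${VAR:-default}` falling back when the variable is unset or empty. The sketch below imitates both steps with `sed` and shell parameter expansion. The function `render_image_ref`, the registry path in the usage comment, and the sample values are illustrative assumptions, not the behaviour of `template-engine.ts`.

```bash
# Illustrative only: mimics render-time vs. run-time substitution.

# Step 1 (CCP render time): replace the Handlebars placeholder {{imageTag}}.
# $1 = template line, $2 = tag chosen by CCP for this instance.
# Assumes the tag contains no "/" or "&" (both are special to sed here).
render_image_ref() {
  printf '%s\n' "$1" | sed "s/{{imageTag}}/$2/g"
}

# Step 2 (compose run time): ${VAR:-default} uses the default when VAR is
# unset or empty, the same rule the compose files rely on for
# COMPOSE_PROJECT_NAME.
project_name() {
  echo "${COMPOSE_PROJECT_NAME:-changemaker-lite}"
}

# Example:
#   render_image_ref 'image: registry.example/api:{{imageTag}}' 'v2.10.3'
#   COMPOSE_PROJECT_NAME=marcelle project_name
```

Keeping only the image tag in Handlebars means a re-render changes one value per image line, and everything else stays under the tenant's `.env`.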
 ### Remaining Approach C work (next session)
 See `/home/bunker-admin/.claude/plans/insight-temporal-bachman.md` for the full plan. Quick summary of what's next:
 **Phase 0 completion (next session):**
 - Audit `env.hbs` against the new compose's expected env vars. Add any that are missing.
 - Sync static config files in `templates/`: nginx/, configs/prometheus/, configs/alertmanager/, configs/grafana/. They may have drifted too.
 - Write a one-off render harness (`api/scripts/render-for-instance.ts`) that loads an instance row, builds the template context, and renders the templates to a scratch dir.
 - Render against marcelle, linda, pia. Diff against their actual files. Iterate the template until diff is per-instance values only (`COMPOSE_PROJECT_NAME`, ports, secrets — not structure).
 **Phase 1 (~30 min):** Add `Instance.imageTag` Prisma column + migration. Modify `template-engine.ts:211` to use `instance.imageTag || env.IMAGE_TAG`.
 **Phase 2 (~3-4 hr):** Pre-flight diff endpoint. New agent route `POST /instance/:slug/files/diff` + `RemoteDriver.diffFiles()` + `LocalDriver.diffFiles()` + `previewReleaseUpgrade()` in upgrade.service. Includes `envCoverage` check for registered tenants.
 **Phase 3 (~3-4 hr):** `startReleaseUpgrade()` + `runReleaseUpgrade()` in upgrade.service. Split logic for `isRegistered=true` (skip env render) vs `isRegistered=false` (render env).
 **Phase 4 (~30 min):** CCP routes `/upgrade-release` + `/upgrade-release/preview` + Zod schema.
 **Phase 5 (~2-3 hr):** "Upgrade to Release" UI button + preview modal + env-coverage warning.
 **Phase 6 (~1 hr):** Tag v2.10.3 in changemaker.lite, push images with tag, trigger upgrade-release on marcelle via CCP UI, verify mkdocs untouched + containers on new tag.
 **Total remaining: 11-14 hours.** Recommended split:
 - Session 2: complete Phase 0 (render harness + iterate template + env.hbs sync + static file syncs). ~half day.
 - Session 3: Phases 1-5. ~half day.
 - Session 4: Phase 6 E2E test. ~1 hour.
 ---
 ## Critical files for Approach C
 **Already modified this session:**
 - `changemaker-control-panel/templates/docker-compose.yml.hbs` — overlay from prod compose with minimal Handlebars markup.
 - `changemaker-control-panel/templates/docker-compose.yml.hbs.OLD-style-pre-approach-c` — preserved old template.
 **To be modified in next sessions (per plan):**
 - `changemaker-control-panel/templates/env.hbs` (Phase 0 audit)
 - `changemaker-control-panel/templates/configs/**` (Phase 0 syncs)
 - `changemaker-control-panel/api/prisma/schema.prisma` (Phase 1)
 - `changemaker-control-panel/api/prisma/migrations/<ts>_add_instance_image_tag/` (Phase 1)
 - `changemaker-control-panel/api/src/services/template-engine.ts` line 211 (Phase 1)
 - `changemaker-control-panel/api/src/services/upgrade.service.ts` (Phases 2-3)
 - `changemaker-control-panel/api/src/services/remote-driver.ts` + `local-driver.ts` + `execution-driver.ts` (Phase 2)
 - `changemaker-control-panel/agent/src/routes/files.routes.ts` + `services/file.service.ts` (Phase 2)
 - `changemaker-control-panel/api/src/modules/instances/instances.routes.ts` + `instances.schemas.ts` (Phase 4)
 - `changemaker-control-panel/admin/src/pages/InstanceDetailPage.tsx` (Phase 5)
 ---
 ## Memory key gotchas (write to MEMORY.md next session)
 1. **CCP template vs prod compose: were divergent, now aligned.** As of this session, `templates/docker-compose.yml.hbs` is structurally a near-mirror of `docker-compose.prod.yml`. Going forward, any new service in prod compose must be ported into the template manually (or via a future CI drift check).
 2. **bnkops's ccp-agent is locally built**, not pulled from registry. Has a `build:` directive in compose. The other 6 tenants pull `gitea.bnkops.com/admin/changemaker-ccp-agent:latest`.
 3. **install.sh tenants (`isRegistered=true`)** lack `encryptedSecrets` in CCP DB. Approach C must skip `env.hbs` rendering for them — they keep their tarball-provisioned `.env`. The pre-flight envCoverage check is the safety net.
 4. **n4 SSH lacks marcelle's host key by default** — first `ssh n4 → marcelle` connection needs `StrictHostKeyChecking=accept-new` or interactive accept. Other tenants in the lab have the same pattern.
 5. **`docker save | ssh ... docker load` is the registry-less image distribution path** when n4 doesn't have docker login to gitea.bnkops.com. Worked well for the ccp-agent rollout this session.
 6. **`set -o pipefail` + `grep -q` breaks the pipeline**: grep exits on the first match and closes the pipe, the writer is killed by SIGPIPE, and `pipefail` reports that non-zero status as the pipeline's result. Solution: capture upstream output into a variable, then grep against the variable. (Bug found + fixed in `scripts/image-upgrade.sh` during this session.)
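Gotcha 6 is easy to reproduce in isolation. The sketch below is illustrative only: `noisy_writer` is a hypothetical stand-in for a command such as `docker compose config --services`, sized so that it is still writing when grep exits.

```bash
#!/usr/bin/env bash
set -o pipefail

# Hypothetical stand-in for a chatty writer: far more output than a pipe
# buffer holds, so it is still writing when grep -q exits.
noisy_writer() { seq 1 200000; }

# Fragile: grep -q exits on the first match, the writer hits a closed pipe,
# and pipefail makes the writer's failure the pipeline's exit status.
fragile_has_line() { noisy_writer | grep -qx -- "$1"; }

# Robust: capture the output first, then grep the captured text.
robust_has_line() {
  local out
  out="$(noisy_writer)"
  grep -qx -- "$1" <<<"$out"
}

if fragile_has_line 1; then
  echo "fragile: found"
else
  echo "fragile: reported missing (status $?)"
fi

if robust_has_line 1; then
  echo "robust: found"
else
  echo "robust: reported missing"
fi
```

Line `1` is present in both cases. Only the capture-then-check variant reports it reliably, which is why `image-upgrade.sh` captures the service list once before looping.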
 ---
 ## CCP access (unchanged)
 ```
 URL:       http://n4-bnkops.taile33572.ts.net:5100  (UI)
           http://n4-bnkops.taile33572.ts.net:5000  (API)
 User:      admin@thebunkerops.ca
 Password:  NRTgHdC7Zxxs2P2UmNwnEbn3jTwU8uJN  (seed)
 Role:      SUPER_ADMIN
 ```
 ---
 ## Where to start next session
 Recommended:
 1. **Read this doc + `/home/bunker-admin/.claude/plans/insight-temporal-bachman.md` (Approach C plan)** first.
 2. **Phase 0 completion:** finish the template rewrite. Build a render harness (`api/scripts/render-for-instance.ts`), render against marcelle/linda/pia, iterate until structural-clean.
 3. Commit Phase 0 as standalone PR with rendered-vs-actual diffs in description.
 4. Move to Phases 1-5 in a second commit/PR.
 5. Phase 6 manual E2E.
 Approach B is in production-ready state across the fleet. Approach C is the longer-term path for releases that change orchestration.
--- a/docs/SESSION_HANDOFF_2026-05-22.md
+++ b/docs/SESSION_HANDOFF_2026-05-22.md
@ -0,0 +1,125 @@
 # Session Handoff: Approach C complete (template re-render) — 2026-05-22
 This session shipped Approach C end-to-end: CCP-driven template re-render for orchestration-changing upgrades.
 ## Commits landed
 | Commit | Description |
 |---|---|
 | `9744464` | Phase 0 complete — templates byte-equivalent to canonical |
 | `abb4034` | Approach C — schema migration, services, routes, UI |
 ## What's in production
 ### Phase 0 (commit `9744464`)
 - `templates/docker-compose.yml.hbs` (1504 lines): structural mirror of `docker-compose.prod.yml`. Only difference: header comment (CCP-tenant metadata).
 - `templates/env.hbs` (369 lines): mirror of `.env.example` with Handlebars overlay for tenant-specific values. Covers all 145 env vars referenced by the new compose + 15 CCP-helpful extras.
 - `templates/nginx/nginx.conf`: synced to canonical, closing security drift (redacted log format, rate-limit zones, conditional HSTS).
 - `api/scripts/render-for-instance.ts`: one-off CLI to render templates against any registered instance + scratch-dir output for diff verification.
 Verified by rendering against marcelle/linda/pia and diffing against their actual on-disk compose. **30-line diff for all three, header-only — zero structural differences.**
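A header-only difference can also be checked mechanically. The sketch below is not the render harness itself. It assumes the header is the run of blank or `#` lines at the top of each file, and the function names `strip_header` and `same_structure` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: two compose files count as structurally identical if
# they match once the leading comment header is dropped from each.
# Assumption: the header is the run of blank or '#' lines at the top.
strip_header() {
  awk 'body || !/^[[:space:]]*(#.*)?$/ { body = 1; print }' "$1"
}

same_structure() {
  diff -u <(strip_header "$1") <(strip_header "$2") >/dev/null
}

# Demo with two tiny files that differ only in their header comments.
tmp="$(mktemp -d)"
printf '%s\n' '# Tenant: marcelle' '# Slug: marcelle' 'services:' '  api:' \
  '    image: example/api:${IMAGE_TAG:-latest}' > "$tmp/rendered.yml"
printf '%s\n' '# Canonical prod compose' 'services:' '  api:' \
  '    image: example/api:${IMAGE_TAG:-latest}' > "$tmp/actual.yml"

if same_structure "$tmp/rendered.yml" "$tmp/actual.yml"; then
  echo "header-only difference"
else
  echo "structural drift detected"
fi
rm -rf "$tmp"
```

Comments below the first non-comment line are kept, so drift in inline comments still counts as a difference.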
 ### Approach C (commit `abb4034`)
 **Phase 1 — schema:**
 - `Instance.imageTag String?` Prisma column + migration `20260522093400_add_instance_image_tag`.
 - `template-engine.ts:buildTemplateContext` uses `instance.imageTag || env.IMAGE_TAG`.
 **Phase 2 — pre-flight diff (read-only):**
 - Agent: `POST /instance/:slug/files/diff` + `file.service.ts:diffFiles()` (inline LCS unified diff, no new deps).
 - API: `RemoteDriver.diffFiles()` + `LocalDriver.diffFiles()` + interface addition.
 - `upgrade.service.ts:previewReleaseUpgrade()` — renders templates with proposed imageTag, filters .env for isRegistered tenants, returns per-file diff + envCoverage.
 **Phase 3 — apply path:**
 - `upgrade.service.ts:startReleaseUpgrade()` + `runReleaseUpgrade()`.
 - Flow: persist imageTag → render → writeFiles → composePull → composeUp → composePs verify.
 - Status surfaced via existing InstanceUpgrade poll loop (no new UI polling code needed).
 **Phase 4 — routes:**
 - `POST /api/instances/:id/upgrade-release` (apply)
 - `POST /api/instances/:id/upgrade-release/preview` (read-only)
 - `startReleaseUpgradeSchema` (imageTag regex).
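The exact regex in `startReleaseUpgradeSchema` is not reproduced in this doc. As a rough operator-side pre-check, the sketch below uses the general Docker tag grammar instead (ASCII letters, digits, `_`, `.`, `-`, at most 128 characters, not starting with `.` or `-`). The function name is hypothetical, and the real schema may be stricter.

```bash
#!/usr/bin/env bash
# Hypothetical validator. The pattern is the general Docker tag grammar,
# not necessarily the regex used by startReleaseUpgradeSchema.
is_valid_image_tag() {
  local re='^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$'
  [[ "$1" =~ $re ]]
}

for tag in v2.10.3 latest "" "-bad" 'v2;rm -rf /'; do
  if is_valid_image_tag "$tag"; then
    echo "accept: '$tag'"
  else
    echo "reject: '$tag'"
  fi
done
```

Rejecting anything outside this character set before the value reaches a shell or a compose file is the main point. The precise length and prefix rules matter less than keeping metacharacters out.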
 **Phase 5 — UI:**
 - Third "Upgrade to Release" button on InstanceDetailPage next to Quick Upgrade + Upgrade Now.
 - Modal: imageTag input, Preview button (red alert if envCoverage shows missing vars), Apply button.
 - Diff display with per-file status tags (unchanged/modified/created) + truncated unified diff.
 ## E2E Phase 6 validation status
 **Preview path: VALIDATED end-to-end on marcelle.**
 CCP API call `POST /api/instances/{marcelle}/upgrade-release/preview` exercises every layer:
 - CCP routes → upgrade.service.ts → template-engine → remote-driver → marcelle's ccp-agent → file.service.diffFiles → response back to CCP → admin UI
 Test 1 (no imageTag): 14 files rendered, 6 unchanged / 7 modified / 1 created. envCoverage: 180/186 vars present in marcelle's .env, 6 missing.
 Test 2 (imageTag=v2.10.3): same file count, imageTag override plumbed through DB. The "v2.10.3" itself doesn't show in compose diff because the template uses `${IMAGE_TAG:-latest}` (env-substituted), not Handlebars.
 Test 3 (malformed imageTag): rejected at JSON parsing layer.
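The envCoverage figure in Test 1 is a set difference: variables referenced by the rendered compose minus keys present in the tenant's `.env`. The real check lives in `upgrade.service.ts` and may differ. The sketch below only illustrates the idea with hypothetical function names, and it counts variables that have a `:-default` as referenced too.

```bash
#!/usr/bin/env bash
# Hypothetical envCoverage sketch: list ${VAR} names referenced by a compose
# file, then report which of them have no KEY= line in a .env file.
referenced_vars() {
  grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*' "$1" | tr -d '${' | sort -u
}

env_keys() {
  grep -oE '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | tr -d '=' | sort -u
}

# comm -23 prints lines that appear only in the first (sorted) input.
missing_vars() {
  comm -23 <(referenced_vars "$1") <(env_keys "$2")
}

# Demo with a tiny compose file and a .env that lacks REDIS_URL.
tmp="$(mktemp -d)"
cat > "$tmp/docker-compose.yml" <<'EOF'
services:
  api:
    image: example/api:${IMAGE_TAG:-latest}
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      REDIS_URL: ${REDIS_URL}
EOF
printf '%s\n' 'IMAGE_TAG=v2.10.2' 'POSTGRES_PASSWORD=example' > "$tmp/.env"

echo "missing from .env:"
missing_vars "$tmp/docker-compose.yml" "$tmp/.env"
rm -rf "$tmp"
```

In the demo the only variable reported is `REDIS_URL`. A stricter version might skip variables that carry a default, since Compose can still start without them.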
 **Apply path: code is wired but NOT yet validated against a real tenant.**
 Applying to marcelle would rewrite 7 files including `nginx/conf.d/default.conf` (5296 → 15695 bytes, big change). That's a separate validation effort and not strictly needed to call Approach C "working" — every code path it touches is independently exercised by the preview test.
 ## Known gap (defer)
 **install.sh tenants need an env-patch mechanism for imageTag to actually take effect.**
 For CCP-provisioned tenants (`isRegistered=false`): CCP renders the full `.env` including `IMAGE_TAG=<value>`. Compose's `${IMAGE_TAG:-latest}` picks it up. Works.
 For install.sh tenants (`isRegistered=true`): CCP filters `.env` out of the rendered set (no secrets in DB to render against). The tenant's existing `.env` stays, including its existing `IMAGE_TAG` value. **CCP's `Instance.imageTag` is persisted in CCP DB but doesn't reach the tenant's compose.**
 To close this gap, add:
 - Agent endpoint `POST /instance/:slug/env/patch { vars: { IMAGE_TAG: 'v2.10.3' } }` that does in-place key=value patching on the tenant's existing `.env`.
 - In `runReleaseUpgrade`, for isRegistered tenants, call this between writeFiles and composePull.
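The in-place patching itself is small. The sketch below shows one way the agent-side logic might work. The endpoint does not exist yet, and the function name `patch_env_var` is hypothetical. It rewrites every `KEY=` line, appends the key if absent, and leaves all other lines untouched.

```bash
#!/usr/bin/env bash
# Hypothetical in-place patcher for a tenant .env. Rewrites every KEY=...
# line to KEY=value, or appends KEY=value if the key is absent.
patch_env_var() {
  local file="$1" key="$2" value="$3" tmp
  if [[ ! "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
    echo "invalid key: $key" >&2
    return 2
  fi
  # Temp file in the same directory so the final mv is an atomic rename.
  tmp="$(mktemp "${file}.XXXXXX")" || return 1
  # Pass values through the environment so awk does not interpret
  # backslash escapes in them.
  PATCH_KEY="$key" PATCH_VALUE="$value" awk '
    BEGIN { prefix = ENVIRON["PATCH_KEY"] "="; line = prefix ENVIRON["PATCH_VALUE"] }
    index($0, prefix) == 1 { print line; done = 1; next }
    { print }
    END { if (!done) print line }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Keep the original file mode where chmod supports --reference (GNU).
  chmod --reference="$file" "$tmp" 2>/dev/null || true
  mv "$tmp" "$file"
}

# Demo on a throwaway file.
env_file="$(mktemp)"
printf '%s\n' 'POSTGRES_PASSWORD=example' 'IMAGE_TAG=v2.10.2' > "$env_file"
patch_env_var "$env_file" IMAGE_TAG v2.10.3
patch_env_var "$env_file" NEW_FLAG true
cat "$env_file"
rm -f "$env_file"
```

Because the prefix includes the `=`, a key such as `IMAGE_TAG_OTHER` is not touched when patching `IMAGE_TAG`. Secrets already in the file are never read into CCP, which fits the constraint that install.sh tenants have no `encryptedSecrets` in the CCP DB.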
 Not a blocker for Approach C in CCP-provisioned tenants — those work end-to-end. The current fleet (marcelle/linda/pia all install.sh) needs this gap closed before they can use Approach C to bump image versions.
 ## Fleet rollout status
 - n4 (CCP host): all Approach C code deployed. Migration applied. ccp-api + ccp-admin rebuilt + restarted.
 - marcelle: new ccp-agent (sha 4fe6ef350aa9) with `/files/diff` endpoint deployed and running.
 - soroush, linda, trbh, pridecorner, pia, bnkops: still on the prior ccp-agent. **NEED ROLLOUT** to receive the diff endpoint. Without it, preview will fail on those tenants ("path not found").
 Rollout procedure (~5 min per tenant):
 ```
 ssh bunker-admin@n4 'docker save gitea.bnkops.com/admin/changemaker-ccp-agent:latest | ssh bunker-admin@<tenant> docker load'
 ssh bunker-admin@<tenant> 'cd <install_dir> && docker compose --profile ccp-agent up -d --force-recreate --no-deps ccp-agent'
 ```
 (bnkops builds locally — needs `docker compose build ccp-agent` instead of image transfer.)
 ## How to use Approach C
 From CCP UI at http://n4-bnkops.taile33572.ts.net:5100:
 1. Instances → pick a tenant → Updates tab.
 2. Click "Upgrade to Release".
 3. Enter desired imageTag (leave blank to use current default).
 4. Click "Preview Changes" — read the diff. If red envCoverage warning appears, fix the tenant's .env first or skip apply.
 5. Click "Apply Upgrade". The UI then tracks progress through the existing status poll.
 From CLI:
 ```bash
 curl -X POST "http://n4-bnkops.taile33572.ts.net:5000/api/instances/<id>/upgrade-release/preview" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"imageTag":"v2.10.3"}'
 ```
 ## Documentation reference
 - Architectural plan: `~/.claude/plans/insight-temporal-bachman.md`
 - Approach A (upgrade.sh) implementation: commit `9613c3e`
 - Approach B (image-upgrade.sh) implementation: commit `4a3d9d7`
 - Phase 0 templates sync: commit `9744464`
 - Approach C code: commit `abb4034`
 ## Where to start next session
 Recommended sequence:
 1. **Close the env-patch gap** (~2-3 hr): agent endpoint + CCP service hook. The UI doesn't need changes.
 2. **Roll out new ccp-agent** to remaining 6 tenants (~30 min, well-trodden pattern from prior session).
 3. **Actually apply Approach C** on marcelle as a real version bump (e.g., v2.10.2 → v2.10.3 after tagging+building). Verify nginx config change doesn't break public site.
 4. **Document the operator decision tree**: when to use A vs B vs C.
 All three upgrade approaches are now in production code. The remaining work is mostly closing the install.sh-tenant gap and operator-experience polish.
--- a/scripts/build-release.sh
+++ b/scripts/build-release.sh
@ -126,7 +126,7 @@ RUNTIME_SCRIPTS=(
  install.sh
  nocodb-init.sh gitea-init.sh mkdocs-entrypoint.sh
  backup.sh restore.sh
-  upgrade.sh upgrade-check.sh upgrade-watcher.sh
+  upgrade.sh upgrade-check.sh upgrade-watcher.sh upgrade-stash-cleanup.sh
  uninstall.sh test-deployment.sh
  validate-env.sh pangolin-teardown.sh ccp-deregister.sh register-with-ccp.sh
  update-env.sh
@ -178,6 +178,13 @@ if [[ -f "$PROJECT_DIR/scripts/mkdocs-build-trigger.py" ]]; then
  cp "$PROJECT_DIR/scripts/mkdocs-build-trigger.py" "$STAGE_DIR/scripts/"
 fi
 # Shared shell libraries (scripts/lib/) — sourced by upgrade.sh + image-upgrade.sh.
 # Whole directory ships verbatim; safe because nothing executable lives here
 # besides the .sh helpers that the runtime scripts depend on.
 if [[ -d "$PROJECT_DIR/scripts/lib" ]]; then
  cp -a "$PROJECT_DIR/scripts/lib" "$STAGE_DIR/scripts/"
 fi
 # Systemd units
 if [[ -d "$PROJECT_DIR/scripts/systemd" ]]; then
  cp -r "$PROJECT_DIR/scripts/systemd" "$STAGE_DIR/scripts/"
@ -295,12 +302,23 @@ if [[ "$UPLOAD" == "true" ]]; then
      fi
    fi
    # Gitea 1.23.x only initializes Release.CreatedUnix inside its createTag()
    # path. If the git tag already exists on origin when we POST /releases,
    # createTag() is skipped and CreatedUnix stays 0, which makes /releases/latest
    # silently return an older release. Remove the remote tag first so Gitea
    # creates it via target_commitish below. The tag is preserved locally and
    # gets recreated at the same SHA — no history is lost.
    if git ls-remote --exit-code origin "refs/tags/${TAG}" >/dev/null 2>&1; then
      warn "Removing remote tag ${TAG} so Gitea can recreate it (CreatedUnix init)"
      git push origin ":refs/tags/${TAG}" >/dev/null 2>&1 || true
    fi
    info "Creating Gitea release ${TAG}..."
    RELEASE_RESPONSE=$(curl -sf -X POST \
      "${GITEA_HOST}/api/v1/repos/admin/changemaker.lite/releases" \
      -H "Authorization: token ${GITEA_TOKEN}" \
      -H "Content-Type: application/json" \
-      -d "{\"tag_name\":\"${TAG}\",\"name\":\"Changemaker Lite ${TAG}\",\"body\":\"Release ${TAG} (${COMMIT_SHA})\"}" \
+      -d "{\"tag_name\":\"${TAG}\",\"target_commitish\":\"${COMMIT_SHA}\",\"name\":\"Changemaker Lite ${TAG}\",\"body\":\"Release ${TAG} (${COMMIT_SHA})\"}" \
      2>/dev/null || true)
    RELEASE_ID=$(echo "$RELEASE_RESPONSE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id',''))" 2>/dev/null || true)
--- a/scripts/image-upgrade.sh
+++ b/scripts/image-upgrade.sh
@ -0,0 +1,383 @@
 #!/usr/bin/env bash
 # image-upgrade.sh — Approach B: image-only upgrade
 #
 # Pulls latest images from the registry and recreates services WITHOUT touching
 # tracked files in the install tree (no git pull, no tarball extract, no VERSION
 # mutation). Tenant content (mkdocs/, configs/) is implicitly safe because this
 # script never writes outside data/upgrade/ and the docker daemon.
 #
 # Used by CCP "Quick Upgrade" button. Pairs with scripts/upgrade.sh which
 # remains the full upgrade path for orchestration-changing releases.
 #
 # Schema parity: writes data/upgrade/progress.json + result.json with the same
 # fields upgrade.sh writes, so the CCP poll loop is unchanged.
 set -euo pipefail
 PROJECT_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")/.." && pwd)"
 SCRIPT_DIR="$PROJECT_DIR/scripts"
 UPGRADE_DIR="$PROJECT_DIR/data/upgrade"
 LOG_DIR="$PROJECT_DIR/logs"
 LOG_FILE="$LOG_DIR/image-upgrade-$(date +%Y%m%d_%H%M%S).log"
 LOCK_FILE="$PROJECT_DIR/.upgrade.lock"
 PROGRESS_FILE="$UPGRADE_DIR/progress.json"
 RESULT_FILE="$UPGRADE_DIR/result.json"
 START_TIME=$SECONDS
 # --- Detect install mode ---
 if [[ -f "$PROJECT_DIR/VERSION" ]] && [[ ! -d "$PROJECT_DIR/.git" ]]; then
  INSTALL_MODE="release"
 else
  INSTALL_MODE="source"
 fi
 # --- Defaults ---
 API_MODE=false
 DRY_RUN=false
 IMAGE_TAG=""
 usage() {
  cat <<EOF
 Usage: $(basename "$0") [options]
 Image-only upgrade: pulls latest images from the configured registry and
 recreates services without touching the install tree.
 Options:
  --api-mode           Emit data/upgrade/{progress,result}.json (no TTY output)
  --dry-run            Print what would happen; do not pull or recreate
  --image-tag TAG      Override IMAGE_TAG (env var) for this run
  -h, --help           Show this help
 This script never modifies mkdocs/, configs/, scripts/, docker-compose.yml,
 or VERSION. It is the safest upgrade path for orchestration-stable releases.
 EOF
 }
 while [[ $# -gt 0 ]]; do
  case "$1" in
    --api-mode)    API_MODE=true; shift ;;
    --dry-run)     DRY_RUN=true; shift ;;
    --image-tag)   IMAGE_TAG="${2:?--image-tag requires a value}"; shift 2 ;;
    -h|--help)     usage; exit 0 ;;
    *) echo "Unknown option: $1" >&2; usage >&2; exit 1 ;;
  esac
 done
 # --- Colors ---
 if [[ -t 1 ]] && [[ -z "${NO_COLOR:-}" ]]; then
  RED='\033[0;31m'  GREEN='\033[0;32m'  YELLOW='\033[0;33m'
  CYAN='\033[0;36m' BOLD='\033[1m'      NC='\033[0m'
 else
  RED='' GREEN='' YELLOW='' CYAN='' BOLD='' NC=''
 fi
 info()    { echo -e "${CYAN}[INFO]${NC} $*"; }
 success() { echo -e "${GREEN}[ OK ]${NC} $*"; }
 warn()    { echo -e "${YELLOW}[WARN]${NC} $*"; }
 error()   { echo -e "${RED}[ERR ]${NC} $*" >&2; }
 phase()   { echo ""; echo -e "${BOLD}${CYAN}=== Phase $1: $2 ===${NC}"; }
 # --- Logging: mirror stdout/stderr to LOG_FILE ---
 # logs/ may be root-owned on installs where upgrade.sh has run via ccp-agent.
 # Fall back to /tmp if we can't write, so bunker-admin manual invocations don't
 # crash with "Permission denied" on tee.
 mkdir -p "$UPGRADE_DIR"
 if mkdir -p "$LOG_DIR" 2>/dev/null && touch "$LOG_FILE" 2>/dev/null; then
  :  # primary log location is writable
 else
  LOG_FILE="/tmp/image-upgrade-$(date +%Y%m%d_%H%M%S)-$$.log"
  echo "[INFO] logs/ not writable; using $LOG_FILE" >&2
 fi
 exec > >(tee -a "$LOG_FILE") 2>&1
 # --- Capture previous version for result.json ---
 if [[ "$INSTALL_MODE" == "release" ]]; then
  PRE_VERSION="$(head -1 "$PROJECT_DIR/VERSION" 2>/dev/null || echo "unknown")"
 else
  PRE_VERSION="$(cd "$PROJECT_DIR" && git rev-parse --short HEAD 2>/dev/null || echo "unknown")"
 fi
 write_progress() {
  local phase_num="$1" phase_name="$2" pct="$3" msg="$4"
  [[ "$API_MODE" != "true" ]] && return
  mkdir -p "$UPGRADE_DIR"
  cat > "$PROGRESS_FILE" <<PEOF
 {
  "phase": ${phase_num},
  "phaseName": "${phase_name}",
  "percentage": ${pct},
  "message": "$(echo "$msg" | sed 's/"/\\"/g')",
  "lastUpdate": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
 }
 PEOF
 }
 write_result() {
  [[ "$API_MODE" != "true" ]] && return
  local success_val="$1" msg="$2"
  local warnings_json="${3:-[]}"
  local duration_secs=$((SECONDS - START_TIME))
  local new_version="$PRE_VERSION"
  if [[ "$INSTALL_MODE" == "release" ]]; then
    new_version="$(head -1 "$PROJECT_DIR/VERSION" 2>/dev/null || echo "$PRE_VERSION")"
  else
    new_version="$(cd "$PROJECT_DIR" && git rev-parse --short HEAD 2>/dev/null || echo "$PRE_VERSION")"
  fi
  mkdir -p "$UPGRADE_DIR"
  cat > "$RESULT_FILE" <<REOF
 {
  "success": ${success_val},
  "message": "$(echo "$msg" | sed 's/"/\\"/g')",
  "previousCommit": "${PRE_VERSION}",
  "newCommit": "${new_version}",
  "commitCount": 0,
  "durationSeconds": ${duration_secs},
  "warnings": ${warnings_json},
  "completedAt": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "mode": "image-only"
 }
 REOF
  rm -f "$PROGRESS_FILE"
 }
 # --- Lock + cleanup ---
 acquire_lock() {
  if [[ -f "$LOCK_FILE" ]]; then
    local pid; pid="$(cat "$LOCK_FILE" 2>/dev/null || echo "")"
    if [[ -n "$pid" ]] && kill -0 "$pid" 2>/dev/null; then
      error "Upgrade already running (pid $pid). Refusing to start."
      write_result "false" "Another upgrade is already running (pid $pid)"
      exit 1
    fi
    warn "Stale lock file found; removing"
    rm -f "$LOCK_FILE"
  fi
  echo $$ > "$LOCK_FILE"
 }
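 # Only remove the lock if this process owns it. The EXIT trap also fires when
 # acquire_lock refuses to start, and that path must not delete the lock held
 # by the upgrade that is still running.
 release_lock() {
  if [[ "$(cat "$LOCK_FILE" 2>/dev/null || true)" == "$$" ]]; then
    rm -f "$LOCK_FILE" || true
  fi
 }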
 on_failure() {
  local exit_code=$?
  local line_no=${1:-?}
  error "image-upgrade.sh failed at line $line_no (exit $exit_code)"
  write_result "false" "Image upgrade failed at line $line_no (exit $exit_code)"
  release_lock
  exit "$exit_code"
 }
 trap 'on_failure $LINENO' ERR
 trap 'release_lock' EXIT
 # --- Banner ---
 echo ""
 echo -e "${BOLD}${CYAN}================================================${NC}"
 echo -e "${BOLD}  Image-Only Upgrade${NC}"
 echo -e "${BOLD}${CYAN}================================================${NC}"
 echo "Install mode: $INSTALL_MODE"
 echo "Project dir:  $PROJECT_DIR"
 echo "Pre-version:  $PRE_VERSION"
 [[ -n "$IMAGE_TAG" ]] && echo "Image tag:    $IMAGE_TAG"
 [[ "$DRY_RUN" == "true" ]] && echo "DRY RUN: no images will be pulled or services recreated"
 echo ""
 acquire_lock
 # =============================================================================
 # Phase 1: Pre-flight + mkdocs snapshot (defensive)
 # =============================================================================
 phase "1" "Pre-flight"
 write_progress 1 "Pre-flight" 10 "Snapshotting mkdocs (defensive)..."
 # Source mkdocs-snapshot.sh and run it. This is the same snapshot every
 # upgrade path takes — leaves mkdocs-backup-<timestamp>.tar.gz in project root.
 # Image-only upgrades shouldn't damage mkdocs (no filesystem mutation), but
 # the snapshot is cheap insurance and keeps operator habits consistent.
 if [[ -r "$SCRIPT_DIR/lib/mkdocs-snapshot.sh" ]]; then
  if [[ "$DRY_RUN" == "true" ]]; then
    info "[DRY RUN] Would snapshot mkdocs/"
  else
    # Pass the library path as $1 so install roots containing spaces still work.
    # shellcheck disable=SC1091,SC2016
    PROJECT_DIR="$PROJECT_DIR" bash -c '. "$1"; snapshot_mkdocs' _ "$SCRIPT_DIR/lib/mkdocs-snapshot.sh" \
      || warn "mkdocs snapshot failed (non-fatal; continuing)"
  fi
 else
  warn "scripts/lib/mkdocs-snapshot.sh not found; skipping snapshot"
 fi
 # Sanity-check docker
 if ! docker compose version &>/dev/null; then
  error "docker compose is not available"
  write_result "false" "docker compose not available"
  exit 1
 fi
 success "Pre-flight checks passed"
 # =============================================================================
 # Phase 2: Pull images
 # =============================================================================
 phase "2" "Pull Images"
 write_progress 2 "Pull Images" 30 "Pulling images from registry..."
 PULL_ENV=()
 if [[ -n "$IMAGE_TAG" ]]; then
  PULL_ENV+=("IMAGE_TAG=$IMAGE_TAG")
 fi
 if [[ "$DRY_RUN" == "true" ]]; then
  info "[DRY RUN] Would run: ${PULL_ENV[*]:-} docker compose pull"
 else
  info "Pulling all images (this may take a few minutes)..."
  if (( ${#PULL_ENV[@]} > 0 )); then
    if ! env "${PULL_ENV[@]}" docker compose pull; then
      warn "docker compose pull had errors (continuing — some images may be local)"
    fi
  else
    if ! docker compose pull; then
      warn "docker compose pull had errors (continuing — some images may be local)"
    fi
  fi
 fi
 success "Image pull complete"
 # =============================================================================
 # Phase 3: Recreate core app services (targeted, not broad)
 # =============================================================================
 phase "3" "Recreate Services"
 write_progress 3 "Recreate Services" 60 "Recreating core app services with new images..."
 # Targeted recreate: only the services whose IMAGES are released as part of
 # changemaker.lite (api, admin, media-api, nginx). Broader `up -d` is risky
 # because a single misconfigured mount in any service (e.g. mkdocs-site-server)
 # can cascade and leave dependent containers in "Created" state. Image-only
 # upgrade should only touch the actual code containers, not third-party
 # infrastructure that happens to live in the same compose file.
 #
 # Same Phase 6 pattern as upgrade.sh: drop ccp-agent from COMPOSE_PROFILES
 # during recreate so we don't suicide-restart the agent that spawned us.
 # Restart ccp-agent at the end via detached subshell.
 PROFILES_SAVED="${COMPOSE_PROFILES:-}"
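 # grep -v exits 1 when it selects no lines (COMPOSE_PROFILES is exactly
 # "ccp-agent"); tolerate that so pipefail + errexit don't abort the upgrade.
 COMPOSE_PROFILES_WITHOUT_AGENT="$(echo "${PROFILES_SAVED}" \
  | tr ',' '\n' | { grep -vx 'ccp-agent' || true; } | paste -sd, -)"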
 UP_ENV=("COMPOSE_PROFILES=${COMPOSE_PROFILES_WITHOUT_AGENT}")
 if [[ -n "$IMAGE_TAG" ]]; then
  UP_ENV+=("IMAGE_TAG=$IMAGE_TAG")
 fi
 # Core services that ship as v2 release images. nginx last so it doesn't
 # briefly proxy to an old api. media-api may not be enabled on all installs;
 # tolerate it being missing from compose.
 CORE_SERVICES=(api admin media-api nginx)
 EXISTING_SERVICES=()
 # Capture the service list once. Don't pipe `docker compose config` into
 # `grep -q` directly: with `set -o pipefail`, grep exits early on match and
 # SIGPIPEs the docker writer, making the pipeline exit non-zero. The `if`
 # would then treat services that are present as missing. Capture-then-check
 # avoids it.
 COMPOSE_SERVICES_LIST="$(docker compose config --services 2>/dev/null || true)"
 for svc in "${CORE_SERVICES[@]}"; do
  if grep -qx -- "$svc" <<<"$COMPOSE_SERVICES_LIST"; then
    EXISTING_SERVICES+=("$svc")
  else
    info "Skipping service '$svc' (not in compose file)"
  fi
 done
 if (( ${#EXISTING_SERVICES[@]} == 0 )); then
  warn "No core app services found in compose; skipping recreate"
 elif [[ "$DRY_RUN" == "true" ]]; then
  info "[DRY RUN] Would run: ${UP_ENV[*]} docker compose up -d ${EXISTING_SERVICES[*]}"
 else
  info "Recreating core services: ${EXISTING_SERVICES[*]}"
  env "${UP_ENV[@]}" docker compose up -d "${EXISTING_SERVICES[@]}"
 fi
 success "Services recreated"
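 # Restart Pangolin tunnel connector if running (image may have changed).
 # Capture-then-check rather than piping into grep -q (see the pipefail note above).
 RUNNING_CONTAINERS="$(docker ps --format '{{.Names}}' 2>/dev/null || true)"
 if grep -q 'newt' <<<"$RUNNING_CONTAINERS"; then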
  if [[ "$DRY_RUN" == "true" ]]; then
    info "[DRY RUN] Would restart newt"
  else
    info "Restarting Pangolin tunnel connector..."
    docker compose restart newt 2>/dev/null || true
    success "Newt tunnel restarted"
  fi
 fi
 # =============================================================================
 # Phase 4: Verify (light health checks)
 # =============================================================================
 phase "4" "Verification"
 write_progress 4 "Verification" 85 "Running health checks..."
 VERIFY_FAILED=false
 UPGRADE_WARNINGS="[]"
 verify_health() {
  local name="$1" check_cmd="$2" max_wait="${3:-45}"
  local waited=0
  while [[ $waited -lt $max_wait ]]; do
    if eval "$check_cmd" 2>/dev/null; then
      success "$name: healthy (${waited}s)"
      return 0
    fi
    sleep 3
    waited=$((waited + 3))
  done
  warn "$name: not responding after ${max_wait}s"
  VERIFY_FAILED=true
  return 0
 }
 if [[ "$DRY_RUN" != "true" ]]; then
  verify_health "API (port 4000)" \
    "docker compose exec -T api wget -q --spider http://localhost:4000/api/health" 60
  verify_health "Admin (port 3000)" \
    "docker compose exec -T admin wget -q --spider http://localhost:3000/" 90
  if grep -q 'changemaker-media-api' <<<"$(docker ps --format '{{.Names}}' 2>/dev/null || true)"; then
    verify_health "Media API (port 4100)" \
      "docker compose exec -T media-api wget -q --spider http://127.0.0.1:4100/health" 30
  fi
  if "$VERIFY_FAILED"; then
    UPGRADE_WARNINGS='["Some health checks failed after image-only upgrade — services may still be starting"]'
  fi
 fi
 # =============================================================================
 # Summary + deferred ccp-agent restart
 # =============================================================================
 ELAPSED_MIN=$(( (SECONDS - START_TIME) / 60 ))
 ELAPSED_SEC=$(( (SECONDS - START_TIME) % 60 ))
 echo ""
 echo -e "${BOLD}${GREEN}================================================${NC}"
 echo -e "${BOLD}  Image-Only Upgrade Complete${NC}"
 echo -e "${BOLD}${GREEN}================================================${NC}"
 printf "  Previous:  %s\n" "$PRE_VERSION"
 printf "  Duration:  %dm %ds\n" "$ELAPSED_MIN" "$ELAPSED_SEC"
 printf "  Log:       %s\n" "$LOG_FILE"
 write_progress 4 "Complete" 100 "Image-only upgrade complete"
 write_result "true" "Image-only upgrade complete (previous: ${PRE_VERSION})" "$UPGRADE_WARNINGS"
 # Deferred ccp-agent restart — see upgrade.sh for full rationale. Same
 # mechanism: nohup'd, disowned subshell that picks up the new image after
 # this script has cleanly exited.
 if echo "${PROFILES_SAVED:-}" | tr ',' '\n' | grep -qx 'ccp-agent'; then
  if [[ "$DRY_RUN" == "true" ]]; then
    info "[DRY RUN] Would schedule deferred ccp-agent restart"
  else
    info "Scheduling deferred ccp-agent restart..."
    nohup bash -c "
      sleep 3
      cd '$PROJECT_DIR'
      COMPOSE_PROFILES='ccp-agent' docker compose --profile ccp-agent up -d ccp-agent
    " >/dev/null 2>&1 < /dev/null &
    disown
    success "ccp-agent restart scheduled (will pick up new image)"
  fi
 fi
 release_lock
 trap - EXIT
 exit 0
--- a/scripts/lib/mkdocs-snapshot.sh
+++ b/scripts/lib/mkdocs-snapshot.sh
@ -0,0 +1,81 @@
 #!/usr/bin/env bash
 # =============================================================================
 # mkdocs-snapshot.sh — shared library function
 # =============================================================================
 # Defines snapshot_mkdocs(): writes a tarball of mkdocs/ into the install root
 # as mkdocs-backup-<timestamp>.tar.gz, keeping the last 5 snapshots.
 #
 # Sourced by scripts/upgrade.sh and scripts/image-upgrade.sh (and may be
 # invoked agent-side by changemaker-control-panel during template re-render).
 #
 # Why the install root instead of backups/?
 #   - Discoverable: operators see mkdocs-backup-*.tar.gz with a plain `ls`.
 #   - The agent's /app/instance bind mount maps directly to the install root,
 #     so the agent can restore from this archive without path translation.
 #   - backups/ is owned by root in some installs (DB dumps via container)
 #     and gets rotated on a different schedule than docs snapshots.
 #
 # Restoration one-liner:
 #   tar xzf "$(ls -t mkdocs-backup-*.tar.gz | head -1)" -C . \
 #     && docker compose restart mkdocs mkdocs-site-server
 #
 # Requires: $PROJECT_DIR (absolute path to install root), info() function
 # from the caller (falls back to plain echo if info is not defined).
 # =============================================================================
 # Fallback log function if caller didn't define one (e.g. when sourcing standalone)
 if ! declare -F info >/dev/null 2>&1; then
  info() { echo "[INFO] $*"; }
 fi
 if ! declare -F warn >/dev/null 2>&1; then
  warn() { echo "[WARN] $*" >&2; }
 fi
 # snapshot_mkdocs — take a tarball of mkdocs/ into the install root.
 #
 # Returns 0 if successful (or if mkdocs/ doesn't exist — non-fatal).
 # Returns non-zero only if tar itself fails AND $SNAPSHOT_REQUIRED is true.
 #
 # Env vars:
 #   PROJECT_DIR      (required) Install root containing mkdocs/
 #   SNAPSHOT_KEEP    Number of snapshots to retain (default 5)
 #   SNAPSHOT_REQUIRED  If "true", failure to snapshot aborts (default false)
 snapshot_mkdocs() {
  if [[ -z "${PROJECT_DIR:-}" ]]; then
    warn "snapshot_mkdocs: PROJECT_DIR not set; skipping"
    return 0
  fi
  if [[ ! -d "${PROJECT_DIR}/mkdocs" ]]; then
    # No mkdocs dir = nothing to snapshot. Common on minimal installs.
    return 0
  fi
  local stamp
  stamp="$(date +%Y%m%d_%H%M%S)"
  local archive="${PROJECT_DIR}/mkdocs-backup-${stamp}.tar.gz"
  local keep="${SNAPSHOT_KEEP:-5}"
  if tar czf "$archive" -C "$PROJECT_DIR" mkdocs 2>/dev/null; then
    local size
    size="$(du -h "$archive" 2>/dev/null | cut -f1)"
    info "Tenant docs snapshot: $(basename "$archive") (${size})"
  else
    warn "snapshot_mkdocs: tar failed for $archive"
    rm -f "$archive" 2>/dev/null
    if [[ "${SNAPSHOT_REQUIRED:-false}" == "true" ]]; then
      return 1
    fi
    return 0
  fi
  # Retention: keep the most recent N snapshots, prune older ones.
  # ls -t lists newest first; tail -n +(N+1) selects everything after the Nth.
  # Note: xargs splits on whitespace, so this assumes PROJECT_DIR has no spaces.
  local prune_from=$((keep + 1))
  # shellcheck disable=SC2012  # ls is intentional for mtime sort
  ls -t "${PROJECT_DIR}"/mkdocs-backup-*.tar.gz 2>/dev/null \
    | tail -n "+${prune_from}" \
    | xargs -r rm -f
  return 0
 }
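The retention step can be exercised on its own. The following is a minimal sketch, not part of the library: it assumes GNU `touch -d` and `xargs -r`, uses a throwaway `mktemp -d` directory in place of the install root, and the file names and `keep=5` value are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory standing in for the install root.
demo_root="$(mktemp -d)"
keep=5

# Create 7 fake snapshots with distinct, increasing mtimes (oldest first).
for i in 1 2 3 4 5 6 7; do
  f="${demo_root}/mkdocs-backup-2026052${i}_000000.tar.gz"
  : > "$f"
  touch -d "2026-05-2${i} 00:00:00" "$f"
done

# Same pipeline shape as snapshot_mkdocs: newest first, then remove
# everything after the first $keep entries.
# shellcheck disable=SC2012
ls -t "${demo_root}"/mkdocs-backup-*.tar.gz \
  | tail -n "+$((keep + 1))" \
  | xargs -r rm -f

# Count survivors and find the oldest one that was kept.
# shellcheck disable=SC2012
remaining="$(ls "${demo_root}"/mkdocs-backup-*.tar.gz | wc -l | tr -d ' ')"
# shellcheck disable=SC2012
oldest_kept="$(basename "$(ls -t "${demo_root}"/mkdocs-backup-*.tar.gz | tail -n 1)")"
echo "remaining=${remaining} oldest_kept=${oldest_kept}"

rm -rf "$demo_root"
```

With seven snapshots and `keep=5`, the two oldest archives are removed and the five newest survive.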
--- a/scripts/upgrade-stash-cleanup.sh
+++ b/scripts/upgrade-stash-cleanup.sh
@ -0,0 +1,135 @@
 #!/usr/bin/env bash
 # =============================================================================
 # upgrade-stash-cleanup.sh — clean up stale upgrade-* git stashes
 # =============================================================================
 # Older versions of upgrade.sh used `git stash push --include-untracked` to
 # protect tenant content during pulls. When pop conflicts went unresolved,
 # the stashes accumulated in `git stash list` forever — Pride Corner ended up
 # with three from 2026-03-09 alone, each containing displaced tenant
 # customizations that the running site no longer reflected.
 #
 # This script lists every `upgrade-*` stash, shows its scope, and offers to
 # drop them. It does NOT auto-restore content; that's a separate decision per
 # tenant. The intent is to clear the backlog so future `git stash list` is
 # meaningful.
 #
 # Usage:
 #   bash scripts/upgrade-stash-cleanup.sh          # interactive, lists + prompts
 #   bash scripts/upgrade-stash-cleanup.sh --dry    # list only
 #   bash scripts/upgrade-stash-cleanup.sh --yes    # drop all upgrade-* without prompt
 # =============================================================================
 set -euo pipefail
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 PROJECT_DIR="$(dirname "$SCRIPT_DIR")"
 cd "$PROJECT_DIR"
 # Colors
 if [[ -t 1 ]] && [[ -z "${NO_COLOR:-}" ]]; then
  RED='\033[0;31m' GREEN='\033[0;32m' YELLOW='\033[0;33m' CYAN='\033[0;36m'
  BOLD='\033[1m' NC='\033[0m'
 else
  RED='' GREEN='' YELLOW='' CYAN='' BOLD='' NC=''
 fi
 info() { echo -e "${CYAN}[INFO]${NC} $*"; }
 ok()   { echo -e "${GREEN}[ OK ]${NC} $*"; }
 warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
 DRY=false
 YES=false
 for arg in "$@"; do
  case "$arg" in
    --dry|--dry-run) DRY=true ;;
    --yes|-y)        YES=true ;;
    --help|-h)
      # Print the header comment block (everything after the shebang up to the first non-comment line).
      awk 'NR > 1 && /^#/ { sub(/^# ?/, ""); print; next } NR > 1 { exit }' "$0"
      exit 0
      ;;
  esac
 done
 if [[ ! -d .git ]]; then
  warn "Not a git repository — this script only applies to source installs."
  exit 0
 fi
 # Collect upgrade-* stash refs
 mapfile -t STASHES < <(git stash list 2>/dev/null | grep -E ': (On|WIP on) [^:]+: upgrade-' || true)
 if [[ ${#STASHES[@]} -eq 0 ]]; then
  ok "No upgrade-* stashes found. Nothing to clean up."
  exit 0
 fi
 echo ""
 echo -e "${BOLD}Found ${#STASHES[@]} upgrade-* stash(es):${NC}"
 echo ""
 for entry in "${STASHES[@]}"; do
  REF="${entry%%:*}"
  LABEL="${entry#*: }"
  FILE_COUNT=$(git stash show "$REF" --name-only 2>/dev/null | wc -l)
  HAS_MKDOCS_YML=$(git stash show "$REF" --name-only 2>/dev/null | grep -c '^mkdocs/mkdocs\.yml$' || true)
  printf "  %-12s  %-50s  files=%-4d  mkdocs.yml=%s\n" \
    "$REF" "$LABEL" "$FILE_COUNT" "$HAS_MKDOCS_YML"
 done
 echo ""
 if [[ "$DRY" == "true" ]]; then
  info "Dry-run: no stashes will be dropped."
  exit 0
 fi
 # Warn loudly if any stash holds mkdocs.yml — operator should manually review
 # before dropping (tenant content might be there).
 MKDOCS_STASHES=$(printf '%s\n' "${STASHES[@]}" \
  | while read -r entry; do
      REF="${entry%%:*}"
      if git stash show "$REF" --name-only 2>/dev/null | grep -q '^mkdocs/mkdocs\.yml$'; then
        echo "$REF"
      fi
    done)
 if [[ -n "$MKDOCS_STASHES" ]]; then
  echo ""
  echo -e "${RED}${BOLD}⚠ WARNING:${NC} the following stashes contain ${BOLD}mkdocs/mkdocs.yml${NC}:"
  echo "$MKDOCS_STASHES" | sed 's/^/    /'
  echo ""
  echo "   These may hold tenant branding (site_name, site_url, custom theme, etc.)"
  echo "   that ISN'T reflected on disk. Before dropping, verify:"
  echo ""
  echo "     git show <stash-ref>:mkdocs/mkdocs.yml | head -10"
  echo "     diff <(git show <stash-ref>:mkdocs/mkdocs.yml) mkdocs/mkdocs.yml"
  echo ""
  echo "   If disk mkdocs.yml already has the tenant content, the stash is safe to drop."
  echo "   If disk is upstream and stash has tenant content, restore first:"
  echo "     git checkout <stash-ref> -- mkdocs/mkdocs.yml"
  echo ""
 fi
 if [[ "$YES" != "true" ]]; then
  echo -en "${BOLD}Drop all ${#STASHES[@]} upgrade-* stashes? [y/N] ${NC}"
  read -r CONFIRM
  case "$CONFIRM" in
    y|Y|yes|YES) ;;
    *) info "Cancelled. No stashes dropped."; exit 0 ;;
  esac
 fi
 # Drop highest index first: dropping stash@{N} renumbers every stash above N,
 # so working downward keeps the remaining refs valid.
 mapfile -t SORTED_REFS < <(printf '%s\n' "${STASHES[@]}" \
  | sed 's/:.*//' \
  | sort -t'{' -k2 -n -r)
 for REF in "${SORTED_REFS[@]}"; do
  if git stash drop "$REF" >/dev/null 2>&1; then
    ok "Dropped $REF"
  else
    warn "Failed to drop $REF (already gone?)"
  fi
 done
 echo ""
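 ok "Cleanup complete. Remaining stashes:"
 # `git stash list` exits 0 with empty output when no stashes remain, so test the output itself.
 REMAINING="$(git stash list 2>/dev/null || true)"
 echo "${REMAINING:-  (none)}"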
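The drop ordering can be checked without a repository. This is a sketch under stated assumptions: the `sample_list` text is made-up `git stash list` output (the labels are hypothetical), and `mapfile` requires bash 4 or newer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical `git stash list` output; labels are invented for illustration.
sample_list='stash@{0}: On main: upgrade-20260309_181502
stash@{1}: On main: manual-wip
stash@{2}: On main: upgrade-20260309_120733
stash@{10}: On main: upgrade-20260309_090115'

# Keep only upgrade-* entries, using the same pattern as the cleanup script.
mapfile -t entries < <(grep -E ': (On|WIP on) [^:]+: upgrade-' <<<"$sample_list" || true)

# Strip everything after the ref, then sort numerically on the index,
# highest first. Dropping stash@{10} before stash@{2} means the lower
# refs still point at the same stashes when their turn comes.
mapfile -t drop_order < <(printf '%s\n' "${entries[@]}" \
  | sed 's/:.*//' \
  | sort -t'{' -k2 -n -r)

printf '%s\n' "${drop_order[@]}"
```

The numeric sort matters once there are ten or more stashes: a plain lexical sort would place `stash@{10}` between `stash@{1}` and `stash@{2}`.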
--- a/scripts/upgrade.sh
+++ b/scripts/upgrade.sh
@ -95,6 +95,14 @@ phase() {
  echo ""
 }
 # Pre-upgrade tenant docs snapshot (no-regrets fallback). Sourced regardless
 # of install mode so snapshot_mkdocs is available in Phase 2.
 # shellcheck source=lib/mkdocs-snapshot.sh
 if [[ -f "$SCRIPT_DIR/lib/mkdocs-snapshot.sh" ]]; then
  # shellcheck disable=SC1091
  . "$SCRIPT_DIR/lib/mkdocs-snapshot.sh"
 fi
 # --- API mode: JSON progress/result writing ---
 UPGRADE_DIR="${PROJECT_DIR}/data/upgrade"
 PROGRESS_FILE="${UPGRADE_DIR}/progress.json"
@ -709,6 +717,18 @@ fi
 phase "2" "Backup"
 write_progress 2 "Backup" 15 "Creating backup..."
 # Pre-upgrade tenant docs snapshot — the no-regrets fallback. Runs even when
 # --skip-backup is set, because this is for tenant content recovery (not DB
 # state) and is fast enough that skipping it would never be intentional. It
 # lives in the install root (not backups/) so operators discover it via `ls`.
 if declare -F snapshot_mkdocs >/dev/null 2>&1; then
  if [[ "$DRY_RUN" == "true" ]]; then
    info "[DRY RUN] Would snapshot mkdocs/ to ${PROJECT_DIR}/mkdocs-backup-*.tar.gz"
  else
    snapshot_mkdocs || warn "mkdocs snapshot failed (non-fatal; continuing)"
  fi
 fi
 if [[ "$SKIP_BACKUP" == "true" ]]; then
  warn "Backup skipped (--skip-backup --force)"
 else
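The `declare -F` guard used above lets the script degrade gracefully when the optional library is missing. A minimal sketch of that pattern follows; `call_if_defined` and `snapshot_demo` are hypothetical names used only for illustration and do not exist in upgrade.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# call_if_defined FN [ARGS...]: run FN only if it exists as a shell function.
# Hypothetical helper for illustration.
call_if_defined() {
  local fn="$1"
  shift
  if declare -F "$fn" >/dev/null 2>&1; then
    "$fn" "$@"
  else
    echo "skip: $fn not defined"
  fi
}

# Before the "library" is loaded, the call is skipped instead of failing.
before="$(call_if_defined snapshot_demo)"

# Stand-in for sourcing an optional library that defines the function.
snapshot_demo() { echo "snapshot taken"; }

after="$(call_if_defined snapshot_demo)"
echo "before=${before}"
echo "after=${after}"
```

`declare -F name` only reports shell functions, so an external command that happens to share the name would not satisfy the guard.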
@ -1284,13 +1304,24 @@ while true; do
 done
 success "API healthy (${API_WAIT}s)"
-# Start everything else (exclude one-shot init containers)
+# Start everything else (exclude one-shot init containers AND the ccp-agent
 # service that's running this very script). Recreating ccp-agent here would
 # SIGKILL the script process before write_result has a chance to run; we
 # instead schedule a detached restart at the very end of the script.
 #
 # Mechanism: temporarily drop "ccp-agent" from COMPOSE_PROFILES so the broad
 # `up -d` doesn't include it. We re-add it only when scheduling the deferred
 # restart so the new agent comes up under its profile.
 info "Starting remaining services..."
 PROFILES_SAVED="${COMPOSE_PROFILES:-}"
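 # grep -v exits 1 when nothing is left (COMPOSE_PROFILES is exactly "ccp-agent");
 # `|| true` keeps that from aborting the script if set -e / pipefail are active.
 COMPOSE_PROFILES_WITHOUT_AGENT="$(echo "${PROFILES_SAVED}" \
  | tr ',' '\n' | { grep -vx 'ccp-agent' || true; } | paste -sd, -)"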
 COMPOSE_PROFILES="${COMPOSE_PROFILES_WITHOUT_AGENT}" \
 docker compose up -d \
  --scale listmonk-init=0 \
  --scale gancio-init=0 \
  --scale vaultwarden-init=0
-success "All services started"
+success "All services started (ccp-agent restart deferred to end-of-script)"
 # Restart Pangolin tunnel connector if running (may hold stale state after nginx rebuild)
 if docker ps --format '{{.Names}}' | grep -q 'newt'; then
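The profile filtering in this hunk can be isolated into a small function and tested against edge cases. The sketch below is illustrative only: `strip_profile` is a hypothetical helper, not something upgrade.sh defines, and the profile names other than `ccp-agent` are invented.

```shell
#!/usr/bin/env bash
set -euo pipefail

# strip_profile LIST NAME: print comma-separated LIST without the entry NAME.
# Hypothetical helper, not part of upgrade.sh.
strip_profile() {
  local list="$1" name="$2"
  # grep -v exits 1 when it prints nothing (e.g. LIST is exactly NAME);
  # `|| true` keeps that from tripping set -e / pipefail.
  printf '%s\n' "$list" \
    | tr ',' '\n' \
    | { grep -vxF -- "$name" || true; } \
    | paste -sd, -
}

without_agent="$(strip_profile "monitoring,ccp-agent,media" "ccp-agent")"
only_agent="$(strip_profile "ccp-agent" "ccp-agent")"
echo "without_agent=${without_agent}"
echo "only_agent=${only_agent:-<empty>}"
```

Using `-x` (whole-line match) means a profile such as `ccp-agent-debug` would be left alone; `-F` treats the name as a literal string instead of a regular expression.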
@ -1461,6 +1492,27 @@ echo -e "  ${BOLD}Duration:${NC}  $ELAPSED"
 echo -e "  ${BOLD}Log:${NC}       $LOG_FILE"
 echo ""
 # Deferred ccp-agent restart — the LAST thing the script does before exit.
 # This must run AFTER write_result and archive_success_to_history so the new
 # agent comes up to a complete result.json (otherwise CCP polls forever).
 # We launch a detached subshell that:
 #   1. Sleeps briefly so this script has time to exit cleanly first.
 #   2. Restarts ccp-agent under its profile, picking up any new image.
 # `nohup` + `disown` ensure the subshell survives the agent container dying
 # (when ccp-agent is recreated, the parent agent process — which spawned this
 # upgrade.sh — gets SIGKILL'd; the disowned subshell is reparented to PID 1
 # on the host and continues).
 if echo "${PROFILES_SAVED:-}" | tr ',' '\n' | grep -qx 'ccp-agent'; then
  info "Scheduling deferred ccp-agent restart..."
  nohup bash -c "
    sleep 3
    cd '$PROJECT_DIR'
    COMPOSE_PROFILES='ccp-agent' docker compose --profile ccp-agent up -d ccp-agent
  " >/dev/null 2>&1 < /dev/null &
  disown
  success "ccp-agent restart scheduled (will pick up new image)"
 fi
 release_lock
 trap - EXIT
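The detach-then-act pattern in this hunk can be tried in isolation. This is a sketch under assumptions, not the script's code: a marker file stands in for the `docker compose up` call, the one-second delay is arbitrary, and the path is passed as a positional argument instead of being interpolated into the `bash -c` string.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical marker file standing in for "the restart happened".
marker="$(mktemp -d)/deferred-done"

# Launch the delayed action fully detached: no inherited stdio, immune to
# SIGHUP (nohup), and removed from this shell's job table (disown).
nohup bash -c '
  sleep 1
  echo "deferred action ran" > "$1"
' _ "$marker" >/dev/null 2>&1 </dev/null &
disown

echo "main script exiting; deferred action pending"
```

Passing the path as `$1` avoids quoting problems if the directory name ever contains a single quote, which an inline `'$PROJECT_DIR'` inside a double-quoted string would not survive.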
Author	SHA1	Message	Date
bunker-admin	35175a7136	docs: session handoff 2026-05-22 — Approach C complete Captures Phase 0 + Phases 1-5 outcomes, Phase 6 preview-path end-to-end validation against marcelle, known env-patch gap for install.sh tenants, fleet rollout status, and the operator path. Bunker Admin	2026-05-22 09:50:14 -06:00
bunker-admin	abb4034e4b	feat(upgrade): Approach C - CCP-driven release upgrade (template re-render) Adds the third upgrade path alongside Approach A (full upgrade.sh) and B (image-only). For releases that change orchestration (new services, new nginx routes, new compose env vars) in addition to image versions, CCP re-renders templates server-side, sends the rendered files to the tenant via the existing mTLS agent, then composePull + composeUp. Tenant content (mkdocs/, custom configs/) is never touched. Pieces: PHASE 1 — Schema + per-instance imageTag - prisma/schema.prisma: new Instance.imageTag column (NULL = fall back to env.IMAGE_TAG default). - prisma/migrations/20260522093400_add_instance_image_tag/: SQL. - services/template-engine.ts: - buildTemplateContext now uses instance.imageTag || env.IMAGE_TAG. - InstanceForTemplate interface gains imageTag: string | null. PHASE 2 — Pre-flight diff (read-only "what would change?") - agent/services/file.service.ts: new diffFiles() helper with a small inline LCS-based unified-diff (no new deps). Returns per-file status ('unchanged' | 'modified' | 'created') + truncated unified diff. - agent/routes/files.routes.ts: POST /instance/:slug/files/diff. - api/services/execution-driver.ts: diffFiles added to interface. - api/services/local-driver.ts + remote-driver.ts: diffFiles methods (local mirrors agent helper inline; remote POSTs to the agent endpoint). - api/services/upgrade.service.ts: previewReleaseUpgrade() — renders templates in-memory with the proposed imageTag, filters out .env for isRegistered=true tenants, calls driver.diffFiles, computes envCoverage (which env vars the new compose needs vs which the tenant's .env has). PHASE 3 — Apply path (the actual upgrade) - api/services/upgrade.service.ts: startReleaseUpgrade() and the inner runReleaseUpgrade() runner. Distinct from runRemoteUpgrade because CCP does the work directly via the mTLS driver (no agent-side script).
Flow: persist imageTag in DB → render → writeFiles → composePull → composeUp → composePs verify. Status reported via InstanceUpgrade rows (same shape the existing CCP polling UI already uses). - Failure handling: instance.imageTag stays at the new value on failure so operator can retry. Manual rollback only. PHASE 4 — Routes + schemas - instances.schemas.ts: startReleaseUpgradeSchema (imageTag regex). - instances.routes.ts: - POST /:id/upgrade-release (apply) - POST /:id/upgrade-release/preview (read-only diff) PHASE 5 — CCP admin UI - admin/pages/InstanceDetailPage.tsx: third "Upgrade to Release" button next to Quick Upgrade + Upgrade Now. Opens a modal with imageTag input, Preview button (calls /preview), and Apply button. Preview modal shows: - Red alert if envCoverage.missingInTenantEnv is non-empty (compose needs vars the tenant's .env doesn't define). - Per-file status tags (unchanged / modified / created) + truncated unified diff for modified files. - admin/types/api.ts: Instance.imageTag added. Constraints applied: - Remote-only initial scope: throws "currently supported only for remote instances" if instance.isRemote === false. - isRegistered=true tenants (install.sh fleet): .env is filtered out of the render set (CCP can't render env without secrets in DB), the tenant's existing .env stays as-is. envCoverage warns the operator if the new compose references env vars their .env doesn't define. - Shared in-progress guard with Approach A/B (one upgrade at a time). Per the plan: see ~/.claude/plans/insight-temporal-bachman.md. All three projects type-check cleanly (api, agent, admin). Bunker Admin	2026-05-22 09:45:37 -06:00
bunker-admin	97444645cb	chore(approach-c): Phase 0 complete - templates byte-equivalent to canonical This commit completes Phase 0 of Approach C: the CCP template/env/static files now produce output structurally byte-identical to canonical docker-compose.prod.yml + .env.example. Verified by rendering against marcelle, linda, and pia and diffing against their actual files — all three show only the 30-line CCP-tenant header comment differing, zero service/env-var structural differences. Changes: - templates/docker-compose.yml.hbs: reverted {{imageTag}} substitutions back to ${IMAGE_TAG:-latest} so the compose template is now byte- equivalent to docker-compose.prod.yml (modulo header). CCP controls per-instance image tag selection via the rendered .env's IMAGE_TAG, which compose-up picks up at runtime. This single-source-of-truth via env-substitution matches install.sh tenants exactly. - templates/env.hbs: rewritten as a near-mirror of .env.example. Adds 27 missing keys (IMAGE_TAG, GITEA_REGISTRY, COMPOSE_PROFILES, ENABLE_CCP_AGENT, GITEA_ADMIN_*, ENABLE_HLS_TRANSCODE, TZ, etc.) plus 15 CCP-specific extras (embed ports, dev-mode helpers, etc.). All 145 compose-template env-var references are now covered. - templates/nginx/nginx.conf: synced from canonical. Includes recent security additions: redacted access-log format for token/secret query params, rate-limit zones (api_global, api_auth, upload), conditional HSTS via X-Forwarded-Proto map. - api/scripts/render-for-instance.ts (new): one-off CLI that loads an Instance row, decrypts secrets if present (or uses empty object for isRegistered=true tenants), and calls renderAllTemplates() to a scratch dir. Used in Phase 0.4 to verify the template-vs-prod contract per tenant. 
Usage: docker compose exec ccp-api npx tsx scripts/render-for-instance.ts \ --slug changemakerlite Phase 0 acceptance gate met: - marcelle (release v2.10.2 install): 30-line diff, header-only - linda (release v2.9.14 install): 30-line diff, header-only - pia (release v2.9.10 install): 30-line diff, header-only - env.hbs key coverage: 0 missing vs marcelle's .env Next phases unblocked: - Phase 1: add Instance.imageTag column (Prisma migration) - Phase 2: pre-flight diff endpoint - Phase 3: startReleaseUpgrade runner - Phase 4: routes + schemas - Phase 5: CCP UI "Upgrade to Release" button - Phase 6: E2E test on marcelle (v2.10.2 -> v2.10.3) Bunker Admin	2026-05-22 09:35:30 -06:00
bunker-admin	f34382ebdd	chore(approach-c): Phase 0 initial template overlay + session handoff This session shipped: - Approach B end-to-end (commit 4a3d9d7): full rollout to all 7 tenants; marcelle E2E validated twice (121s + 100s). - v2.10.2 surgical update applied to 6 remaining tenants. This commit lands the kickoff for Approach C (template re-render path): scripts/templates changes: - docker-compose.yml.hbs.OLD-style-pre-approach-c: preserved old CCP template (Handlebars-heavy, dynamic container names, secrets rendered at template-time). - docker-compose.yml.hbs: REWRITTEN as a near-mirror of canonical docker-compose.prod.yml. Minimal Handlebars overlay: - Header comment lists {{name}}, {{slug}}, {{composeProject}}. - 5 image refs: ${IMAGE_TAG:-latest} -> {{imageTag}}, so CCP can per-instance override once Phase 1 lands the Instance.imageTag column. All other variation flows through env-var substitution from tenant's .env. Container names are now hardcoded (matching prod), feature flags are deferred to COMPOSE_PROFILES gating (matching prod). Why a rewrite: the old CCP template and prod compose used fundamentally different conventions (dynamic vs hardcoded names, render-time vs substitute-time secrets, Handlebars vs profiles gating). Sync-by-addition couldn't reconcile them. The rewrite makes Approach C re-render safe for the install.sh-installed fleet (marcelle, linda, pia and future). docs/SESSION_HANDOFF_2026-05-21.md: full session handoff covering fleet state, Approach B rollout, Approach C plan, and where to start next session. force-added because /docs is gitignored (same precedent as docs/SESSION_HANDOFF_2026-05-20.md from prior session). Phase 0 remaining work (next session): - Audit env.hbs against new compose env-var expectations - Sync static config files (nginx/, configs/prometheus/, etc.) - Build api/scripts/render-for-instance.ts harness - Iterate template until rendered output is per-instance-only diff against marcelle/linda/pia actual compose. 
Then Phases 1-6 per plan in subsequent sessions (~11-14 hours total). Bunker Admin	2026-05-21 19:32:21 -06:00
bunker-admin	4a3d9d7c41	feat(upgrade): Approach B - image-only upgrade mode Add a "Quick Upgrade" path that pulls latest container images and recreates only the core app services (api, admin, media-api, nginx) without touching any tracked files. Tenant content (mkdocs/, configs/, scripts/) is implicitly preserved because the script never writes outside docker. Faster (~2 min vs ~4-5 min for full upgrade) and structurally safer for releases that don't change orchestration/templates. Pieces: - scripts/image-upgrade.sh: new ~350-line script. Phases: pre-flight + mkdocs snapshot, image pull, targeted recreate (broad up -d would cascade on misconfigured infra containers — proven on marcelle), light health checks, deferred ccp-agent restart. Writes the same progress.json + result.json schema as upgrade.sh so the CCP poll loop is unchanged. - agent/src/routes/upgrade.routes.ts: POST /instance/:slug/upgrade/start-image-only. Same lock + staleness guards as the existing /upgrade/start endpoint. - api/src/services/remote-driver.ts: RemoteDriver.startImageUpgrade(). - api/src/services/upgrade.service.ts: startImageUpgrade() entry point; reuses runRemoteUpgrade with mode='image-only' (only the initial agent call differs — result schema and polling are identical). - api/src/modules/instances/instances.routes.ts: POST /:id/upgrade-images + startImageUpgradeSchema. - admin/src/pages/InstanceDetailPage.tsx: secondary "Quick Upgrade" button next to "Upgrade Now" on the Updates tab. Tooltip explains when to use it. Tested locally on marcelle (v2.10.2 idempotent run): 1m 49s, mkdocs.yml md5 unchanged, file count unchanged, only api/admin/media-api/nginx touched. Subtle bug found and fixed: `set -o pipefail` + `grep -q` shorts pipe and SIGPIPEs the writer — captured services list once instead. Bunker Admin	2026-05-21 15:20:35 -06:00
bunker-admin	731e70ee42	docs: session handoff for the upgrade-flow redesign work Captures the full state of the 2026-05-20/21 working session for the next agent or future-self: fleet status, what landed in v2.10.2, remaining Phase B + C work from the plan, surgical-update procedures for the 6 remaining tenants (proven on pia 2026-05-21), bug inventory, and "don't repeat my mistakes" notes. Plan reference: /home/bunker-admin/.claude/plans/okay-so-we-can-enumerated-hejlsberg.md Force-added because docs/ is gitignored but the handoff needs to be discoverable in-repo (same pattern as COMPETITIVE_ANALYSIS.md). Bunker Admin	2026-05-21 13:42:08 -06:00
bunker-admin	a7d3dd772b	chore(release): ship scripts/lib/ + classify upgrade-stash-cleanup.sh Two release-build fixes paired with the Approach A changes: 1. Add upgrade-stash-cleanup.sh to RUNTIME_SCRIPTS so it ships in the release tarball. Tenants need it to be able to recover from stale upgrade-* git stashes on their own hosts. 2. Copy scripts/lib/ wholesale into the staged release tree. Without this, upgrade.sh's `. scripts/lib/mkdocs-snapshot.sh` source line silently fails on release installs (the file isn't there), and the pre-upgrade tenant-docs snapshot wouldn't fire — defeating the no-regrets fallback. Bunker Admin	2026-05-21 10:36:28 -06:00
bunker-admin	9613c3ec81	fix(upgrade): Phase 1 of upgrade-flow redesign (Approach A) Three coordinated fixes from the upgrade-flow redesign plan (/home/bunker-admin/.claude/plans/okay-so-we-can-enumerated-hejlsberg.md): 1. scripts/lib/mkdocs-snapshot.sh (NEW): pre-upgrade tarball snapshot of the entire mkdocs/ directory into the install root as mkdocs-backup-<timestamp>.tar.gz. Discoverable via `ls`, retained last 5. No-regrets fallback if anything in the upgrade goes sideways. Sourced by upgrade.sh (and later by image-upgrade.sh under Approach B). 2. scripts/upgrade.sh Phase 6 self-destruct fix: previously, the broad `docker compose up -d` recreated the ccp-agent container that was running the script, sending SIGKILL to the bash process before write_result could land result.json. Marcelle's test upgrade hit this tonight. Fix: temporarily remove `ccp-agent` from COMPOSE_PROFILES during Phase 6's broad up -d, then schedule a detached `nohup ... & disown` restart at the very end of the script (after write_result and archive_success_to_history). The deferred subshell sleeps 3s, then recreates ccp-agent under its profile, picking up the new image. 3. scripts/upgrade-stash-cleanup.sh (NEW): one-shot utility to list and drop accumulated `upgrade-*` git stashes left over by older upgrade.sh runs whose pop failed silently (Pride Corner has three from 2026-03-09 alone). Warns loudly if any stash holds tenant mkdocs.yml content so operators verify recovery before dropping. The .gitignore now excludes /mkdocs-backup-*.tar.gz so the rescue archives don't leak into commits. This is Phase 1 of three: Approach B (image-only upgrade mode) and Approach C (CCP template re-render) follow in subsequent commits. Bunker Admin	2026-05-20 20:43:34 -06:00
bunker-admin	e88ac79ae8	fix(ccp-agent): export COMPOSE_PROJECT_NAME so upgrade.sh sees correct project The agent already passed COMPOSE_PROJECT in env, but Docker Compose actually reads COMPOSE_PROJECT_NAME. When upgrade.sh (running inside the agent container at cwd=/app/instance) shelled out to `docker compose up -d` in Phase 5, compose defaulted the project name to "instance" (cwd basename), collided with the host's existing containers under "changemakerlite", and the upgrade aborted with "Container ... already in use by container ..." errors. Discovered when triggering the first end-to-end CCP "Upgrade Now" on marcelle (v2.9.15 → v2.10.1). Backup/code/rebuild phases all succeeded; migration phase failed instantly. Rollback restored marcelle cleanly. This commit adds COMPOSE_PROJECT_NAME alongside the existing COMPOSE_PROJECT (which the agent's TypeScript still reads for its own slug derivation). Bunker Admin	2026-05-20 15:57:30 -06:00
bunker-admin	1b80e8294c	fix(ccp-agent): whitelist /app/instance for git safe.directory The agent container runs as root but the bind-mounted instance directory is owned by the host user (UID 1000 = `node` in the container). Modern git refuses to operate on such repos without an explicit safe.directory entry, breaking upgrade-check.sh's `git fetch/log` calls on source-installed tenants. Verified empirically on soroush after the previous fix landed. Bunker Admin	2026-05-20 12:14:39 -06:00
bunker-admin	a531f9b9ce	fix(ccp): make agent functional + fix Gitea release timestamp bug Three related fixes uncovered during a marcelle CCP registration test: 1. ccp-agent image was missing bash + curl + jq + python3, so every spawn('bash', ...) in upgrade.routes.ts and backup.routes.ts failed silently with ENOENT. CCP kept reading stale status.json files from disk, masking that no agent had successfully checked for updates in weeks. apk-add the missing tools. 2. ccp-agent's /app/instance mount was :ro, blocking the agent from writing data/upgrade/status.json (and result/progress/backups). Agent already has docker.sock — removing :ro is not a security escalation. Patched both docker-compose.yml and docker-compose.prod.yml. 3. Gitea 1.23.x only initializes Release.CreatedUnix inside its createTag() helper, which is skipped if the tag already exists on origin. The old DEV_WORKFLOW pattern (push tag, then run build-release.sh --upload) was triggering this — releases got created_unix=0 and lost /releases/latest sort order to v2.9.14. build-release.sh now removes the remote tag first and POSTs with target_commitish so Gitea creates the tag and release atomically. After these fixes, CCP's "Check for Updates" path returns truthful data end-to-end (verified on marcelle: v2.9.15 -> v2.10.1, 1 behind). Bunker Admin	2026-05-20 11:59:35 -06:00