8 Commits

Author SHA1 Message Date
abb4034e4b feat(upgrade): Approach C - CCP-driven release upgrade (template re-render)
Adds the third upgrade path alongside Approach A (full upgrade.sh) and B
(image-only). For releases that change orchestration (new services, new
nginx routes, new compose env vars) in addition to image versions, CCP
re-renders templates server-side, sends the rendered files to the tenant
via the existing mTLS agent, then composePull + composeUp. Tenant content
(mkdocs/, custom configs/) is never touched.

Pieces:

PHASE 1 — Schema + per-instance imageTag

- prisma/schema.prisma: new Instance.imageTag column (NULL = fall back
  to env.IMAGE_TAG default).
- prisma/migrations/20260522093400_add_instance_image_tag/: SQL.
- services/template-engine.ts:
  - buildTemplateContext now uses instance.imageTag || env.IMAGE_TAG.
  - InstanceForTemplate interface gains imageTag: string | null.

PHASE 2 — Pre-flight diff (read-only "what would change?")

- agent/services/file.service.ts: new diffFiles() helper with a small
  inline LCS-based unified-diff (no new deps). Returns per-file status
  ('unchanged' | 'modified' | 'created') + truncated unified diff.
- agent/routes/files.routes.ts: POST /instance/:slug/files/diff.
- api/services/execution-driver.ts: diffFiles added to interface.
- api/services/local-driver.ts + remote-driver.ts: diffFiles methods
  (local mirrors agent helper inline; remote POSTs to the agent endpoint).
- api/services/upgrade.service.ts: previewReleaseUpgrade() — renders
  templates in-memory with the proposed imageTag, filters out .env for
  isRegistered=true tenants, calls driver.diffFiles, computes envCoverage
  (which env vars the new compose needs vs which the tenant's .env has).

PHASE 3 — Apply path (the actual upgrade)

- api/services/upgrade.service.ts: startReleaseUpgrade() and the inner
  runReleaseUpgrade() runner. Distinct from runRemoteUpgrade because CCP
  does the work directly via the mTLS driver (no agent-side script).
  Flow: persist imageTag in DB → render → writeFiles → composePull →
  composeUp → composePs verify. Status reported via InstanceUpgrade
  rows (same shape the existing CCP polling UI already uses).
- Failure handling: instance.imageTag stays at the new value on failure
  so operator can retry. Manual rollback only.

PHASE 4 — Routes + schemas

- instances.schemas.ts: startReleaseUpgradeSchema (imageTag regex).
- instances.routes.ts:
  - POST /:id/upgrade-release       (apply)
  - POST /:id/upgrade-release/preview (read-only diff)

PHASE 5 — CCP admin UI

- admin/pages/InstanceDetailPage.tsx: third "Upgrade to Release" button
  next to Quick Upgrade + Upgrade Now. Opens a modal with imageTag input,
  Preview button (calls /preview), and Apply button. Preview modal shows:
  - Red alert if envCoverage.missingInTenantEnv is non-empty (compose
    needs vars the tenant's .env doesn't define).
  - Per-file status tags (unchanged / modified / created) + truncated
    unified diff for modified files.
- admin/types/api.ts: Instance.imageTag added.

Constraints applied:
- Remote-only initial scope: throws "currently supported only for remote
  instances" if instance.isRemote === false.
- isRegistered=true tenants (install.sh fleet): .env is filtered out
  of the render set (CCP can't render env without secrets in DB), the
  tenant's existing .env stays as-is. envCoverage warns the operator
  if the new compose references env vars their .env doesn't define.
- Shared in-progress guard with Approach A/B (one upgrade at a time).

Per the plan: see ~/.claude/plans/insight-temporal-bachman.md.

All three projects type-check cleanly (api, agent, admin).

Bunker Admin
2026-05-22 09:45:37 -06:00
4a3d9d7c41 feat(upgrade): Approach B - image-only upgrade mode
Add a "Quick Upgrade" path that pulls latest container images and recreates
only the core app services (api, admin, media-api, nginx) without touching
any tracked files. Tenant content (mkdocs/, configs/, scripts/) is implicitly
preserved because the script never writes outside docker.

Faster (~2 min vs ~4-5 min for full upgrade) and structurally safer for
releases that don't change orchestration/templates.

Pieces:
- scripts/image-upgrade.sh: new ~350-line script. Phases: pre-flight +
  mkdocs snapshot, image pull, targeted recreate (broad up -d would cascade
  on misconfigured infra containers — proven on marcelle), light health
  checks, deferred ccp-agent restart. Writes the same progress.json +
  result.json schema as upgrade.sh so the CCP poll loop is unchanged.
- agent/src/routes/upgrade.routes.ts: POST /instance/:slug/upgrade/start-image-only.
  Same lock + staleness guards as the existing /upgrade/start endpoint.
- api/src/services/remote-driver.ts: RemoteDriver.startImageUpgrade().
- api/src/services/upgrade.service.ts: startImageUpgrade() entry point;
  reuses runRemoteUpgrade with mode='image-only' (only the initial agent
  call differs — result schema and polling are identical).
- api/src/modules/instances/instances.routes.ts: POST /:id/upgrade-images
  + startImageUpgradeSchema.
- admin/src/pages/InstanceDetailPage.tsx: secondary "Quick Upgrade" button
  next to "Upgrade Now" on the Updates tab. Tooltip explains when to use it.

Tested locally on marcelle (v2.10.2 idempotent run): 1m 49s, mkdocs.yml md5
unchanged, file count unchanged, only api/admin/media-api/nginx touched.
Subtle bug found and fixed: `set -o pipefail` + `grep -q` shorts pipe and
SIGPIPEs the writer — captured services list once instead.

Bunker Admin
2026-05-21 15:20:35 -06:00
1b80e8294c fix(ccp-agent): whitelist /app/instance for git safe.directory
The agent container runs as root but the bind-mounted instance directory
is owned by the host user (UID 1000 = `node` in the container). Modern
git refuses to operate on such repos without an explicit safe.directory
entry, breaking upgrade-check.sh's `git fetch/log` calls on source-installed
tenants. Verified empirically on soroush after the previous fix landed.

Bunker Admin
2026-05-20 12:14:39 -06:00
a531f9b9ce fix(ccp): make agent functional + fix Gitea release timestamp bug
Three related fixes uncovered during a marcelle CCP registration test:

1. ccp-agent image was missing bash + curl + jq + python3, so every
   spawn('bash', ...) in upgrade.routes.ts and backup.routes.ts failed
   silently with ENOENT. CCP kept reading stale status.json files from
   disk, masking that no agent had successfully checked for updates in
   weeks. apk-add the missing tools.

2. ccp-agent's /app/instance mount was :ro, blocking the agent from
   writing data/upgrade/status.json (and result/progress/backups).
   Agent already has docker.sock — removing :ro is not a security
   escalation. Patched both docker-compose.yml and docker-compose.prod.yml.

3. Gitea 1.23.x only initializes Release.CreatedUnix inside its
   createTag() helper, which is skipped if the tag already exists on
   origin. The old DEV_WORKFLOW pattern (push tag, then run
   build-release.sh --upload) was triggering this — releases got
   created_unix=0 and lost /releases/latest sort order to v2.9.14.
   build-release.sh now removes the remote tag first and POSTs with
   target_commitish so Gitea creates the tag and release atomically.

After these fixes, CCP's "Check for Updates" path returns truthful
data end-to-end (verified on marcelle: v2.9.15 -> v2.10.1, 1 behind).

Bunker Admin
2026-05-20 11:59:35 -06:00
d2da13929a ccp: split /register and /poll rate limits; agent backoff on 429
Problem: the agent polled /poll every 30s while waiting for admin
approval. At 10 req/15min, the 11th poll hit 429 after ~5 min and
every subsequent one also failed — recovery required an agent
restart. A human-paced approval SLA is longer than 5 minutes.

CCP side (agents.routes.ts):
  Split the one-size-fits-all agentRegistrationLimiter into two.
  /register stays tight (10/15min — invite-code brute force is the
  real attack surface). /poll gets a new agentPollLimiter at 180/15min
  (one poll per ~5s upper bound), scoped to registrationId+slug so
  blast radius is bounded.

Agent side (server.ts):
  Replaced fixed 30s setInterval with a self-scheduling setTimeout
  loop that backs off exponentially on HTTP 429 (30s → 60s → 120s →
  300s cap) and resets to 30s on any 2xx. Stop-flag protects against
  re-entry after approval. Fixes the "agent wedged at 429, restart to
  recover" workaround.

Bunker Admin
2026-04-16 13:11:39 -06:00
e55bc07eb6 Security hardening: red-team remediation + CCP/WIP updates
## Security (red-team audit 2026-04-12)

Public data exposure (P0):
- Public map converted to server-side heatmap, 2-decimal (~1.1km) bucketing,
  no addresses/support-levels/sign-info returned
- Petition signers endpoint strips displayName/signerComment/geoCity/geoCountry
- Petition public-stats drops recentSigners entirely
- Response wall strips userComment + submittedByName
- Campaign createdByUserEmail + moderation fields gated to SUPER_ADMIN

Access control (P1):
- Campaign findById/update/delete/email-stats enforce owner === req.user.id
  (SUPER_ADMIN bypasses), return 404 to avoid enumeration
- GPS tracking session route restricted to session owner or SUPER_ADMIN
- Canvass volunteer stats restricted to self or SUPER_ADMIN
- People household endpoints restricted to INFLUENCE + MAP roles (was ADMIN*)
- CCP upgrade.service.ts + certificate.service.ts gate user-controlled
  shell inputs (branch, path, slug, SAN hostname) behind regex validators

Token security (P2):
- Query-param JWT auth replaced with HMAC-signed short-lived URLs
  (utils/signed-url.ts + /api/media/sign endpoint); legacy ?token= removed
  from media streaming, photos, chat-notifications, and social SSE
- GITEA_SSO_SECRET + SERVICE_PASSWORD_SALT now REQUIRED (min 32 chars);
  JWT_ACCESS_SECRET fallback removed — BREAKING for existing deployments
- Refresh tokens bound to device fingerprint (UA + /24 IP) via `df` JWT
  claim; mismatch revokes all user sessions
- Refresh expiry reduced 7d → 24h
- Refresh/logout via request body removed — httpOnly cookie only
- Password-reset + verification-resend rate limits now keyed on (IP, email)
  composite to prevent both IP rotation and email enumeration

Defense-in-depth (P3):
- DOMPurify sanitization applied to GrapesJS landing page HTML/CSS
- /api/health?detailed=true disk-space leak removed
- Password-reset/verification token log lines no longer include userId

## Deployment

- docker-compose.yml + docker-compose.prod.yml: media-api now receives
  GITEA_SSO_SECRET + SERVICE_PASSWORD_SALT; empty fallbacks removed
- CCP templates/env.hbs adds both new secrets; refresh expiry → 24h
- CCP secret-generator.ts generates giteaSsoSecret + servicePasswordSalt
- leaflet.heat added to admin/package.json for heatmap rendering

## Operator action required on existing installs

Run `./config.sh` once (idempotent — only fills empty values) or manually
add GITEA_SSO_SECRET + SERVICE_PASSWORD_SALT to .env via
`openssl rand -hex 32`. Startup fails with a clear Zod error otherwise.

See SECURITY_REDTEAM_2026-04-12.md for full audit and verification matrix.

## Other

Includes in-flight CCP work: instance schema tweaks, agent server updates,
health service, tunnel service, DEV_WORKFLOW doc updates, and new migration
dropping composeProject uniqueness.

Bunker Admin
2026-04-12 15:17:00 -06:00
26ec925d9b CCP restore/tunnel/upgrade + upgrade.sh release-mode fixes + volunteer dashboard polish
- Add instance restore model, routes, and agent backup/restore endpoints
- Add Pangolin tunnel service (subdomain prefix, teardown action, CCP client)
- Add slug mutex for concurrent operation safety in agent
- Expand upgrade service with remote driver orchestration
- Fix upgrade.sh to properly handle release-mode installs (no git operations)
- Add CCP registration flags to config.sh (--ccp-url, --ccp-invite-code, --ccp-agent-url)
- Auto-detect JVB advertise IP in non-interactive mode
- Polish volunteer dashboard ActionStepsList with highlighted step component
- Add ticketed event description field + volunteer dashboard query refinements

Bunker Admin
2026-04-12 11:09:46 -06:00
38ccaa8a5b Add remote instance management with mTLS agent and phone-home registration
Enables the CCP to manage CML instances on remote servers via a lightweight
HTTP agent. Key components:

- ExecutionDriver abstraction (local-driver.ts / remote-driver.ts) routes
  operations to local Docker or remote agent transparently
- Remote agent package (agent/) with mTLS authentication, Docker Compose
  operations, file management, backup/upgrade delegation
- Certificate service using openssl CLI for CA management and cert issuance
- Phone-home registration: remote agents register via invite code, CCP admin
  approves, agent receives mTLS cert bundle automatically
- config.sh integration with configure_control_panel() section
- ccp-agent Docker Compose service (profile-gated)
- Frontend: AgentRegistrationsPage, InviteCodesPage, Remote Agents sidebar menu
- Security hardened: cert bundle wiped after delivery, shell injection prevention
  via execFile, command allowlist with metachar rejection, rate-limited public
  endpoints, auto-populated fingerprint pinning

Also wires ENABLE_SOCIAL/PEOPLE/ANALYTICS through env.ts, seed.ts, and
docker-compose env passthrough (from previous session).

Bunker Admin
2026-04-07 15:24:33 -06:00