A smoother fresh install: v2.9.8 and v2.9.9

Installing Changemaker Lite from scratch on a clean server should be fast, quiet, and recoverable. Over the last few days we sat with a clean Ubuntu server and ran the install flow three times — once straight, once to fix what we found, and once to register the result with a Control Panel. Every round surfaced paper cuts. Both releases that landed this week are the result.

The setup

Three fresh installs on a remote test bench (marcelle), each starting from sudo rm -rf ~/changemaker.lite and ending at a working deployment reachable through a Pangolin tunnel at cursedknowledge.org. In between, we captured every place the install asked us to think harder than it should have, every command that ran silently where a status line would have helped, and every failure mode where the cleanup left the user worse off than before.

Ten distinct friction points came out of the three runs. Every one is addressed in v2.9.8 or v2.9.9.

Install-day improvements (v2.9.8)

The installer now checks your host before it downloads anything. scripts/install.sh runs ss -Htln against the ~14 ports the stack binds (3000, 4000, 9090, 3030, …) before pulling the 15 MB tarball. If something else on the host is already listening — the canonical example being cockpit.socket on Ubuntu Server, which claims :9090 — the install aborts with a culprit-specific hint (sudo systemctl disable --now cockpit.socket) instead of starting a partial-stack deployment and failing mid-compose up.
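To make the shape of that preflight concrete, here is a minimal sketch, not the code in scripts/install.sh. It assumes the standard `ss -Htln` layout, where the fourth column is the local address and port. `REQUIRED_PORTS` is a hypothetical subset limited to the four ports named above, and the function names are ours for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical subset: only the ports named in the post, not the full list.
REQUIRED_PORTS=(3000 4000 9090 3030)

# Read `ss -Htln` output on stdin and print each required port that already
# has a listener. Column 4 is "address:port"; the port is the last
# colon-separated field, which also handles IPv6 forms such as [::]:9090.
busy_ports() {
  local listening port
  listening=$(awk '{ n = split($4, parts, ":"); print parts[n] }' | sort -un)
  for port in "${REQUIRED_PORTS[@]}"; do
    if grep -qx "$port" <<<"$listening"; then
      echo "$port"
    fi
  done
}

# Map a busy port to a hint. Only the cockpit case comes from the post;
# the fallback wording is an assumption.
hint_for_port() {
  case "$1" in
    9090) echo "sudo systemctl disable --now cockpit.socket" ;;
    *)    echo "stop the service listening on :$1" ;;
  esac
}
```

On a real host this would be driven by `ss -Htln | busy_ports`, aborting before the download if anything is printed.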

config.sh smoke-tests Pangolin credentials before committing them to .env. In non-interactive mode, a typo in --pangolin-api-key used to fail much later, when Newt couldn't connect and the symptoms looked like a tunnel outage. A single round-trip to /org/:id/resources now catches the typo in seconds with a clear message. Skip with --skip-pangolin-check if you're bootstrapping offline.
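A rough sketch of what a single-request credential check can look like follows. Everything beyond the /org/:id/resources path is an assumption on our part: the base URL argument, the Bearer-token header, the ten-second timeout, and the mapping from status codes to verdicts are illustrative, not what config.sh actually does.

```shell
#!/usr/bin/env bash
# Translate the HTTP status of the probe into a short verdict.
# The verdict names and this mapping are assumptions for illustration.
classify_pangolin_status() {
  case "$1" in
    2??)     echo "ok" ;;
    401|403) echo "bad-credentials" ;;
    404)     echo "unknown-org" ;;
    000)     echo "unreachable" ;;   # curl reports 000 when no response arrived
    *)       echo "unexpected-$1" ;;
  esac
}

# One round-trip that prints only the status code.
# base_url and the Authorization header format are assumptions.
probe_pangolin() {
  local base_url="$1" org_id="$2" api_key="$3"
  curl -sS -o /dev/null -w '%{http_code}' --max-time 10 \
    -H "Authorization: Bearer $api_key" \
    "$base_url/org/$org_id/resources" 2>/dev/null || true
}
```

A caller would write something like `verdict=$(classify_pangolin_status "$(probe_pangolin "$url" "$org" "$key")")` and refuse to write .env unless the verdict is `ok`.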

Auto-generated admin passwords are saved to a file. When you run config.sh -y without --admin-password, the wizard generates a strong password. It prints it once — which used to mean users piping through tee or scrolling past it lost it forever. The password is now also written to data/admin-credentials.txt (mode 0600). Delete the file once you've stored the password elsewhere. Explicit --admin-password is never persisted.
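For readers who want the same pattern in their own scripts, this is one way to write a secret to a file that is never group- or world-readable. It is a sketch under assumptions: the function name and the `password=` line format are hypothetical, and only the path and the 0600 mode come from the release.

```shell
#!/usr/bin/env bash
# Write a password to a file readable only by its owner.
# The "password=" line format is an assumption for illustration.
save_admin_password() {
  local file="$1" password="$2"
  mkdir -p "$(dirname "$file")"
  # The subshell keeps the restrictive umask from leaking into the caller,
  # so a newly created file starts out as 0600.
  (
    umask 077
    printf 'password=%s\n' "$password" > "$file"
  )
  # Covers the case where the file already existed with looser permissions.
  chmod 600 "$file"
}
```

Usage would be along the lines of `save_admin_password data/admin-credentials.txt "$generated_password"`.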

--enable-all now actually installs everything. Previously config.sh -y --enable-all silently skipped the systemd upgrade watcher and the daily backup timer — even though --enable-all strongly suggests they'd be installed. Automation-driven installs missed the self-healing infrastructure we'd worked hard to stabilize. They now install by default, with --no-install-watcher / --no-install-backup-timer for deliberate opt-out.

Next Steps now points at verification. After config completes, the installer tells you to run bash scripts/test-deployment.sh --wait 60. It's a 37-check verifier that already existed in the repo but wasn't surfaced anywhere. It confirms all containers are healthy, the API is responding, the tunnel is reachable, and the database is seeded. Run it after every install; it will save hours of asking "is it actually working?"

Teardown that actually tears down

Fresh installs used to leave two classes of orphans:

  • Pangolin resources and sites in your tunnel org that nobody cleans up when you docker compose down -v
  • CCP Instance rows from phone-home registrations that outlive the underlying stack

Both now have proper scripts. scripts/pangolin-teardown.sh --yes wipes an org's resources + sites (dry-run by default, --keep-site-ids for safety). scripts/ccp-deregister.sh --yes removes this host's Instance from a connected Control Panel. The full teardown sequence is documented and fits on one card.

We also wired tunnel cleanup into CCP's own deleteInstance — previously, clicking "Delete Instance" in CCP admin stopped Docker but orphaned the Pangolin site. That was a bug in our plumbing; it's fixed.

CCP registration, less fragile

Registering an existing install with a Control Panel used to hit two avoidable failures. Both are addressed:

Slug collisions return a clean 409. If you tore down the target stack without running ccp-deregister.sh first, CCP's stale Instance row blocked re-registration. The error was a raw Prisma stack trace. It's now a 409 SLUG_CONFLICT with a message pointing at the fix: "Delete the stale instance first or run scripts/ccp-deregister.sh from the target host."

The agent doesn't wedge on rate limits. The old poll loop sent a /poll request every 30 seconds at a fixed cadence. If admin approval took longer than 5 minutes, CCP's rate limiter returned HTTP 429 and the agent got stuck — recovery required an operator to manually restart the container. We split the rate limiters (strict on /register, looser on /poll to accommodate the normal polling pattern), and taught the agent to back off exponentially on 429 (30s → 60s → 120s → 300s cap) with reset on success.
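The backoff schedule is small enough to show in full. The sketch below encodes the listed steps (30, 60, 120, 300 seconds) as a table instead of computing them, because strict doubling would pass through 240 before reaching the cap. The function and variable names are ours; the agent's real loop is not shown here.

```shell
#!/usr/bin/env bash
# Delay schedule in seconds, as listed in the post.
BACKOFF_STEPS=(30 60 120 300)

# Given the current step index and the last HTTP status, print the next index:
# advance one step on 429 (staying at the cap), reset to 0 on anything else.
next_step() {
  local index="$1" http_status="$2"
  local last=$(( ${#BACKOFF_STEPS[@]} - 1 ))
  if [ "$http_status" -eq 429 ]; then
    if [ "$index" -lt "$last" ]; then
      index=$(( index + 1 ))
    fi
  else
    index=0
  fi
  echo "$index"
}
```

A poll loop would keep the index between requests and sleep for `${BACKOFF_STEPS[$index]}` seconds before the next /poll call.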

Release hygiene (v2.9.9)

Two small guards against future drift, plus one installer fix:

build-release.sh --upload refuses to overwrite an existing tag unless you pass --replace. Silently pushing new tarball contents under an existing version breaks upgrade checks for users on that version — they see "no update available" even though the bytes changed. Version tags should be immutable once published.
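The guard itself is a simple membership test. Here is a minimal sketch, assuming the list of published tags has already been fetched into a newline-separated string; how build-release.sh actually obtains that list is not covered in this post. The function name and the message text are hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if the upload may proceed, 1 if it must be refused.
# Arguments: tag, newline-separated published tags, "yes" if --replace was given.
may_upload() {
  local tag="$1" existing_tags="$2" replace="${3:-no}"
  # -x matches the whole line and -F treats the tag literally, so v2.9.1
  # does not match v2.9.10 and the dots are not regex wildcards.
  if grep -qxF -- "$tag" <<<"$existing_tags" && [ "$replace" != yes ]; then
    echo "Tag $tag already exists; pass --replace to overwrite." >&2
    return 1
  fi
  return 0
}
```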

A parity check on the shipping whitelist. Adding a new script to scripts/ now requires deliberately classifying it as ship (RUNTIME_SCRIPTS) or don't-ship (DEV_ONLY_SCRIPTS). An unclassified script aborts the build. No more "added a script three months ago, forgot to add it to the release, nobody noticed."
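A parity check of this kind can be sketched in a few lines. The array names RUNTIME_SCRIPTS and DEV_ONLY_SCRIPTS come from the release. Which scripts sit in which list is an assumption made for the example, as is the function name.

```shell
#!/usr/bin/env bash
# Example classification; the membership shown here is an assumption.
RUNTIME_SCRIPTS=(install.sh upgrade.sh test-deployment.sh)
DEV_ONLY_SCRIPTS=(build-release.sh)

# Print every *.sh file in the given directory that appears in neither list.
unclassified_scripts() {
  local dir="$1" path name known candidate
  for path in "$dir"/*.sh; do
    [ -e "$path" ] || continue   # the glob matched nothing
    name=$(basename "$path")
    known=no
    for candidate in "${RUNTIME_SCRIPTS[@]}" "${DEV_ONLY_SCRIPTS[@]}"; do
      if [ "$candidate" = "$name" ]; then
        known=yes
      fi
    done
    if [ "$known" = no ]; then
      echo "$name"
    fi
  done
}
```

A build script would abort when `unclassified_scripts scripts` prints anything, which forces the ship or don't-ship decision at the moment a script is added.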

install.sh preserves your extracted tarball on config-wizard failure. The classic curl | ssh bash invocation over non-interactive SSH fails at the config wizard because /dev/tty is unavailable. The cleanup trap used to remove the install directory, forcing a 15 MB re-download. It now distinguishes "extract OK, wizard didn't run" from "extraction itself failed" and, in the first case, preserves the directory with a resumption hint: "cd /path && bash config.sh on an interactive console".
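The two-state cleanup can be sketched with a single flag that is set only after extraction succeeds. This is an illustration of the idea, not the trap in install.sh; the EXTRACT_OK variable, the function name, and the message for the removal case are hypothetical.

```shell
#!/usr/bin/env bash
# Set to 1 by the installer only after the tarball has unpacked cleanly.
EXTRACT_OK=0

# Decide what to do with the install directory when the script exits.
# Arguments: the script's exit code and the install directory.
cleanup_on_failure() {
  local exit_code="$1" install_dir="$2"
  if [ "$exit_code" -eq 0 ] || [ -z "$install_dir" ]; then
    return 0
  fi
  if [ "$EXTRACT_OK" -eq 1 ]; then
    # Extraction worked and only the wizard failed: keep the files.
    echo "Install files kept. Resume with: cd $install_dir && bash config.sh on an interactive console"
  else
    # Extraction itself failed: a partial directory is useless, so remove it.
    rm -rf -- "$install_dir"
    echo "Extraction failed; removed $install_dir"
  fi
}

# In an installer this would be wired up as:
#   trap 'cleanup_on_failure "$?" "$INSTALL_DIR"' EXIT
```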

What's next

Both releases are out. The install flow is meaningfully more resilient — a test that hit two classes of failure now recovers gracefully from both, on a run that took twelve minutes end-to-end including a full stack up and verifier pass. Documentation (prerequisites, installation, first-steps, control-panel) has been swept to match.

A few items we noted but deliberately deferred: the agent doesn't yet detect when its mTLS cert points at a CCP-deleted instance and re-enter registration mode (a real concern once a fleet grows past about five instances, but not yet); the CCP's default admin credentials in .env.example deserve a security pass of their own; and the Gitea tarball download speed is a network-layer issue outside the install script's scope.

If you're running on a pre-v2.9.8 install, ./scripts/upgrade.sh is a safe path forward. If you're coming fresh, the one-liner — curl -fsSL https://gitea.bnkops.com/admin/changemaker.lite/raw/branch/main/scripts/install.sh | bash — should get you further without asking you to think.

As always, issues and feedback via Gitea. Thanks for following along.