docs(blog): v2.9.8 + v2.9.9 post — "A smoother fresh install"

High-level recap of the install-UX sprint for the MkDocs blog. Covers: - What the three-round fresh-install test surfaced - Install-day improvements that shipped in v2.9.8 (host-port preflight, Pangolin smoke test, admin password file, --enable-all installing systemd units, Next Steps verify pointer) - Teardown scripts (pangolin-teardown.sh, ccp-deregister.sh) and CCP's deleteInstance tunnel cleanup - CCP registration polish (slug-conflict 409, poll-rate-limit split, agent exponential backoff) - Release hygiene in v2.9.9 (--replace safety, whitelist parity, install.sh extract preservation on TTY failure) - Brief "what's next / deferred" section Placed under mkdocs/docs/blog/posts/ with the date-prefix filename convention used by 2026-03-27-test-blog-post.md. Bunker Admin
2026-04-16 16:11:08 -06:00 · 2026-04-16 16:11:08 -06:00 · 8a2b82a4e8
commit 8a2b82a4e8
parent 5968df5b42
1 changed files with 77 additions and 0 deletions
--- a/mkdocs/docs/blog/posts/2026-04-16-smoother-fresh-install.md
+++ b/mkdocs/docs/blog/posts/2026-04-16-smoother-fresh-install.md
@ -0,0 +1,77 @@
+---
+date: 2026-04-16
+authors:
+  - admin
+categories:
+  - Platform
+  - Release Notes
+tags:
+  - v2.9.8
+  - v2.9.9
+  - install
+  - ccp
+  - operator-experience
+---
+
+# A smoother fresh install: v2.9.8 and v2.9.9
+
+Installing Changemaker Lite from scratch on a clean server should be fast, quiet, and recoverable. Over the last few days we sat with a clean Ubuntu server and ran the install flow three times — once straight, once to fix what we found, and once to register the result with a Control Panel. Every round surfaced paper cuts. Both releases that landed this week are the result.
+
+<!-- more -->
+
+## The setup
+
+Three fresh installs on a remote test bench (`marcelle`), each starting from `sudo rm -rf ~/changemaker.lite` and ending at a working deployment reachable through a Pangolin tunnel at `cursedknowledge.org`. In between, we captured every place the install asked us to think harder than it should have, every command that ran silently where a status line would have helped, and every failure mode where the cleanup left the user worse off than before.
+
+Ten distinct friction points came out of the three runs. Every one is addressed in v2.9.8 or v2.9.9.
+
+## Install-day improvements (v2.9.8)
+
+**The installer now checks your host before it downloads anything.** `scripts/install.sh` runs `ss -Htln` against the ~14 ports the stack binds (3000, 4000, 9090, 3030, …) before pulling the 15 MB tarball. If something else on the host is already listening — the canonical example being `cockpit.socket` on Ubuntu Server, which claims `:9090` — the install aborts with a culprit-specific hint (`sudo systemctl disable --now cockpit.socket`) instead of starting a partial-stack deployment and failing mid-`compose up`.
+
+**`config.sh` smoke-tests Pangolin credentials before committing them to `.env`.** In non-interactive mode, a typo in `--pangolin-api-key` used to fail much later, when Newt couldn't connect and the symptoms looked like a tunnel outage. A single round-trip to `/org/:id/resources` now catches the typo in seconds with a clear message. Skip with `--skip-pangolin-check` if you're bootstrapping offline.
+
+**Auto-generated admin passwords are saved to a file.** When you run `config.sh -y` without `--admin-password`, the wizard generates a strong password. It prints it once — which used to mean users piping through `tee` or scrolling past it lost it forever. The password is now also written to `data/admin-credentials.txt` (mode `0600`). Delete the file once you've stored the password elsewhere. Explicit `--admin-password` is never persisted.
+
+**`--enable-all` now actually installs everything.** Previously `config.sh -y --enable-all` silently skipped the systemd upgrade watcher and the daily backup timer — even though `--enable-all` strongly suggests they'd be installed. Automation-driven installs missed the self-healing infrastructure we'd worked hard to stabilize. They now install by default, with `--no-install-watcher` / `--no-install-backup-timer` for deliberate opt-out.
+
+**Next Steps now points at verification.** After config completes, the installer tells you to run `bash scripts/test-deployment.sh --wait 60`. It's a 37-check verifier that exists in the repo but wasn't surfaced anywhere. It confirms all containers are healthy, the API is responding, the tunnel is reachable, and the database is seeded. Run it after every install — it'll save hours of "is it actually working?"
+
+## Teardown that actually tears down
+
+Fresh installs used to leave two classes of orphans:
+
+- **Pangolin resources and sites** in your tunnel org that nobody cleans up when you `docker compose down -v`
+- **CCP Instance rows** from phone-home registrations that outlive the underlying stack
+
+Both now have proper scripts. `scripts/pangolin-teardown.sh --yes` wipes an org's resources + sites (dry-run by default, `--keep-site-ids` for safety). `scripts/ccp-deregister.sh --yes` removes this host's `Instance` from a connected Control Panel. The full teardown sequence is documented and fits on one card.
+
+We also wired tunnel cleanup into CCP's own `deleteInstance` — previously, clicking "Delete Instance" in CCP admin stopped Docker but orphaned the Pangolin site. That was a bug in our plumbing; it's fixed.
+
+## CCP registration, less fragile
+
+Registering an existing install with a Control Panel used to hit two avoidable failures. Both are addressed:
+
+**Slug collisions return a clean 409.** If you tore down the target stack without running `ccp-deregister.sh` first, CCP's stale `Instance` row blocked re-registration. The error was a raw Prisma stack trace. It's now a `409 SLUG_CONFLICT` with a message pointing at the fix: *"Delete the stale instance first or run `scripts/ccp-deregister.sh` from the target host."*
+
+**The agent doesn't wedge on rate limits.** The old poll loop sent a `/poll` request every 30 seconds at a fixed cadence. If admin approval took longer than 5 minutes, CCP's rate limiter returned HTTP 429 and the agent got stuck — recovery required an operator to manually restart the container. We split the rate limiters (strict on `/register`, looser on `/poll` to accommodate the normal polling pattern), and taught the agent to back off exponentially on 429 (30s → 60s → 120s → 300s cap) with reset on success.
+
+## Release hygiene (v2.9.9)
+
+Two small guards against future drift:
+
+**`build-release.sh --upload` refuses to overwrite an existing tag** unless you pass `--replace`. Silently pushing new tarball contents under an existing version breaks upgrade checks for users on that version — they see "no update available" even though the bytes changed. Version tags should be immutable once published.
+
+**A parity check on the shipping whitelist.** Adding a new script to `scripts/` now requires deliberately classifying it as ship (`RUNTIME_SCRIPTS`) or don't-ship (`DEV_ONLY_SCRIPTS`). An unclassified script aborts the build. No more "added a script three months ago, forgot to add it to the release, nobody noticed."
+
+**`install.sh` preserves your extracted tarball on config-wizard failure.** The classic `curl | ssh bash` over non-interactive SSH fails because `/dev/tty` is unavailable. The cleanup trap used to remove the install directory, forcing a 15 MB re-download. It now distinguishes "extract OK, wizard didn't run" from "extraction itself failed" and preserves the dir with a resumption hint: *"cd /path && bash config.sh on an interactive console"*.
+
+## What's next
+
+Both releases are out. The install flow is meaningfully more resilient — a test that hit two classes of failure now recovers gracefully from both, on a run that took twelve minutes end-to-end including a full stack up and verifier pass. Documentation (prerequisites, installation, first-steps, control-panel) has been swept to match.
+
+A few items we noted but deliberately deferred: the agent doesn't yet detect when its mTLS cert points at a CCP-deleted instance and re-enter registration mode (real concern past 5+ fleet instances, not now); the CCP's default admin credentials in `.env.example` deserve a security pass of their own; the Gitea tarball download speed is a network-layer issue outside the install script's scope.
+
+If you're running on a pre-v2.9.8 install, `./scripts/upgrade.sh` is a safe path forward. If you're coming fresh, the one-liner — `curl -fsSL https://gitea.bnkops.com/admin/changemaker.lite/raw/branch/main/scripts/install.sh | bash` — should get you further without asking you to think.
+
+As always, issues and feedback via Gitea. Thanks for following along.