Upgrade
CertAutoPilot upgrades in place. The binary is the only moving part; state lives in MongoDB and the secret store, both of which are preserved across upgrades. Index migrations run at startup. Rollback is "run the previous binary" — no schema migrations to reverse.
01 Version compatibility
- Patch bumps (e.g. 1.4.0 → 1.4.x) — always safe. Bug fixes + security patches only.
- Minor bumps (1.3 → 1.4) — safe. Additive schema / API changes only.
- Major bumps (1.x → 2.0) — read the CHANGELOG. May require ordered steps or a data migration script.
The X-Deprecation response header warns well in advance when an API feature is going away.
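To decide which of the bump rules above applies before you upgrade, a small helper can compare the running and target versions. This is a minimal sketch, not part of CertAutoPilot: the function names `ver_field` and `classify_bump` are hypothetical, and it assumes plain numeric MAJOR.MINOR.PATCH strings (no `v` prefix, no pre-release suffix).

```shell
# ver_field VERSION N -> prints the Nth dotted field; missing fields read as 0.
ver_field() {
  printf '%s.0.0\n' "$1" | cut -d. -f"$2"
}

# classify_bump FROM TO -> prints which version component differs:
# "major", "minor", "patch", or "same".
classify_bump() {
  local from=$1 to=$2
  if [ "$(ver_field "$to" 1)" -ne "$(ver_field "$from" 1)" ]; then
    echo major   # read the CHANGELOG first
  elif [ "$(ver_field "$to" 2)" -ne "$(ver_field "$from" 2)" ]; then
    echo minor   # additive changes only
  elif [ "$(ver_field "$to" 3)" -ne "$(ver_field "$from" 3)" ]; then
    echo patch   # bug fixes and security patches
  else
    echo same
  fi
}
```

For example, `classify_bump 1.4.0 1.4.5` prints `patch` and `classify_bump 1.4.5 2.0` prints `major`. The helper only reports which component changed; it does not distinguish an upgrade from a downgrade.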
02 Standalone upgrade
Use update.sh when you only want to bump the version. It pulls the pinned tarball, verifies the checksum, and hands off to the bundled upgrade.sh. Only the binary, the frontend, and the systemd unit (with a rollback snapshot) are refreshed; config.yaml, secrets.env (including the HSM PIN), TLS material, nginx config, and MongoDB all stay untouched.
curl -fsSL https://raw.githubusercontent.com/CloudNativeWorks/certautopilot-archive/main/update.sh \
| sudo bash -s -- --version=1.4.5
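If you fetch a tarball by hand and want the same kind of checksum guard that update.sh applies, something along these lines would work. It is a sketch under assumptions: how update.sh verifies internally is not documented here, the function name `verify_sha256` is hypothetical, and where you obtain the expected digest is up to your release process.

```shell
# verify_sha256 FILE EXPECTED_HEX -> exit 0 only when FILE's SHA-256 matches.
# EXPECTED_HEX must be lowercase hex, as printed by sha256sum.
verify_sha256() {
  local file=$1 expected=$2 actual
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  # An empty digest means sha256sum failed (e.g. missing file): treat as mismatch.
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

A typical call would be `verify_sha256 certautopilot_1.4.5_linux_amd64.tar.gz "$EXPECTED" || exit 1`, placed before anything unpacks or executes the download.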
Use get.sh instead when you also need to change install flags (--mongo, --tls, --kek-provider, --port, --extra-hostnames, …). install.sh is idempotent: it re-renders what the new flags affect while preserving every secret on disk.
curl -fsSL https://raw.githubusercontent.com/CloudNativeWorks/certautopilot-archive/main/get.sh \
| sudo bash -s -- --version=1.4.5 --mongo=local --extra-hostnames=cap.new-domain.test
Or pull the tarball yourself (skip the oras pull if you already have it) and run the bundled upgrade.sh directly:
VERSION=1.4.5
oras pull ghcr.io/cloudnativeworks/standalone/certautopilot:${VERSION}
tar -xzf certautopilot_${VERSION}_linux_amd64.tar.gz
cd certautopilot_${VERSION}_linux_amd64
sudo ./upgrade.sh
upgrade.sh stops the service, atomically replaces the binary (mv onto the old path), and refreshes the frontend assets under /usr/share/certautopilot/web/. It takes a systemd unit snapshot so it can roll back if the new binary fails to start, refreshes the journald retention drop-in, the cap-kek wrapper, and the KEK rotation runbook, then restarts the service. config.yaml, secrets.env, TLS, nginx config, and MongoDB stay untouched. Zero downtime is NOT guaranteed on a single host: expect a blip of about 5 seconds during the restart.
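The "mv onto the old path" technique is worth understanding if you script around the upgrade. The sketch below illustrates the general pattern and is not the actual upgrade.sh: the function name `atomic_replace` is hypothetical, and the 0755 mode is an assumption.

```shell
# atomic_replace NEW_FILE DEST -> swap DEST for NEW_FILE without a window
# in which DEST is missing or half-written.
atomic_replace() {
  local src=$1 dest=$2 staged
  # Stage next to the destination: a rename is only atomic within one filesystem.
  staged=$(mktemp "$(dirname "$dest")/.$(basename "$dest").XXXXXX") || return 1
  cp "$src" "$staged" && chmod 0755 "$staged" && mv -f "$staged" "$dest" || {
    rm -f "$staged"   # clean up the staged copy on any failure
    return 1
  }
}
```

Copying straight onto the destination would leave a partially written binary on disk if the copy were interrupted; staging the copy first and renaming it means readers see either the old file or the complete new one.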
03 Helm upgrade
helm upgrade certautopilot oci://ghcr.io/cloudnativeworks/charts/certautopilot \
--version 1.4.0 \
-f values.yaml
kubectl rollout status deployment/certautopilot --timeout=5m
The chart performs a rolling restart. With ≥ 2 API replicas and maxUnavailable: 25% (the default), there is no downtime. The scheduler leader hands off gracefully, and workers finish their current job before rolling.
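The arithmetic behind the 25% default can be checked quickly. This sketch assumes the chart's Deployment uses the standard Kubernetes RollingUpdate strategy (the chart templates are not shown here, so treat that as an assumption); Kubernetes rounds a percentage maxUnavailable down to a whole number of pods. The function name `max_unavailable` is hypothetical.

```shell
# max_unavailable REPLICAS PERCENT -> pods that may be down at once during a
# rolling update. Integer division mirrors Kubernetes' round-down rule.
max_unavailable() {
  echo $(( $1 * $2 / 100 ))
}
```

`max_unavailable 2 25` prints `0`, so with two replicas no existing pod is taken down until a replacement is ready (provided maxSurge allows an extra pod). `max_unavailable 4 25` prints `1`, so larger deployments roll about a quarter of their pods at a time.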
04 Breaking changes
When a breaking change is necessary, we ship it across two minor releases:
- 1.X: the new behaviour lands behind a feature flag. Old behaviour is default. Deprecation warnings flow into logs + API responses. Docs explain how to migrate.
- 1.Y (at least one quarter later): the old behaviour is removed. Operators who migrated during 1.X are unaffected.
This cadence lets you opt in on your own schedule rather than being forced by the release train.
05 CGO / base image changes
The Phase 2 PKCS#11 work required CGO, which in turn required switching the container base image from Alpine (musl) to Debian bookworm (glibc). There is no functional change for users, but be aware if you layer your own tooling on the image:
- apk add won't work; use apt install.
- Image size is larger by ~40 MB.
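If your own layering scripts must work against both older Alpine-based and newer Debian-based images, they can pick the package manager from /etc/os-release instead of hard-coding it. This is a minimal sketch; the function names `os_id_of` and `install_cmd_for` are hypothetical, and it uses apt-get (the script-friendly counterpart of apt install).

```shell
# os_id_of [FILE] -> prints the ID= value from an os-release file.
os_id_of() {
  sed -n 's/^ID=//p' "${1:-/etc/os-release}" | tr -d '"'
}

# install_cmd_for OS_ID -> prints the package-install command for that base.
install_cmd_for() {
  case "$1" in
    debian|ubuntu) echo "apt-get install -y --no-install-recommends" ;;
    alpine)        echo "apk add --no-cache" ;;
    *)             echo "unsupported base image: $1" >&2; return 1 ;;
  esac
}
```

Inside an image build step this could be used as `$(install_cmd_for "$(os_id_of)") curl`. On Debian-based images, run `apt-get update` first so the package index exists.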
06 MongoDB version bumps
- Within a major (6.0 → 6.0.x), just upgrade Mongo.
- Across majors (6 → 7): follow MongoDB's official upgrade path. CertAutoPilot supports 6.0+.
- Bundled Bitnami chart in Helm: helm upgrade with a new mongodb.image.tag value handles the rolling step.
07 KEK during upgrade
An in-flight KEK rotation mid-upgrade is safe: the rotation handler is resumable. If the worker pod running the rotation restarts, another worker picks it up from the last batch boundary. For monitoring, kek status shows the running / paused states.
08 Rollback
Standalone
# Pull the older version's tarball, then
sudo ./upgrade.sh # the script is version-agnostic; it runs whatever binary is in the dir
MongoDB data is forward-compatible within a major; running an older binary against a newer DB state usually works. The CHANGELOG lists incompatibilities (e.g. a new required field introduced in 1.4 would break a 1.3 rollback).
Helm
helm rollback certautopilot 1
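Helm keeps revision history, so the pod image from the chosen revision rolls out again. The trailing 1 is the target revision: list the available revisions with helm history certautopilot, or omit the number to return to the previous release. The same compatibility caveat as standalone applies.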
09 Index creation on startup
The backend ensures indexes at startup via
database.EnsureIndexes(). This is idempotent
(safe to run repeatedly) and usually fast, but the first
startup after a major upgrade can take tens of seconds on
large collections while indexes build. Plan your restart
window accordingly.
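Because startup can take longer than usual while indexes build, post-upgrade automation should wait for the service with a generous timeout rather than assume it is ready immediately. A minimal sketch of such a wait loop follows; the function name `wait_until` is hypothetical, and the systemd unit name in the usage note is an assumption.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Retries COMMAND roughly once per second; returns 0 as soon as it succeeds,
# or 1 once TIMEOUT_SECONDS attempts have failed.
wait_until() {
  local remaining=$1
  shift
  until "$@"; do
    remaining=$((remaining - 1))
    [ "$remaining" -le 0 ] && return 1
    sleep 1
  done
}
```

For example, `wait_until 120 systemctl is-active --quiet certautopilot || echo "still not up after 2 minutes"`. If your deployment exposes a health endpoint, polling that with curl is a stronger readiness signal than the unit state alone.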
10 Pre-upgrade checklist
- Read the CHANGELOG for the target version. Note any breaking changes.
- Back up MongoDB and the secret store. Always back up both, at the same point in time.
- Schedule the upgrade during a lull — renewals queuing during a restart is fine, but avoid stacking a KEK rotation and a version bump.
- Verify rollback path — can you pull the old tarball / image back?
- Alert your ops channel. Expect a short banner during the restart.
11 Troubleshooting
Service hangs on startup after upgrade
Usually an index build on a very large collection. Check the MongoDB logs; db.currentOp() shows the build in progress. Be patient. If it is truly stuck for more than 30 minutes, stop the service, drop the in-progress index manually (safe, because EnsureIndexes will recreate it), and restart.
After upgrade: clients get 415 Unsupported Media Type
Older clients sending Content-Type: text/plain hit the stricter JSON check. Update the client to send application/json.
Helm upgrade silently didn't re-roll pods
The rendered Secret checksum did not change, so the pod template is identical and Kubernetes has nothing to roll. Bump an otherwise irrelevant field (podAnnotations.releasedAt) to force a rollout, or use kubectl rollout restart.
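One way to bump podAnnotations.releasedAt is to stamp it with the current time on each upgrade. The sketch below assumes the chart copies podAnnotations into the pod template, as the advice above implies; the function name `rollout_stamp` is hypothetical.

```shell
# rollout_stamp -> prints a UTC timestamp suitable as an annotation value.
rollout_stamp() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Usage (not executed here): a changed annotation alters the pod template,
# which makes the Deployment roll even when nothing else differs.
#
#   helm upgrade certautopilot oci://ghcr.io/cloudnativeworks/charts/certautopilot \
#     --version 1.4.0 -f values.yaml \
#     --set-string podAnnotations.releasedAt="$(rollout_stamp)"
```

`--set-string` keeps Helm from reinterpreting the value as anything other than a string.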