Upgrade

CertAutoPilot upgrades in place. The binary is the only moving part; state lives in MongoDB and the secret store, both of which are preserved across upgrades. Index migrations run at startup. Rollback is "run the previous binary" — no schema migrations to reverse.

01 Version compatibility

  • Patch bumps (e.g. 1.4.0 → 1.4.x) — always safe. Bug fixes + security patches only.
  • Minor bumps (1.3 → 1.4) — safe. Additive schema / API changes only.
  • Major bumps (1.x → 2.0) — read the CHANGELOG. May require ordered steps or a data migration script.
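If you script upgrades, a small check can tell you which of the three rules above applies before you run anything. The sketch below is a hypothetical helper, not part of CertAutoPilot; it assumes plain numeric MAJOR.MINOR.PATCH versions with no "v" prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an upgrade as patch / minor / major using the
# compatibility rules above. Expects numeric MAJOR[.MINOR[.PATCH]] versions;
# missing parts count as 0, so "1.4" and "1.4.0" compare equal.
classify_bump() {
  local f_major f_minor f_patch t_major t_minor t_patch
  IFS=. read -r f_major f_minor f_patch <<< "$1"
  IFS=. read -r t_major t_minor t_patch <<< "$2"
  f_minor=${f_minor:-0} f_patch=${f_patch:-0}
  t_minor=${t_minor:-0} t_patch=${t_patch:-0}
  if (( t_major != f_major )); then
    echo major      # read the CHANGELOG; ordered steps may be needed
  elif (( t_minor != f_minor )); then
    echo minor      # additive schema / API changes only
  elif (( t_patch != f_patch )); then
    echo patch      # bug fixes and security patches only
  else
    echo none
  fi
}
```

For example, `classify_bump 1.4.0 1.4.5` prints `patch`. It only compares components, so it does not distinguish an upgrade from a rollback.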

The X-Deprecation response header warns well in advance when an API feature is going away.
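To spot those warnings from a script, you can filter a response's headers for X-Deprecation. This is a minimal sketch; the function name is hypothetical, and it makes no assumption about the header's value format beyond printing it verbatim.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of every X-Deprecation header found in
# raw HTTP response headers on stdin. Header names are matched
# case-insensitively, and the CR of CRLF line endings is stripped.
deprecation_warnings() {
  awk '{
    sub(/\r$/, "")                      # drop the CR of a CRLF line ending
    i = index($0, ":")
    if (i == 0) next                    # status line or blank separator
    if (tolower(substr($0, 1, i - 1)) == "x-deprecation") {
      value = substr($0, i + 1)
      sub(/^[ \t]+/, "", value)         # trim whitespace after the colon
      print value
    }
  }'
}
```

Usage: `curl -sS -D - -o /dev/null "$CAP_URL" | deprecation_warnings`, where CAP_URL is a placeholder for any API endpoint you call (add whatever auth headers it needs). `-D -` dumps the response headers to stdout and `-o /dev/null` discards the body.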

02 Standalone upgrade

Use update.sh when you only want to bump the version. It pulls the pinned tarball, verifies its checksum, and hands off to the bundled upgrade.sh. Only the binary, the frontend assets, and the systemd unit (with a rollback snapshot) are refreshed; config.yaml, secrets.env (including the HSM PIN), TLS material, nginx config, and MongoDB all stay untouched.

curl -fsSL https://raw.githubusercontent.com/CloudNativeWorks/certautopilot-archive/main/update.sh \
  | sudo bash -s -- --version=1.4.5

Use get.sh instead when you also need to change install flags (--mongo, --tls, --kek-provider, --port, --extra-hostnames, …). The underlying install.sh is idempotent: it re-renders only what the new flags affect and preserves every secret on disk.

curl -fsSL https://raw.githubusercontent.com/CloudNativeWorks/certautopilot-archive/main/get.sh \
  | sudo bash -s -- --version=1.4.5 --mongo=local --extra-hostnames=cap.new-domain.test

Or, to pull the tarball yourself and run the bundled script:

VERSION=1.4.5
oras pull ghcr.io/cloudnativeworks/standalone/certautopilot:${VERSION}
tar -xzf certautopilot_${VERSION}_linux_amd64.tar.gz
cd certautopilot_${VERSION}_linux_amd64
sudo ./upgrade.sh
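update.sh verifies the tarball checksum for you; the manual path above does not. If you have an expected SHA-256 digest for the release, a check like the following can run before `tar -xzf`. Where that digest is published is an assumption here, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# The source of the expected digest is up to you; this only does the
# comparison. Uses GNU coreutils sha256sum (present on the Linux targets).
verify_sha256() {
  local file=$1 expected actual
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')   # sha256sum prints lowercase hex
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

Usage: `verify_sha256 certautopilot_${VERSION}_linux_amd64.tar.gz "$EXPECTED_SHA256" && tar -xzf certautopilot_${VERSION}_linux_amd64.tar.gz`, with EXPECTED_SHA256 set to the digest you trust.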

upgrade.sh performs these steps:

  1. Stops the service.
  2. Atomically replaces the binary (mv onto the old path).
  3. Refreshes the frontend assets under /usr/share/certautopilot/web/.
  4. Takes a snapshot of the systemd unit and rolls back if the new binary fails to start.
  5. Refreshes the journald retention drop-in, the cap-kek wrapper, and the KEK rotation runbook.
  6. Starts the service again.

config.yaml, secrets.env, TLS material, nginx config, and MongoDB stay untouched. Zero downtime is NOT guaranteed on a single host: expect a blip of about 5 seconds during the restart.
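The "atomic replace, roll back if the new binary fails to start" idea can be sketched in a few lines. This is an illustration of the pattern under stated assumptions, not upgrade.sh's actual code: the function name, the `.prev` backup suffix, and the check command are all hypothetical. The new file is staged in the target's own directory because a rename is only atomic within one filesystem.

```shell
#!/usr/bin/env bash
# Illustration of "atomic replace, roll back on failed start" (not the real
# upgrade.sh). Arguments, all hypothetical:
#   $1  new binary   $2  installed path   $3...  check command run after the swap
swap_binary() {
  local new=$1 target=$2 staged backup="$2.prev"
  shift 2
  rm -f "$backup"                                  # never restore a stale backup
  # Stage next to the target: rename(2) is only atomic within one filesystem.
  staged=$(mktemp "$(dirname "$target")/.staged.XXXXXX") || return 1
  if ! { cp "$new" "$staged" && chmod 0755 "$staged"; }; then
    rm -f "$staged"; return 1
  fi
  if [ -e "$target" ]; then
    cp -p "$target" "$backup" || { rm -f "$staged"; return 1; }
  fi
  mv -f "$staged" "$target"        # readers see either the old or the new file
  if ! "$@"; then                  # e.g. start the service and probe it
    echo "check failed; restoring previous binary" >&2
    if [ -e "$backup" ]; then mv -f "$backup" "$target"; fi
    return 1
  fi
  rm -f "$backup"
}
```

In a real upgrade, the check would start the service and confirm it stays up; the documented script also snapshots the systemd unit, which this sketch leaves out.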

03 Helm upgrade

helm upgrade certautopilot oci://ghcr.io/cloudnativeworks/charts/certautopilot \
  --version 1.4.0 \
  -f values.yaml
kubectl rollout status deployment/certautopilot --timeout=5m

The chart performs a rolling restart. With ≥ 2 API replicas and maxUnavailable: 25% (the default), there's no downtime. The scheduler leader hands off gracefully, and workers finish their current job before their pod is replaced.
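The percentages become pod counts by Kubernetes' rounding rules: maxUnavailable is rounded down and maxSurge is rounded up. The sketch below does that arithmetic. It assumes maxSurge: 25% (the Kubernetes Deployment default); whether the chart overrides it is something to confirm in your values.yaml.

```shell
#!/usr/bin/env bash
# Sketch: convert percentage-based rolling-update settings into pod counts.
# Kubernetes rounds maxUnavailable DOWN and maxSurge UP.
# Args: replicas, maxUnavailable %, maxSurge % (25 is the Kubernetes default).
rollout_budget() {
  local replicas=$1 unavail_pct=$2 surge_pct=$3
  local max_unavailable=$(( replicas * unavail_pct / 100 ))        # floor
  local max_surge=$(( (replicas * surge_pct + 99) / 100 ))         # ceiling
  echo "maxUnavailable=$max_unavailable maxSurge=$max_surge"
}
```

With 2 replicas, `rollout_budget 2 25 25` prints `maxUnavailable=0 maxSurge=1`: Kubernetes must bring one extra pod to Ready before it removes an old one, which is why the rollout causes no downtime.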

04 Breaking changes

When a breaking change is necessary, we ship it across two minor releases:

  1. 1.X: the new behaviour lands behind a feature flag. Old behaviour is default. Deprecation warnings flow into logs + API responses. Docs explain how to migrate.
  2. 1.Y (at least one quarter later): the old behaviour is removed. Operators who migrated during 1.X are unaffected.

This cadence lets you opt in on your own schedule instead of being forced along by the release train.

05 CGO / base image changes

The Phase 2 PKCS#11 work required CGO, which in turn required switching the container base image from Alpine (musl) to Debian bookworm (glibc). No functional change for users — but be aware if you layer your own tooling on the image: apk add won't work; use apt install. Image size is larger by ~40 MB.
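If your derived-image build steps must work on both the older Alpine-based image and the Debian bookworm one, they can pick the package manager from /etc/os-release. This is a hypothetical helper, not something shipped in the image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ID= from an os-release file (default
# /etc/os-release) and print the matching package-install command prefix.
pkg_install_cmd() {
  local release=${1:-/etc/os-release} id
  id=$(sed -n 's/^ID=//p' "$release" | tr -d '"')   # ID=debian, ID="alpine", ...
  case "$id" in
    debian|ubuntu) echo "apt-get install -y --no-install-recommends" ;;
    alpine)        echo "apk add --no-cache" ;;
    *)             echo "unsupported distro: ${id:-unknown}" >&2; return 1 ;;
  esac
}
```

Usage: `$(pkg_install_cmd) jq` installs jq with whichever manager the base image has (the unquoted substitution is deliberate so the prefix splits into words). On Debian, run `apt-get update` first, since slim images usually ship without package lists.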

06 MongoDB version bumps

  • Within a major (6.0 → 6.0.x), just upgrade Mongo.
  • Across majors (6 → 7): follow MongoDB's official upgrade path. CertAutoPilot supports 6.0+.
  • Bundled Bitnami chart in Helm: helm upgrade with a new mongodb.image.tag value handles the rolling step.
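MongoDB's official path moves one major release at a time (for example 6.0 → 7.0 → 8.0), raising featureCompatibilityVersion after each step; see MongoDB's upgrade documentation for the exact procedure. A sketch for listing the intermediate majors, assuming they are consecutive X.0 releases (true for 6.0, 7.0, and 8.0) and enforcing the 6.0 minimum; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the MongoDB major releases to step through,
# one at a time, from version $1 to version $2.
# Assumes consecutive X.0 majors; rejects anything below 6.0.
mongo_upgrade_path() {
  local from=${1%%.*} to=${2%%.*} v path=""
  if (( from < 6 )); then
    echo "MongoDB $1 is below the supported 6.0 minimum" >&2
    return 1
  fi
  for (( v = from + 1; v <= to; v++ )); do
    path+="${path:+ }$v.0"      # space-separated list of intermediate majors
  done
  echo "$path"
}
```

For example, `mongo_upgrade_path 6.0.14 8.0.4` prints `7.0 8.0`. An empty result means the change stays within one major, where you can just upgrade Mongo.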

07 KEK during upgrade

An in-flight KEK rotation during an upgrade is safe because the rotation handler is resumable: if the worker pod running the rotation restarts, another worker picks it up from the last batch boundary. To monitor it, kek status shows whether the rotation is running or paused.
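"Resumable from the last batch boundary" means progress is checkpointed only after a whole batch succeeds. The loop below illustrates that idea; it is not the rotation handler's code, and the item file, checkpoint file, and `process_item` hook are all hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: a checkpointed batch loop. The caller defines process_item.
# The checkpoint (count of completed items) is written only after a batch
# succeeds, so a restart resumes at the last batch boundary and redoes at most
# one partial batch.
run_batches() {
  local items=$1 ckpt=$2 size=$3
  local completed=0 total end
  [ -s "$ckpt" ] && completed=$(cat "$ckpt")
  total=$(wc -l < "$items")
  while (( completed < total )); do
    end=$(( completed + size < total ? completed + size : total ))
    # Feed lines completed+1..end to process_item; stop on the first failure.
    sed -n "$(( completed + 1 )),${end}p" "$items" |
      while IFS= read -r item; do
        process_item "$item" || exit 1   # exits the pipeline's subshell only
      done || return 1
    completed=$end
    printf '%s\n' "$completed" > "$ckpt.tmp" && mv -f "$ckpt.tmp" "$ckpt"
  done
}
```

If the process dies partway through a batch, the next run starts again at that batch's first item, so per-item work must be safe to repeat.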

08 Rollback

Standalone

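VERSION=1.4.4   # the release you are rolling back to (example value)
oras pull ghcr.io/cloudnativeworks/standalone/certautopilot:${VERSION}
tar -xzf certautopilot_${VERSION}_linux_amd64.tar.gz && cd certautopilot_${VERSION}_linux_amd64
sudo ./upgrade.sh   # version-agnostic: it installs whatever binary is in this dir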

MongoDB data is forward-compatible within a CertAutoPilot major version, so running an older binary against newer DB state usually works. The CHANGELOG lists the exceptions (for example, a new required field introduced in 1.4 would break a rollback to 1.3).

Helm

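helm history certautopilot     # list revisions and pick the one to return to
helm rollback certautopilot    # no revision number = the previous release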

Helm keeps a revision history, so a rollback re-applies the previous revision's manifests and its pod image rolls out again. The same compatibility caveat as standalone applies.

09 Index creation on startup

The backend ensures indexes at startup via database.EnsureIndexes(). This is idempotent (safe to run repeatedly) and usually fast, but the first startup after a major upgrade can take tens of seconds on large collections while indexes build. Plan your restart window accordingly.
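Rather than assuming a fixed restart time, an upgrade script can poll a readiness check until it passes or a deadline expires. A minimal sketch; the function name is hypothetical, and so is any health URL you pass it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command repeatedly until it succeeds or
# TIMEOUT seconds elapse. Args: timeout, interval, then the check command.
# Uses bash's SECONDS counter (whole seconds since the shell started).
wait_until_ready() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "not ready after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

Usage: `wait_until_ready 300 5 curl -fsS -o /dev/null "$HEALTH_URL"`, where HEALTH_URL is a placeholder for whatever endpoint your deployment treats as "up". A generous timeout covers the slow first start while indexes build.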

10 Pre-upgrade checklist

  1. Read the CHANGELOG for the target version. Note any breaking changes.
  2. Back up MongoDB + the secret store. Always both, at the same point in time.
  3. Schedule the upgrade during a lull — renewals queuing during a restart is fine, but avoid stacking a KEK rotation and a version bump.
  4. Verify rollback path — can you pull the old tarball / image back?
  5. Alert your ops channel. Expect a short banner during the restart.
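For checklist item 2, one way to keep the two backups paired is to take both under one timestamp while the service is stopped, so neither store changes in between. This sketch assumes a file-based secret store in a directory you pass in and a MONGODB_URI variable; both are assumptions about your install. If your KEK or secrets live in an HSM or an external store, back those up with their own tooling. The function name and the MONGODUMP override (useful for testing) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump MongoDB and archive a secrets directory under ONE
# timestamp. Runs in a subshell so the restrictive umask stays local.
# Args: output dir, secrets dir. Env: MONGODB_URI (required), MONGODUMP (optional).
backup_pair() (
  umask 077                                  # the archives contain secrets
  out=$1 secrets_dir=$2 dump=${MONGODUMP:-mongodump}
  [ -n "${MONGODB_URI:-}" ] || { echo "MONGODB_URI is not set" >&2; exit 1; }
  ts=$(date -u +%Y%m%dT%H%M%SZ)             # shared by both halves of the pair
  mkdir -p "$out" || exit 1
  "$dump" --uri="$MONGODB_URI" --gzip --archive="$out/mongo-$ts.archive.gz" || exit 1
  tar -czf "$out/secrets-$ts.tar.gz" -C "$secrets_dir" . || exit 1
  echo "$ts"                                 # print the pair's timestamp
)
```

Restoring means using the mongo and secrets archives that share a timestamp, never mixing halves from different runs.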

11 Troubleshooting

Service hangs on startup after upgrade

Usually an index build on a very large collection. Check the MongoDB logs; db.currentOp() shows the build in progress. Be patient. If it is truly stuck for more than 30 minutes, stop the service, drop the in-progress index manually (safe, because EnsureIndexes will recreate it), and restart.

After upgrade: clients get 415 Unsupported Media Type

Older clients sending Content-Type: text/plain hit the stricter JSON check. Update the client to send application/json.

Helm upgrade silently didn't re-roll pods

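The rendered Secret's checksum didn't change, so the pod template is identical and there is nothing to roll. Bump an otherwise irrelevant field (podAnnotations.releasedAt) to force a rollout, or use kubectl rollout restart deployment/certautopilot.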