Backup & restore
Two things must be backed up together: MongoDB (all the data) and the secret store (the KEK that can decrypt it). Losing either makes the other useless. Every backup strategy in this doc backs up both at the same point in time and stores them off-host.
01 What to back up
| Component | Standalone path | Kubernetes source |
|---|---|---|
| MongoDB | mongodump as the user defined in /etc/certautopilot/mongo-root.env | mongodump against the chart's MongoDB service |
| Backend secret store | /etc/certautopilot/secrets.env | The K8s Secret referenced in values.yaml |
| TLS material (the frontend cert) | /etc/certautopilot/tls/ | The Ingress TLS Secret |
| Config | /etc/certautopilot/config.yaml | values.yaml in your GitOps repo |
| (PKCS#11 only) HSM key material | HSM vendor's backup procedure | HSM vendor's backup procedure |
02 Standalone — nightly automated backup
The bootstrap ships a systemd timer + service, but both are
opt-in — pass --enable-backup at
install time (it is off by default so operators with their own
backup orchestration — restic, borg, Ansible cron, etc. —
aren't surprised by a second schedule). The flag is only
honored together with --mongo=local; external
deployments own their backup stack regardless of the flag.
curl -fsSL https://raw.githubusercontent.com/CloudNativeWorks/certautopilot-archive/main/get.sh \
| sudo bash -s -- --version=<pinned> --mongo=local --enable-backup
# Verify the timer is armed
systemctl list-timers | grep certautopilot
Rerunning the bootstrap without --enable-backup
disables the timer (the flag is the single source of truth for
the installer's intent); upgrade.sh detects the
timer's pre-existing enabled state and preserves it across
version bumps.
The timer's service runs /usr/local/bin/certautopilot-backup, which:
- Runs mongodump --gzip --archive against the local MongoDB using the capApp credentials from secrets.env.
- Copies /etc/certautopilot/secrets.env (and mongo-root.env when present) alongside the dump. The dump is useless ciphertext without the KEK, so bundling them together makes restore a single-archive affair.
- Writes the bundle atomically as /var/backups/certautopilot/certautopilot-backup-<YYYYMMDD-HHMMSS>.tar.gz (mode 0600, root-owned).
- Rotates out archives older than ${CAP_BACKUP_RETAIN_DAYS:-7} days. Override via a systemd drop-in (systemctl edit certautopilot-backup.service) adding Environment=CAP_BACKUP_RETAIN_DAYS=30.
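For unattended hosts, the retention override can also be written without the interactive editor. The following is a minimal sketch, assuming the standard systemd drop-in directory for the unit; the helper name write_retention_dropin and the file name retention.conf are illustrative and not part of the product.

```shell
# Sketch: write a systemd drop-in that overrides the backup retention window.
# DROPIN_DIR follows systemd's "<unit>.d" convention; the file name is arbitrary.
DROPIN_DIR="${DROPIN_DIR:-/etc/systemd/system/certautopilot-backup.service.d}"

write_retention_dropin() {
  days=$1
  # Reject empty or non-numeric input before touching the filesystem.
  case "$days" in
    ''|*[!0-9]*) echo "retention must be a whole number of days" >&2; return 1 ;;
  esac
  mkdir -p "$DROPIN_DIR" || return 1
  printf '[Service]\nEnvironment=CAP_BACKUP_RETAIN_DAYS=%s\n' "$days" \
    > "$DROPIN_DIR/retention.conf"
}

# Usage (as root), followed by a reload so systemd re-reads the unit:
#   write_retention_dropin 30 && systemctl daemon-reload
```

The new value takes effect the next time the timer fires the service.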
Off-host storage is your responsibility: sync the directory to S3 or Azure Blob, or rsync it to a separate host. Local retention alone does not protect against hardware loss on the backup host itself.
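One way to ship the newest archive off-host is sketched below, assuming the AWS CLI is installed and configured on the host. The function names and the bucket URL are hypothetical; the file-name pattern is the one the backup script writes, and its timestamp sorts lexicographically, so the last glob match is the newest.

```shell
# Sketch: copy the most recent backup archive to object storage.
newest_backup() {
  dir=$1
  newest=""
  # Globs expand in sorted order, so the last existing match is the newest.
  for f in "$dir"/certautopilot-backup-*.tar.gz; do
    [ -e "$f" ] || continue
    newest=$f
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

ship_newest_offhost() {
  dir=$1
  dest=$2   # e.g. s3://example-bucket/certautopilot/ (hypothetical bucket)
  archive=$(newest_backup "$dir") || {
    echo "no backup archives found in $dir" >&2
    return 1
  }
  aws s3 cp "$archive" "$dest"
}

# Usage:
#   ship_newest_offhost /var/backups/certautopilot s3://example-bucket/certautopilot/
```

Running this from a second timer or cron entry shortly after the nightly backup keeps the off-host copy at most one day behind.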
03 Standalone — on-demand
sudo /usr/local/bin/certautopilot-backup
Produces the same single .tar.gz under /var/backups/certautopilot/ with a fresh timestamp. Use it before every upgrade and before every major config change. The script is installed only when --enable-backup was passed; without it, run mongodump and snapshot secrets.env manually, or re-run the bootstrap with the flag.
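For hosts installed without the flag, the manual route might look like the sketch below. It assumes you have already produced a dump file with mongodump; the function name bundle_backup and the archive layout are illustrative and are not guaranteed to match what the shipped script produces.

```shell
# Sketch: bundle a mongodump archive and secrets.env into one tarball.
# The bundle is written to a temp file and renamed into place, so a reader
# never sees a half-written archive.
bundle_backup() {
  dump=$1
  secrets=$2
  outdir=$3
  stamp=$(date +%Y%m%d-%H%M%S)
  final="$outdir/certautopilot-backup-$stamp.tar.gz"
  stage=$(mktemp -d) || return 1
  # mktemp creates the file with mode 0600, so the bundle is never world-readable.
  tmp=$(mktemp "$outdir/.bundle.XXXXXX") || { rm -rf "$stage"; return 1; }
  if cp "$dump" "$secrets" "$stage/" &&
     tar -czf "$tmp" -C "$stage" . &&
     mv "$tmp" "$final"; then
    rm -rf "$stage"
    printf '%s\n' "$final"
  else
    rm -rf "$stage" "$tmp"
    return 1
  fi
}

# Usage (as root; MONGO_URI is an assumed variable holding your connection string):
#   mongodump --uri="$MONGO_URI" --gzip --archive=/root/mongo.archive.gz
#   bundle_backup /root/mongo.archive.gz /etc/certautopilot/secrets.env /var/backups/certautopilot
```

The rename is atomic only when the temp file and the final path are on the same filesystem, which is why the temp file is created inside the output directory.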
04 Kubernetes backup
No built-in timer. Three common patterns:
Velero
Snapshots the whole namespace, including the MongoDB PVC and the Secret. The backup's atomicity depends on storage-class snapshot semantics: on a CSI driver that supports volume snapshots, you get a coherent point-in-time copy. Configure Velero schedules as usual.
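A nightly Velero schedule scoped to the application's namespace could be created as in this sketch. The schedule name, the namespace argument, and the 30-day TTL are assumptions to adapt to your cluster.

```shell
# Sketch: nightly Velero backup of a single namespace, kept for 30 days.
create_backup_schedule() {
  ns=$1   # namespace CertAutopilot is deployed in (assumed)
  velero schedule create certautopilot-nightly \
    --schedule "0 3 * * *" \
    --include-namespaces "$ns" \
    --ttl 720h
}

# Usage:
#   create_backup_schedule certautopilot
```

Because the Secret and the PVC live in the same namespace, a single schedule captures both in one backup object.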
CronJob + mongodump + kubectl
apiVersion: batch/v1
kind: CronJob
metadata:
  name: certautopilot-backup
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              # The stock mongo image ships mongodump but not kubectl or the AWS CLI;
              # substitute an image that bundles all three.
              image: mongo:7.0
              command:
                - /bin/sh
                - -c
                - |
                  mongodump --uri="$MONGO_URI" --archive --gzip > /backup/mongo-$(date -I).archive.gz &&
                  kubectl get secret cap-secrets -o yaml > /backup/secret-$(date -I).yaml &&
                  aws s3 cp /backup/ s3://my-bucket/certautopilot/ --recursive
              envFrom: [ { secretRef: { name: backup-env } } ]
              volumeMounts: [ { name: scratch, mountPath: /backup } ]
          volumes: [ { name: scratch, emptyDir: {} } ]
The CronJob's ServiceAccount needs RBAC permission to read the Secret. Rotate the S3 access key regularly.
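The RBAC piece can be kept narrow: one ServiceAccount, one Role that allows get on that single Secret, and a RoleBinding that ties them together. The sketch below uses imperative kubectl commands; the function name and the ServiceAccount name are hypothetical, and the CronJob's pod spec would also need serviceAccountName set to the same account.

```shell
# Sketch: least-privilege read access to one Secret for the backup job.
grant_secret_read() {
  ns=$1       # namespace of the CronJob
  sa=$2       # ServiceAccount the CronJob runs as (hypothetical name)
  secret=$3   # Secret to expose, e.g. cap-secrets
  kubectl -n "$ns" create serviceaccount "$sa" &&
  kubectl -n "$ns" create role "${sa}-secret-reader" \
    --verb=get --resource=secrets --resource-name="$secret" &&
  kubectl -n "$ns" create rolebinding "${sa}-secret-reader" \
    --role="${sa}-secret-reader" --serviceaccount="${ns}:${sa}"
}

# Usage:
#   grant_secret_read certautopilot cap-backup cap-secrets
```

Restricting the Role with --resource-name keeps the job from reading any other Secret in the namespace.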
External orchestration
If you already have a backup platform (Kasten, TrilioVault, Portworx), point it at the namespace. Same principle: Mongo + Secret in the same snapshot.
05 Restore
Restore MongoDB
mongorestore --uri="$MONGO_URI" --drop --archive=mongo.archive.gz --gzip
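The --drop flag drops each collection found in the archive from the target before restoring it, which is the safe option for a full point-in-time recovery. Collections that exist only in the target are left in place.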
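Because --drop is destructive, it can be worth confirming that the archive is intact before mongorestore touches the database. A minimal sketch, assuming MONGO_URI is set in the environment; the wrapper name restore_mongo is illustrative.

```shell
# Sketch: refuse to run a destructive restore from a corrupt archive.
restore_mongo() {
  archive=$1
  # gzip -t reads the whole file and verifies its checksum without extracting it.
  gzip -t "$archive" || {
    echo "archive failed gzip integrity check: $archive" >&2
    return 1
  }
  mongorestore --uri="$MONGO_URI" --drop --gzip --archive="$archive"
}

# Usage:
#   MONGO_URI='mongodb://...' restore_mongo mongo.archive.gz
```

The check only proves the gzip stream is complete; it does not validate the BSON inside, which is what the restore drill is for.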
Restore the secret store
- Standalone: copy the backed-up secrets.env to /etc/certautopilot/ (mode 0600, owner certautopilot:certautopilot).
- Kubernetes: kubectl apply -f secret-backup.yaml. A rolling restart of the pods picks up the rendered env vars.
Restore config + TLS
Copy config.yaml back; copy tls/. Restart the service (standalone) or kubectl rollout restart (K8s).
Start the service
On first startup after a restore, BuildRegistry
loads the KEK from env vars, verifies it matches the
kek_install lock, and decrypts a sample envelope
as a sanity check. Any mismatch aborts startup with a clear
error — better than silently running on the wrong KEK.
06 Restore drill
Run one quarterly:
- Spin up a scratch environment (Compose stack, empty MongoDB).
- Restore the latest backup.
- Log in with a known user.
- View a cert, trigger a renewal, watch the job complete.
- Diff the restored DB's cert count against production. Small drift (last hour's issuances) is expected; orders-of-magnitude drift means something's wrong with the backup.
Document the drill in your ops runbook. Auditors love it, and the practice pays off the next time a real restore matters.
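The count comparison in the last drill step is easy to script. The sketch below computes the drift between two document counts as a whole-number percentage; how you obtain the counts depends on your schema, and the collection name in the usage comment is a placeholder rather than the product's actual collection.

```shell
# Sketch: percentage drift between production and restored document counts.
drift_pct() {
  prod=$1
  restored=$2
  [ "$prod" -gt 0 ] || { echo "production count must be positive" >&2; return 1; }
  diff=$((prod - restored))
  # Use the absolute difference so a restored DB with extra rows is flagged too.
  if [ "$diff" -lt 0 ]; then diff=$((-diff)); fi
  echo $((diff * 100 / prod))
}

# Usage (the collection name "certificates" is hypothetical):
#   prod=$(mongosh "$PROD_URI" --quiet --eval 'db.getCollection("certificates").countDocuments({})')
#   drill=$(mongosh "$DRILL_URI" --quiet --eval 'db.getCollection("certificates").countDocuments({})')
#   drift_pct "$prod" "$drill"
```

A result in the low single digits matches the expected small drift; a large value is the signal to investigate the backup.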
07 Retention strategy
- Daily: last 7 days on-host, last 30 off-host.
- Weekly: 3 months.
- Monthly: 12 months.
- Align with your compliance requirements (SOC 2 often asks for 1-year retrievable; PCI-DSS requirements may differ, so check the current standard).
Encrypted backups stored on an S3 bucket with server-side encryption + versioning + object lock give you both defence-in-depth (encryption at two layers) and ransomware resistance (object lock prevents deletion).
08 After a KEK rotation
Backups taken before a KEK rotation are paired with the old secret-store. Keep the old secret-store alongside those backups until they age out of your retention policy — otherwise, an old-dated restore is unrecoverable. See KEK rotation.
09 Troubleshooting
"KEK fingerprint mismatch" on restore
The restored Mongo expects a different KEK than the one you supplied. Either you restored an older DB with a newer secret store, or vice versa. Restore a database dump and a secret store that were taken in the same backup run.
"Database restored but UI shows empty lists"
Wrong database name in the URI. mongodump's archive format preserves the original DB name; if your target environment uses a different name, run mongorestore --nsInclude='oldname.*' --nsFrom='oldname.*' --nsTo='newname.*'.
Mongo + secret store timestamps off by hours
Your backup script doesn't snapshot them atomically. Fix the script so that both are always taken in the same run, preferably against a quiesced DB: stop the backend for the duration, or run the dump against a secondary member of the replica set.