Backup & restore

Two things must be backed up together: MongoDB (all the data) and the secret store (the KEK that can decrypt it). Losing either makes the other useless. Every backup strategy in this doc backs up both at the same point in time and stores them off-host.

01 What to back up

| Component | Standalone path | Kubernetes source |
| --- | --- | --- |
| MongoDB | mongodump against /etc/certautopilot/mongo-root.env user | mongodump against the chart's MongoDB service |
| Backend secret store | /etc/certautopilot/secrets.env | The K8s Secret referenced in values.yaml |
| TLS material (the frontend cert) | /etc/certautopilot/tls/ | The Ingress TLS Secret |
| Config | /etc/certautopilot/config.yaml | values.yaml in your GitOps repo |
| (PKCS#11 only) HSM key material | HSM vendor's backup procedure | HSM vendor's backup procedure |

02 Standalone: nightly automated backup

The bootstrap ships a systemd timer and service, but both are opt-in: pass --enable-backup at install time. Backup is off by default so that operators with their own backup orchestration (restic, borg, Ansible cron, etc.) aren't surprised by a second schedule. The flag is honored only together with --mongo=local; deployments that use an external MongoDB own their backup stack regardless of the flag.

curl -fsSL https://raw.githubusercontent.com/CloudNativeWorks/certautopilot-archive/main/get.sh \
  | sudo bash -s -- --version=<pinned> --mongo=local --enable-backup

# Verify the timer is armed
systemctl list-timers | grep certautopilot

Rerunning the bootstrap without --enable-backup disables the timer, because the flag is the single source of truth for the installer's intent. upgrade.sh, by contrast, detects whether the timer was already enabled and preserves that state across version bumps.

The timer's service runs /usr/local/bin/certautopilot-backup, which:

  1. Runs mongodump --gzip --archive against the local MongoDB using the capApp credentials from secrets.env.
  2. Copies /etc/certautopilot/secrets.env (and mongo-root.env when present) alongside the dump. The dump is useless ciphertext without the KEK, so bundling them together makes restore a single-archive affair.
  3. Writes the bundle atomically as /var/backups/certautopilot/certautopilot-backup-<YYYYMMDD-HHMMSS>.tar.gz (mode 0600, root-owned).
  4. Rotates out archives older than ${CAP_BACKUP_RETAIN_DAYS:-7} days. To override the default, add Environment=CAP_BACKUP_RETAIN_DAYS=30 in a systemd drop-in (systemctl edit certautopilot-backup.service).
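
The retention override from step 4 can also be written without the interactive editor, which is handy in provisioning scripts. This is a minimal sketch that assumes the standard systemd drop-in directory layout; the function name and the retention.conf file name are illustrative and not part of the product.

```shell
# Write a drop-in that overrides the retention window for the backup service.
# write_retention_dropin DROPIN_DIR DAYS
# "retention.conf" is a hypothetical file name; systemd reads any *.conf
# file placed in the unit's drop-in directory.
write_retention_dropin() {
  dropin_dir="$1"
  days="$2"
  mkdir -p "$dropin_dir" || return 1
  printf '[Service]\nEnvironment=CAP_BACKUP_RETAIN_DAYS=%s\n' "$days" \
    > "$dropin_dir/retention.conf"
}

# Usage (as root), followed by a reload so systemd picks up the change:
#   write_retention_dropin /etc/systemd/system/certautopilot-backup.service.d 30
#   systemctl daemon-reload
```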

Off-host storage is your responsibility: sync the directory to S3 or Azure Blob, or rsync it to a separate host. Local retention alone does not protect against hardware loss on the host that holds the backups.
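
One way to ship the newest archive off-host, as a sketch only. It relies on the archive naming shown in step 3; the helper name and the rsync destination are placeholders, and the transfer tool is whatever your environment already uses.

```shell
# Print the newest backup archive in a directory.
# The timestamp in the file name (YYYYMMDD-HHMMSS) sorts lexicographically,
# so the last name in glob order is the most recent one.
latest_backup() {
  dir="${1:-/var/backups/certautopilot}"
  latest=""
  for f in "$dir"/certautopilot-backup-*.tar.gz; do
    [ -e "$f" ] && latest="$f"
  done
  # Fail (non-zero status) when the directory holds no archive at all.
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage; the destinations below are placeholders:
#   rsync -a "$(latest_backup)" backup@backup-host.example:/srv/certautopilot/
#   aws s3 cp "$(latest_backup)" s3://my-bucket/certautopilot/
```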

03 Standalone: on-demand

sudo /usr/local/bin/certautopilot-backup

Produces the same single .tar.gz under /var/backups/certautopilot/ with a fresh timestamp. Run it before every upgrade and before every major config change. The script is installed only when --enable-backup was passed; without it, run mongodump and copy secrets.env manually, or re-run the bootstrap with the flag.
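
For hosts installed without --enable-backup, the manual route might look like the sketch below. It is not the shipped certautopilot-backup script. It assumes MONGO_URI is exported with credentials that can read the database, and the file names inside the bundle (mongo.archive.gz, secrets.env) are choices made for this example.

```shell
# Bundle a MongoDB dump and the secret store into one timestamped archive.
# manual_backup OUT_DIR [SECRETS_FILE]
# Runs in a subshell so the umask and the cleanup trap stay local.
manual_backup() (
  out_dir="$1"
  secrets="${2:-/etc/certautopilot/secrets.env}"
  stamp="$(date +%Y%m%d-%H%M%S)"
  final="$out_dir/certautopilot-backup-$stamp.tar.gz"
  umask 077                          # every file created below is owner-only
  work="$(mktemp -d)" || exit 1
  trap 'rm -rf "$work"' EXIT         # remove the scratch copy of the KEK
  mongodump --uri="$MONGO_URI" --gzip --archive="$work/mongo.archive.gz" &&
    cp "$secrets" "$work/secrets.env" &&
    tar -C "$work" -czf "$final.tmp" . &&
    mv "$final.tmp" "$final"         # rename last so readers never see a partial file
)

# Usage (as root): manual_backup /var/backups/certautopilot
```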

04 Kubernetes backup

No built-in timer. Three common patterns:

Velero

Snapshots the whole namespace, including the MongoDB PVC and the Secret. The backup's atomicity depends on the storage class's snapshot semantics: with a CSI driver that supports volume snapshots, you get a coherent point-in-time copy. Configure Velero schedules as usual.
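
A daily schedule could be created as in the sketch below. The schedule name, the cron expression, the 30-day TTL and the namespace passed in the usage line are all assumptions for illustration; adjust them to your cluster.

```shell
# Create a daily Velero schedule that covers one namespace.
# create_backup_schedule NAMESPACE
create_backup_schedule() {
  ns="$1"
  velero schedule create certautopilot-daily \
    --schedule="0 3 * * *" \
    --include-namespaces "$ns" \
    --ttl 720h0m0s
}

# Usage (the namespace name is an assumption):
#   create_backup_schedule certautopilot
```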

CronJob + mongodump + kubectl

apiVersion: batch/v1
kind: CronJob
metadata:
  name: certautopilot-backup
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
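          restartPolicy: Never
          # Assumption: a ServiceAccount allowed to "get" the Secret (name is hypothetical).
          serviceAccountName: certautopilot-backup
          containers:
            - name: backup
              # The stock mongo:7.0 image does not include kubectl or the aws CLI;
              # substitute an image that bundles all three tools.
              image: mongo:7.0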
              command:
                - /bin/sh
                - -c
                - |
                  mongodump --uri="$MONGO_URI" --archive --gzip > /backup/mongo-$(date -I).archive.gz &&
                  kubectl get secret cap-secrets -o yaml > /backup/secret-$(date -I).yaml &&
                  aws s3 cp /backup/ s3://my-bucket/certautopilot/ --recursive
              envFrom: [ { secretRef: { name: backup-env } } ]
              volumeMounts: [ { name: scratch, mountPath: /backup } ]
          volumes: [ { name: scratch, emptyDir: {} } ]

The CronJob's ServiceAccount needs RBAC permission to read the Secret. Rotate the S3 access key regularly.
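
The RBAC piece can be kept narrow by granting get on that one Secret only. A sketch using kubectl's imperative create commands; the namespace, ServiceAccount name and Role name are hypothetical, and cap-secrets is the Secret name used in the CronJob above.

```shell
# Grant a dedicated ServiceAccount read access to a single Secret.
# grant_secret_read NAMESPACE SERVICE_ACCOUNT SECRET
grant_secret_read() {
  ns="$1"; sa="$2"; secret="$3"
  kubectl -n "$ns" create serviceaccount "$sa" &&
  kubectl -n "$ns" create role "$sa-secret-reader" \
    --verb=get --resource=secrets --resource-name="$secret" &&
  kubectl -n "$ns" create rolebinding "$sa-secret-reader" \
    --role="$sa-secret-reader" --serviceaccount="$ns:$sa"
}

# Usage (names are assumptions):
#   grant_secret_read certautopilot certautopilot-backup cap-secrets
```

The CronJob's pod spec then references the account through serviceAccountName.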

External orchestration

If you already have a backup platform (Kasten, TrilioVault, Portworx), point it at the namespace. Same principle: Mongo + Secret in the same snapshot.

05 Restore

Restore MongoDB

mongorestore --uri="$MONGO_URI" --drop --archive=mongo.archive.gz --gzip

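The --drop flag drops each collection found in the archive from the target database before restoring it, which is the safe option for a full point-in-time recovery. Collections that exist only in the target are left untouched.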

Restore the secret store

  • Standalone: copy the backed-up secrets.env to /etc/certautopilot/ (mode 0600, owner certautopilot:certautopilot).
  • Kubernetes: kubectl apply -f secret-backup.yaml. A rolling restart of the pods picks up the restored env vars.
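
For the standalone case, the copy, mode and ownership can be handled together. A small sketch; the function name is made up for this example, and the owner is passed as an optional argument so the same helper also works in a scratch environment where the certautopilot user does not exist.

```shell
# Put a backed-up secrets.env back in place with restrictive permissions.
# restore_secrets SRC [DEST_DIR] [OWNER]
restore_secrets() {
  src="$1"
  dest_dir="${2:-/etc/certautopilot}"
  owner="${3:-}"
  # install(1) copies the file and sets the mode in one step.
  install -m 0600 "$src" "$dest_dir/secrets.env" || return 1
  if [ -n "$owner" ]; then
    chown "$owner" "$dest_dir/secrets.env" || return 1
  fi
}

# Usage (as root):
#   restore_secrets ./secrets.env /etc/certautopilot certautopilot:certautopilot
```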

Restore config + TLS

Copy config.yaml and tls/ back into place, then restart the service (standalone) or run kubectl rollout restart (K8s).

Start the service

On first startup after a restore, BuildRegistry loads the KEK from env vars, verifies it matches the kek_install lock, and decrypts a sample envelope as a sanity check. Any mismatch aborts startup with a clear error — better than silently running on the wrong KEK.

06 Restore drill

Run one quarterly:

  1. Spin up a scratch environment (Compose stack, empty MongoDB).
  2. Restore the latest backup.
  3. Log in with a known user.
  4. View a cert, trigger a renewal, watch the job complete.
  5. Diff the restored DB's cert count against production. Small drift (last hour's issuances) is expected; orders-of-magnitude drift means something's wrong with the backup.
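
Step 5 can be scripted so the drill has a pass/fail signal. In this sketch the collection name certificates is a guess (substitute the real one), the URI variables are placeholders, and the factor-of-ten threshold is simply one reading of "orders-of-magnitude".

```shell
# Print the document count of a collection.
# "certificates" is a hypothetical collection name.
count_certs() {
  mongosh "$1" --quiet --eval 'db.getCollection("certificates").countDocuments({})'
}

# Succeed when the two counts are within a factor of ten of each other.
# drift_ok PROD_COUNT RESTORED_COUNT
drift_ok() {
  [ $(( $2 * 10 )) -ge "$1" ] && [ "$2" -le $(( $1 * 10 )) ]
}

# Usage; both URI variables are placeholders:
#   drift_ok "$(count_certs "$PROD_URI")" "$(count_certs "$SCRATCH_URI")" \
#     || echo "restored cert count is off by more than 10x" >&2
```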

Document the drill in your ops runbook. Auditors love it, and the practice pays off the next time a real restore matters.

07 Retention strategy

  • Daily: last 7 days on-host, last 30 off-host.
  • Weekly: 3 months.
  • Monthly: 12 months.
  • Align with your compliance requirement (SOC 2 often asks for 1-year retrievable; PCI-DSS is shorter).

Encrypted backups stored on an S3 bucket with server-side encryption + versioning + object lock give you both defence-in-depth (encryption at two layers) and ransomware resistance (object lock prevents deletion).
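
The bucket settings named above map onto three aws s3api calls. This is a sketch under assumptions: the bucket already exists and your account permits enabling Object Lock on it, SSE-S3 (AES256) is acceptable as the server-side encryption, and GOVERNANCE mode with the retention you pass in fits your policy (COMPLIANCE mode is stricter and cannot be bypassed by privileged users).

```shell
# Turn on versioning, default encryption and a default Object Lock retention.
# harden_backup_bucket BUCKET RETENTION_DAYS
harden_backup_bucket() {
  bucket="$1"; days="$2"
  aws s3api put-bucket-versioning --bucket "$bucket" \
    --versioning-configuration Status=Enabled &&
  aws s3api put-bucket-encryption --bucket "$bucket" \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}' &&
  aws s3api put-object-lock-configuration --bucket "$bucket" \
    --object-lock-configuration \
    "{\"ObjectLockEnabled\":\"Enabled\",\"Rule\":{\"DefaultRetention\":{\"Mode\":\"GOVERNANCE\",\"Days\":$days}}}"
}

# Usage (bucket name is a placeholder):
#   harden_backup_bucket my-bucket 30
```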

08 After a KEK rotation

Backups taken before a KEK rotation are paired with the old secret store. Keep the old secret store alongside those backups until they age out of your retention policy; otherwise a restore from one of those older backups cannot be decrypted. See KEK rotation.

09 Troubleshooting

"KEK fingerprint mismatch" on restore

The restored Mongo expects a different KEK than the one you supplied. Either you restored an older DB with a newer secret store, or vice versa. Restore a dump and a secret store that were taken in the same backup run.

"Database restored but UI shows empty lists"

Wrong database name in the URI. mongodump's archive format preserves the original DB name; if your target environment uses a different name, run mongorestore --nsInclude='oldname.*' --nsFrom='oldname.*' --nsTo='newname.*'.

Mongo + secret store timestamps off by hours

Your backup script doesn't snapshot them atomically. Fix the script: always take both in the same run, preferably against a quiesced DB (stop the backend for the duration, or run the dump against a MongoDB secondary).
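
The stop-the-backend variant might be wrapped as below. The unit name certautopilot.service is an assumption, as is using the bundled certautopilot-backup as the backup command; the point of the sketch is only that the backend is restarted even when the backup fails.

```shell
# Stop the backend, run the backup, and always start the backend again.
# quiesced_backup [UNIT] [BACKUP_COMMAND]
# Runs in a subshell so the EXIT trap fires when the function finishes.
quiesced_backup() (
  unit="${1:-certautopilot.service}"                 # hypothetical unit name
  backup_cmd="${2:-/usr/local/bin/certautopilot-backup}"
  systemctl stop "$unit" || exit 1
  trap 'systemctl start "$unit"' EXIT                # restart on success or failure
  "$backup_cmd"
)

# Usage (as root): quiesced_backup
```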