Distribution primitives
The reusable objects every distribution composes from: Path Sets (where files land), Action Sets (what runs after), Module Credentials (auth material), Project Variables (per-environment values), Validation Endpoints (post-deploy TLS fingerprint check), and the Fan-out execution model. Define each once; reuse across dozens of targets and certificates.
01 Path sets
A path set describes where certificate files land on a target. Linux (SSH) path sets carry a POSIX path, an owner (formatted user:group), a mode (octal), and an output format. Windows (WinRM) path sets use drive-letter (C:\…) or UNC (\\server\share\…) paths; owner / mode do not apply (Windows ACLs are managed on the host).
The target_kind field selects the OS. Bind-time validation prevents cross-platform mismatches, so an SSH target cannot bind a Windows path set or vice versa.
Entry schema:
| Field | Required | Purpose |
|---|---|---|
| source | yes | Which artifact to place: cert (leaf), chain (intermediates), fullchain, private_key, combined (cert + chain + key), pfx (server-built PKCS#12; works on both Linux and Windows; see below). |
| path | yes | Absolute target path. Linux: POSIX. Windows: drive-letter or UNC. |
| owner | no | user:group. Linux only; stripped on Windows. |
| mode | no | Octal perms (e.g. 0644). Linux only. |
| format | no | pem (default) or der. DER only valid for cert or private_key. Rejected for pfx. |
| passphrase_variable | iff source=pfx | Name of a secret-flagged project variable that supplies the PKCS#12 passphrase at distribute time. Only the variable name lives on the path set; the value never does. |
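The schema rules above can be sketched as a small validator. This is a minimal illustration, not the product's own validation code: the function name `validate_entry`, the `target_kind` values `"linux"` and `"windows"`, and the error strings are all assumptions made for the example.

```python
import re

SOURCES = {"cert", "chain", "fullchain", "private_key", "combined", "pfx"}
# Drive-letter (C:\...) or UNC (\\server\share\...) prefix.
WINDOWS_PATH = re.compile(r"^(?:[A-Za-z]:\\|\\\\[^\\]+\\[^\\]+)")


def validate_entry(entry: dict, target_kind: str) -> list[str]:
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    source = entry.get("source")
    path = entry.get("path", "")
    fmt = entry.get("format", "pem")  # pem is the documented default

    if source not in SOURCES:
        problems.append(f"unknown source: {source!r}")

    # Absolute path: POSIX on Linux, drive-letter or UNC on Windows.
    # The target_kind values used here are assumed for illustration.
    if target_kind == "linux" and not path.startswith("/"):
        problems.append("linux path must be an absolute POSIX path")
    if target_kind == "windows" and not WINDOWS_PATH.match(path):
        problems.append("windows path must be drive-letter or UNC")

    if fmt not in {"pem", "der"}:
        problems.append(f"unknown format: {fmt!r}")
    elif fmt == "der" and source not in {"cert", "private_key"}:
        problems.append("der is only valid for cert or private_key")

    # passphrase_variable is required if and only if source is pfx.
    has_passphrase = bool(entry.get("passphrase_variable"))
    if (source == "pfx") != has_passphrase:
        problems.append("passphrase_variable is required iff source is pfx")
    return problems
```

Returning a list of problems rather than raising on the first one lets a form show every issue in a row at once.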
Example — nginx:
[
{"source": "fullchain", "path": "/etc/nginx/ssl/{{ .cn }}.crt", "owner": "root:root", "mode": "0644"},
{"source": "private_key", "path": "/etc/nginx/ssl/{{ .cn }}.key", "owner": "root:root", "mode": "0600"}
]
Template variables: the path is rendered as a Go template with .cn, .domains, .fingerprint, and .project_variables in scope. The helper functions lower, replace, and printf can be used in the template.
PKCS#12 (.pfx) — server-built bundles
For consumers that expect a single PFX file (Windows Certificate Store via Import-PfxCertificate, IIS legacy bindings, modern Tomcat with keystoreType="PKCS12", Spring Boot server.ssl.key-store, .NET Core Kestrel), CertAutoPilot builds the bundle server-side and chunked-uploads the bytes directly — no openssl on the target. The encoder uses sslmate's pkcs12.Modern set (AES-256, SHA-256 HMAC).
Operator workflow:
- Settings → Variables → Add a variable, e.g. APP_PFX_PASSPHRASE, mark Secret: yes, set the chosen passphrase. Encrypted at rest with envelope encryption.
- Path Sets → Add. Pick the target OS, add a row with source: pfx, an absolute path, and select APP_PFX_PASSPHRASE from the passphrase-variable dropdown.
- Bind to a target, distribute. The PFX is built once per file; the decrypted passphrase only exists inside the build call's scope.
{
"source": "pfx",
"path": "/opt/tomcat/conf/keystore.p12",
"owner": "tomcat:tomcat",
"mode": "0600",
"passphrase_variable": "APP_PFX_PASSPHRASE"
}
The encoder injects a fresh IV/salt per encode, so the bytes change every time. The hash-based skip in both SSH and WinRM modules always reports changed for pfx files, and the bound Action Set runs on every redistribute. If you need quiet redistributes, add a thumbprint check inside your Action Set (Windows: (Get-PfxCertificate ...).Thumbprint; Linux: keytool -list -keystore ... -storepass ...) and gate the restart on a stored marker.
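The marker gate suggested above can be sketched as follows. Action sets themselves run shell or PowerShell, so treat this Python version purely as an illustration of the logic; `restart_needed` and the marker file are hypothetical names, and the thumbprint is assumed to have been obtained already (for example from one of the commands mentioned above).

```python
from pathlib import Path


def restart_needed(marker: Path, thumbprint: str) -> bool:
    """Return True and update the marker only when the thumbprint changed."""
    current = thumbprint.strip().upper()
    previous = marker.read_text().strip() if marker.exists() else ""
    if previous == current:
        return False  # same certificate as last time: skip the restart
    marker.write_text(current + "\n")
    return True
```

In a real action you would probably write the marker only after the restart has succeeded, so that a failed restart is attempted again on the next redistribute.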
02 Action sets
An action set is the list of commands the SSH or WinRM module runs on a target after file placement. Linux action sets run shell commands or a script body, with sudo (run_as), allowlisting, and timeouts. Windows action sets run PowerShell commands or an inline script via powershell.exe / pwsh.exe; run_as and command_mode do not apply on Windows. target_kind selects the OS; bind-time validation enforces the match.
Commands mode (simple)
mode: commands
commands:
- "nginx -t"
- "systemctl reload nginx"
run_as: "root"
timeout_seconds: 30
Commands run sequentially; a non-zero exit code from any command aborts the remaining commands and marks the target failed.
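The abort-on-first-failure behaviour can be sketched like this. It is a local stand-in, not the SSH module: `run_commands` and `shell_runner` are hypothetical names, the commands run through the local shell instead of an SSH session, and the timeout is applied per command, which is an assumption about how timeout_seconds maps onto sessions.

```python
import subprocess
from typing import Callable, Sequence


def shell_runner(command: str, timeout: int) -> int:
    """Run one command through the local shell and return its exit code."""
    return subprocess.run(command, shell=True, timeout=timeout).returncode


def run_commands(commands: Sequence[str], timeout: int = 30,
                 runner: Callable[[str, int], int] = shell_runner) -> dict:
    """Run commands in order; stop at the first non-zero exit code."""
    executed = []
    for command in commands:
        code = runner(command, timeout)
        executed.append((command, code))
        if code != 0:
            # Remaining commands are skipped and the target is marked failed.
            return {"status": "failed", "executed": executed}
    return {"status": "succeeded", "executed": executed}
```

Passing the runner in as a parameter keeps the ordering logic testable without touching a real host.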
Script mode (complex)
mode: script
script_body: |
  #!/bin/bash
  set -euo pipefail
  # Validate the configuration before reloading.
  nginx -t
  systemctl reload nginx
  # Confirm the live listener now serves the certificate that was just deployed.
  FP=$(echo | openssl s_client -connect localhost:443 2>/dev/null | openssl x509 -fingerprint -sha256 -noout)
  [[ "$FP" == *"{{ .fingerprint }}"* ]] || { echo "Active cert mismatch" >&2; exit 1; }
shell: "/bin/bash"
run_as: "root"
timeout_seconds: 60
Field reference
| Field | Required | Purpose |
|---|---|---|
| mode | yes | commands or script (Linux); on Windows: command or script_inline. |
| commands | commands mode | List of shell strings. |
| script_body | script mode | Full script text. Template-expanded before execution. |
| shell | no | Shell to invoke. Default /bin/sh (Linux), powershell on Windows. |
| command_mode | no (Linux only) | one-per-ssh opens a session per command; combined joins with && into a single session. |
| run_as | no (Linux only) | User to execute as. root routes through sudo -n. |
| allowed_commands | no | Regex allowlist. The first token of every command (or every line of a script) must match one regex; non-matches are refused. |
| timeout_seconds | no | Per-session timeout. Default 30. |
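The allowed_commands rule can be sketched as below. Two details are assumptions, since the table does not spell them out: the first token is extracted with shell-style splitting (`shlex`), and a regex has to match the whole token rather than a substring. `refused_commands` is a hypothetical name.

```python
import re
import shlex


def refused_commands(commands: list[str], allowed: list[str]) -> list[str]:
    """Return the commands whose first token matches no allowlist regex."""
    patterns = [re.compile(p) for p in allowed]
    refused = []
    for command in commands:
        tokens = shlex.split(command)
        if not tokens:
            continue  # blank line: nothing to check
        # Assumption: the regex must cover the entire first token.
        if not any(p.fullmatch(tokens[0]) for p in patterns):
            refused.append(command)
    return refused
```

With whole-token matching, an allowlist entry of `nginx` does not accidentally admit `nginx-debug`; if the product uses substring matching instead, anchor your patterns with `^` and `$` to get the same effect.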
Template variables in actions: .cn, .domains, .fingerprint, .pkcs12_password (when a path set has a PKCS#12 entry), .project_variables, and .paths (a map of source → resolved path from the path set, e.g. {{ .paths.fullchain }}).
Privilege escalation on Linux uses sudo -n (non-interactive). The target must allow the exact commands without a password:
# /etc/sudoers.d/certautopilot
Cmnd_Alias CAP_RELOAD = /usr/sbin/nginx -t, /bin/systemctl reload nginx
capdeploy ALL=(ALL) NOPASSWD: CAP_RELOAD
03 Module credentials
A single store for every piece of auth material distribution modules need: SSH private keys, kubeconfigs, WinRM username/password pairs, F5 / NetScaler admin passwords, Vault AppRole or JWT, Huawei AK/SK pairs. All credentials live in module_credentials, sealed with envelope encryption, and are referenced by targets via immutable ID.
| Type | Fields | Used by |
|---|---|---|
| SSH private key | PEM key + optional passphrase | SSH module |
| SSH password | Username + password | SSH module |
| Kubeconfig | Inline kubeconfig YAML | Kubernetes module |
| WinRM user/pass | Domain\user + password | IIS, WinRM modules |
| F5 / NetScaler user/pass | Admin user + password | F5 BIG-IP, NetScaler |
| TLS client cert | Cert PEM + key PEM | F5, NetScaler, Vault (cert auth) |
| Vault AppRole / JWT | role_id + secret_id / JWT source | HashiCorp Vault |
| Huawei AK/SK | Access key + secret key | Huawei Cloud |
Why separate from targets: one SSH key opens ten hosts (one credential → ten targets, one rotation point); credentials only surface to admins (viewers see target lists without the auth material); credential reads and rotations get a clean audit trail.
Rotate by editing the credential and pasting new material; every future distribution picks it up. There is no version history, and in-flight jobs holding a cloned client finish with the old material. Wait for in-flight jobs to drain before considering the old material fully revoked.
Delete is blocked if any target still references the credential. The UI surfaces blockers; repoint or delete those first.
Per-credential allowlists exist for sensitive modules (e.g. Kubernetes allowed namespaces, allowed name prefix) — use them to prevent a single-tenant credential from being aimed at the wrong namespace by mistake.
04 Project variables
Key-value pairs scoped to a project, referenced from path sets, action sets, webhook headers, and module configurations through Go template syntax. Use them to keep per-environment differences (hostnames, ports, service names) out of reusable templates.
Plain vs sensitive:
- Plain: readable in the UI by any project operator. Use for hostnames, port numbers, service names, feature flags.
- Sensitive: value is write-only after creation. List views show <sensitive>. The audit log records every read by a worker. Use for shared tokens, secondary API keys, and PKCS#12 passphrases (referenced via passphrase_variable on a pfx path-set row).
Schema:
- name: uppercase identifier, regex ^[A-Z_][A-Z0-9_]{0,127}$. Names are trimmed and uppercased on save.
- value: string.
- is_secret: boolean.
- description: free-text.
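The name rule can be sketched as a normalize-then-validate step. The order (trim and uppercase first, then apply the regex) is an assumption based on the "trimmed and uppercased on save" wording, and `normalize_name` is a hypothetical helper.

```python
import re

# Same pattern as the schema; fullmatch() supplies the ^ and $ anchoring.
NAME_RE = re.compile(r"[A-Z_][A-Z0-9_]{0,127}")


def normalize_name(raw: str) -> str:
    """Trim and uppercase a variable name, then validate it."""
    name = raw.strip().upper()
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid variable name: {name!r}")
    return name
```

Under this reading, `app_pfx_passphrase` is accepted and stored as `APP_PFX_PASSPHRASE`, while names that start with a digit, contain a hyphen, or exceed 128 characters are rejected.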
Reference with {{ .project_variables.<NAME> }}:
# Path set entry
{
"source": "fullchain",
"path": "/etc/nginx/ssl/{{ .project_variables.SERVICE_NAME }}.crt"
}
# Action set command
commands:
- "rsync /etc/nginx/ssl/ {{ .project_variables.BASTION_HOST }}:/mnt/backup/"
# Webhook target header
{
"X-Tenant": "{{ .project_variables.TENANT_ID }}"
}
Pattern: one project per environment, for example production, staging, and dr. The same path-set + action-set combination references the same variable names in every project; each project decides what those names resolve to.
If a sensitive value lands on a target host in plaintext (written to a file, echoed into a script), the redaction pipeline stops at that boundary. Don't put irreversible-exposure secrets into action sets that echo them.
Delete is blocked if any path set, action set, or target config still references the variable. The UI surfaces references.
05 Validation endpoints
A distribution that returns success from the module layer might still not be live on the endpoint: the module reload returned 0 but systemd was already running a stale copy; a Kubernetes Secret was updated but the ingress controller didn't hot-reload; an F5 profile was created but the virtual server is pinned to the old one. Post-distribution validation closes the gap: after every target completes, the backend opens an outbound TLS handshake to a configured endpoint, hashes the presented cert, and compares it to the certificate just deployed. On a mismatch, the target's result is downgraded to partial.
Configure on the target (applies to every distribution using it) or on the distribution (overrides / supplements). Validation endpoints → Add endpoint.
| Field | Required | Purpose |
|---|---|---|
| host | yes | DNS name or IP of the TLS endpoint. |
| port | yes | TLS port. 443 for HTTPS, 636 for LDAPS. |
| sni | no | SNI value. Defaults to the cert's CN. Set explicitly when the same endpoint serves multiple certs keyed by SNI. |
| method | no | tls_fingerprint (default) or none (skip). |
| retry_count | no | Default 3. Reload takes a moment. |
| retry_delay_seconds | no | Default 2. |
| timeout_seconds | no | Per-attempt TLS handshake timeout. Default 5 seconds. |
How fingerprint check works: dial host:port over TCP, initiate TLS with the configured SNI, read the server cert from the handshake, compute SHA-256 of its DER bytes, compare to the SHA-256 of the cert just distributed. Match → pass; mismatch or connection error → retry then fail.
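The check described above can be sketched with the Python standard library. This is an illustration under assumptions, not the backend's code: the function names are hypothetical, and retry_count is read as "one initial attempt plus that many retries", which the field table does not state explicitly.

```python
import hashlib
import socket
import ssl
import time
from typing import Callable


def sha256_hex(der: bytes) -> str:
    """SHA-256 of a certificate's DER encoding, as lowercase hex."""
    return hashlib.sha256(der).hexdigest()


def presented_cert_der(host: str, port: int, sni: str, timeout: float = 5.0) -> bytes:
    """Handshake with host:port using the given SNI and return the leaf cert (DER)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Chain verification is off on purpose: the check compares fingerprints,
    # so an untrusted or private-CA certificate must still be readable.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=sni) as tls:
            return tls.getpeercert(binary_form=True)


def validate(probe: Callable[[], bytes], expected_der: bytes,
             retry_count: int = 3, retry_delay: float = 2.0,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Pass on the first matching attempt; retry on mismatch or connection error."""
    expected = sha256_hex(expected_der)
    for attempt in range(retry_count + 1):  # assumption: 1 try + retry_count retries
        try:
            if sha256_hex(probe()) == expected:
                return True
        except OSError:
            pass  # connection refused, timeout, or TLS handshake failure
        if attempt < retry_count:
            sleep(retry_delay)
    return False
```

A call might look like `validate(lambda: presented_cert_der("app.example.com", 443, "app.example.com"), deployed_der)`, where the host name and `deployed_der` are placeholders for your endpoint and the DER bytes of the certificate that was distributed.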
SSRF guards apply: link-local and cloud-metadata IPs are blocked; DNS must resolve to a non-blocked target. Allowlist private endpoints at the network-policy layer if needed.
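A guard of this kind can be sketched as follows. The exact block list the backend uses is not documented here, so the metadata addresses below (the common 169.254.169.254 endpoint and the AWS IPv6 metadata address) and both function names are assumptions for illustration.

```python
import ipaddress
import socket

# Assumption: metadata addresses are listed explicitly, since some of them
# fall outside the link-local ranges.
METADATA_IPS = {ipaddress.ip_address("169.254.169.254"),
                ipaddress.ip_address("fd00:ec2::254")}


def is_blocked_ip(value: str) -> bool:
    """True for link-local addresses and known cloud-metadata addresses."""
    ip = ipaddress.ip_address(value)
    return ip.is_link_local or ip in METADATA_IPS


def resolve_allowed(host: str, port: int) -> list[str]:
    """Resolve host and refuse it if any resolved address is blocked."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    blocked = [a for a in addresses if is_blocked_ip(a)]
    if blocked:
        raise ValueError(f"{host} resolves to blocked address(es): {blocked}")
    return addresses
```

To keep the guard meaningful, a caller would connect to one of the returned addresses rather than resolving the name a second time, because a second lookup could return a different answer.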
Partial-failure semantics: if validation fails for any endpoint on an otherwise-successful target, the per-target result moves to partial; the distribution aggregate becomes partial_failure if any child is partial. Each endpoint's outcome is recorded in DistTargetResult.ValidationResults.
06 Fan-out execution
When a distribution aims at a target group larger than a configurable threshold, the backend splits it into batch child jobs, schedules them on a dedicated worker lane, classifies failures, and rolls the results up atomically. Faster wall-clock, bounded blast radius, queue fairness — ACME / MSCA / notification jobs keep running on the main lane while a 500-target SSH fan-out runs on the dist lane.
Tuning settings (Settings → General → Distribution):
| Setting | Default | Purpose |
|---|---|---|
| DistributionFanoutThreshold | 10 | Target-count floor at which fan-out engages. Below this, the single-job path runs. |
| DistributionFanoutBatchSize | 25 | Targets per child job. |
| DistributionSSHMaxConcurrency | 8 | Inside a batch, how many SSH sessions run in parallel. |
Tune batch size up for fast targets (Kubernetes, webhook), down for slow ones (SSH reload on a busy box). Concurrency should stay well below your fleet's sustainable session cap.
Execution flow
- Enqueue. The backend chunks targets into batches of BatchSize, enqueues one distribution_execute child per batch with mode=batch, then sets FanOutTotal atomically. If any enqueue fails, the entire fan-out is aborted (no partial enqueue).
- Schedule. A dedicated distWorker polls only for distribution_execute and distribution_rollback; main-queue workers ignore these.
- Execute. Each batch child runs the module against every target in its slice, up to SSHMaxConcurrency in parallel.
- Record. RecordChildResult does an atomic $inc + $addToSet per target into FanOutSucceeded / FanOutFailed / FanOutPartial / FanOutCancelled.
- Per-target retry. Failed targets classified as network or io_transient are re-enqueued as a retry-mode child with just the retry IDs, up to a per-target retry cap.
- Aggregate completion. TryCompleteAggregate runs a MongoDB aggregation pipeline update with $expr + $switch that flips the distribution to succeeded / partial_failure / failed atomically when the children counters add up to FanOutTotal. No race.
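The enqueue and execute steps above can be sketched in a few lines. The names `plan_fanout` and `run_batch` are hypothetical, a thread pool stands in for the worker lane, and treating FanOutTotal as the number of targets (not the number of children) is an assumption drawn from the per-target counters.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def plan_fanout(target_ids: list[str], threshold: int = 10, batch_size: int = 25) -> dict:
    """Decide between the single-job path and batch children."""
    if len(target_ids) < threshold:
        # Below the threshold the single-job path runs; no fan-out counters.
        return {"mode": "single", "batches": [list(target_ids)], "fan_out_total": None}
    batches = [target_ids[i:i + batch_size]
               for i in range(0, len(target_ids), batch_size)]
    # Assumption: FanOutTotal counts targets, matching the per-target counters.
    return {"mode": "batch", "batches": batches, "fan_out_total": len(target_ids)}


def run_batch(batch: list[str], deploy: Callable[[str], str],
              max_concurrency: int = 8) -> dict[str, str]:
    """Run deploy() for every target in the batch, at most max_concurrency at once."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map preserves input order, so results line up with target IDs.
        return dict(zip(batch, pool.map(deploy, batch)))
```

With the default settings, 60 targets become three children of 25, 25, and 10 targets, and each child keeps at most 8 sessions open at a time.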
Error classification
- network: connect refused, timeout, DNS failure. Retryable.
- io_transient: 5xx, EOF during transfer, temporary deadlock. Retryable.
- io_permanent: 4xx that won't change, target rejected the artifact. Not retryable.
- auth: credential invalid / permission denied. Not retryable.
- validation: post-distribution TLS fingerprint mismatch. Not retryable.
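The classification drives which failed targets end up in a retry-mode child. A minimal sketch, assuming a hypothetical `targets_to_retry` helper, an attempts counter kept per target, and a retry cap passed in by the caller (the actual cap is not documented here):

```python
RETRYABLE = {"network", "io_transient"}
NOT_RETRYABLE = {"io_permanent", "auth", "validation"}


def targets_to_retry(failures: dict[str, str], attempts: dict[str, int],
                     retry_cap: int) -> list[str]:
    """Return IDs of failed targets that qualify for a retry-mode child."""
    retry_ids = []
    for target_id, error_class in failures.items():
        if error_class not in RETRYABLE | NOT_RETRYABLE:
            raise ValueError(f"unknown error class: {error_class!r}")
        # Retry only transient classes, and only while under the per-target cap.
        if error_class in RETRYABLE and attempts.get(target_id, 0) < retry_cap:
            retry_ids.append(target_id)
    return retry_ids
```

Rejecting unknown classes outright, instead of silently treating them as permanent, makes a misclassified error visible early.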
Aggregate status matrix
| Children | Distribution status |
|---|---|
| All Succeeded == Total | succeeded |
| Succeeded + Partial > 0 with any Failed | partial_failure |
| Failed == Total | failed |
| Any Cancelled with others done | cancelled |
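The matrix can be read as a pure function over the child counters. This sketch only mirrors the decision table; the real transition happens inside a single MongoDB update. Giving cancelled precedence over the other rows, and mapping any remaining mix (including succeeded plus partial with no failures) to partial_failure, are assumptions about how overlapping rows resolve.

```python
from typing import Optional


def aggregate_status(total: int, succeeded: int, failed: int,
                     partial: int, cancelled: int) -> Optional[str]:
    """Map child counters to a distribution status, or None while still running."""
    if succeeded + failed + partial + cancelled < total:
        return None  # children still outstanding
    if cancelled > 0:
        return "cancelled"  # assumption: cancellation wins over other outcomes
    if succeeded == total:
        return "succeeded"
    if failed == total:
        return "failed"
    return "partial_failure"
```

For example, 40 succeeded, 5 failed, and 5 partial out of 50 resolves to partial_failure, while the same counters with only 45 children reported return None because the fan-out has not finished.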
Observe: the distribution detail page shows a progress bar reflecting (FanOutSucceeded + FanOutFailed + FanOutPartial) / FanOutTotal. Each batch child appears in Jobs with its own log stream. Cancel on the parent marks pending children cancelled and flips the aggregate to cancelled on completion.
07 Where to manage
- Settings → Distribution → Path sets — create / edit / delete path sets.
- Settings → Distribution → Action sets — create / edit / delete action sets, with a Lint variables button to pre-expand templates against a dummy cert.
- Settings → Distribution → Credentials — module credentials (admin role).
- Settings → Variables — project variables, plain or secret.
- Targets / Distributions — bind path set + action set + credential per target; configure validation endpoints inline.
- Settings → General → Distribution — fan-out thresholds and concurrency caps.