Distribution primitives

The reusable objects every distribution composes from: Path Sets (where files land), Action Sets (what runs after), Module Credentials (auth material), Project Variables (per-environment values), Validation Endpoints (post-deploy TLS fingerprint check), and the Fan-out execution model. Define each once; reuse across dozens of targets and certificates.

01 Path sets

A path set describes where certificate files land on a target. Linux (SSH) path sets carry a POSIX path, an owner (formatted user:group), a mode (octal), and an output format. Windows (WinRM) path sets use drive-letter (C:\…) or UNC (\\server\share\…) paths; owner / mode do not apply (Windows ACLs are managed on the host).

The target_kind field selects the OS. Bind-time validation prevents cross-platform mismatches, so an SSH target cannot bind a Windows path set or vice versa.

Entry schema:

Field | Required | Purpose
source | yes | Which artifact to place: cert (leaf), chain (intermediates), fullchain, private_key, combined (cert + chain + key), pfx (server-built PKCS#12; works on both Linux and Windows; see below).
path | yes | Absolute target path. Linux: POSIX. Windows: drive-letter or UNC.
owner | no | user:group. Linux only; stripped on Windows.
mode | no | Octal perms (e.g. 0644). Linux only.
format | no | pem (default) or der. DER only valid for cert or private_key. Rejected for pfx.
passphrase_variable | iff source=pfx | Name of a secret-flagged project variable that supplies the PKCS#12 passphrase at distribute time. Only the variable name lives on the path set; the value never does.
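The schema rules above can be sketched as a small validator. This is an illustrative sketch, not the product's implementation: the function name validate_entry and the target_kind values "linux" and "windows" are assumptions made for the example.

```python
VALID_SOURCES = {"cert", "chain", "fullchain", "private_key", "combined", "pfx"}
DER_SOURCES = {"cert", "private_key"}  # per the schema, DER is valid only for these


def validate_entry(entry: dict, target_kind: str) -> dict:
    """Return a checked copy of a path-set entry, or raise ValueError.

    target_kind is assumed to be "linux" or "windows" (hypothetical values).
    """
    source = entry.get("source")
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if not entry.get("path"):
        raise ValueError("path is required")

    fmt = entry.get("format", "pem")
    if fmt not in ("pem", "der"):
        raise ValueError(f"unknown format: {fmt!r}")
    if fmt == "der" and source not in DER_SOURCES:
        raise ValueError("der is only valid for cert or private_key")

    # passphrase_variable is required if and only if source is pfx.
    has_variable = bool(entry.get("passphrase_variable"))
    if (source == "pfx") != has_variable:
        raise ValueError("passphrase_variable is required iff source is pfx")

    checked = dict(entry)
    if target_kind == "windows":
        # owner and mode do not apply on Windows, so they are stripped.
        checked.pop("owner", None)
        checked.pop("mode", None)
    return checked
```

Calling validate_entry with a Linux-style entry and target_kind "windows" returns a copy without owner and mode, which mirrors the "stripped on Windows" note in the table.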

Example — nginx:

[
  {"source": "fullchain",   "path": "/etc/nginx/ssl/{{ .cn }}.crt", "owner": "root:root", "mode": "0644"},
  {"source": "private_key", "path": "/etc/nginx/ssl/{{ .cn }}.key", "owner": "root:root", "mode": "0600"}
]

Template variables: the path is rendered as a Go template with .cn, .domains, .fingerprint, and .project_variables available. Helper functions such as lower, replace, and printf can be used inside the template.

PKCS#12 (.pfx) — server-built bundles

For consumers that expect a single PFX file (Windows Certificate Store via Import-PfxCertificate, IIS legacy bindings, modern Tomcat with keystoreType="PKCS12", Spring Boot server.ssl.key-store, .NET Core Kestrel), CertAutoPilot builds the bundle server-side and chunked-uploads the bytes directly — no openssl on the target. The encoder uses sslmate's pkcs12.Modern set (AES-256, SHA-256 HMAC).

Operator workflow:

  1. Settings → Variables → Add a variable, e.g. APP_PFX_PASSPHRASE, mark Secret: yes, set the chosen passphrase. Encrypted at rest with envelope encryption.
  2. Path Sets → Add. Pick the target OS, add a row with source: pfx, an absolute path, and select APP_PFX_PASSPHRASE from the passphrase-variable dropdown.
  3. Bind to a target, distribute. The PFX is built once per file; the decrypted passphrase only exists inside the build call's scope.

{
  "source": "pfx",
  "path": "/opt/tomcat/conf/keystore.p12",
  "owner": "tomcat:tomcat",
  "mode": "0600",
  "passphrase_variable": "APP_PFX_PASSPHRASE"
}

PKCS#12 idempotency caveat

The encoder injects a fresh IV/salt per encode, so the bytes change every time. The hash-based skip in both SSH and WinRM modules always reports changed for pfx files, and the bound Action Set runs on every redistribute. If you need quiet redistributes, add a thumbprint check inside your Action Set (Windows: (Get-PfxCertificate ...).Thumbprint; Linux: keytool -list -keystore ... -storepass ...) and gate the restart on a stored marker.
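The marker-based gate described above could look roughly like the following. It is a sketch under stated assumptions: the helper names needs_restart and record_restart and the marker file location are hypothetical, and the thumbprint string is assumed to come from whichever check you run on the target (for example the Get-PfxCertificate or keytool commands mentioned above).

```python
from pathlib import Path


def needs_restart(thumbprint: str, marker: Path) -> bool:
    """Return True when the stored marker differs from the current thumbprint."""
    previous = marker.read_text().strip() if marker.exists() else ""
    return previous != thumbprint


def record_restart(thumbprint: str, marker: Path) -> None:
    """Store the thumbprint after a successful restart.

    Writing the marker only after the restart succeeds means a failed
    restart is retried on the next redistribute.
    """
    marker.write_text(thumbprint + "\n")
```

An Action Set script would call needs_restart first, restart the service only when it returns True, and call record_restart once the restart has succeeded.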

02 Action sets

An action set is the list of commands the SSH or WinRM module runs on a target after file placement. Linux action sets run shell commands or a script body with sudo (run as), allowlisting, and timeouts. Windows action sets run PowerShell commands or an inline script via powershell.exe / pwsh.exe; run as and command_mode do not apply on Windows. target_kind selects the OS; bind-time validation enforces the match.

Commands mode (simple)

mode: commands
commands:
  - "nginx -t"
  - "systemctl reload nginx"
run_as: "root"
timeout_seconds: 30

Commands run sequentially; a non-zero exit status from any command aborts the remaining commands and marks the target failed.
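The stop-on-first-failure behaviour can be illustrated with a local stand-in. This sketch runs commands through the local shell rather than over SSH, applies the timeout per command rather than per session, and uses the hypothetical name run_commands; it only demonstrates the control flow.

```python
import subprocess


def run_commands(commands, timeout_seconds=30):
    """Run shell commands in order and stop at the first non-zero exit.

    Returns (ok, completed), where completed lists the commands that exited 0.
    """
    completed = []
    for command in commands:
        result = subprocess.run(command, shell=True, timeout=timeout_seconds)
        if result.returncode != 0:
            # Abort: later commands are never started.
            return False, completed
        completed.append(command)
    return True, completed
```

With commands such as "nginx -t" followed by "systemctl reload nginx", a failing config test means the reload is never attempted.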

Script mode (complex)

mode: script
script_body: |
  #!/bin/bash
  set -euo pipefail
  nginx -t
  systemctl reload nginx
  FP=$(echo | openssl s_client -connect localhost:443 2>/dev/null | openssl x509 -fingerprint -sha256 -noout)
  [[ "$FP" == *"{{ .fingerprint }}"* ]] || { echo "Active cert mismatch" >&2; exit 1; }
shell: "/bin/bash"
run_as: "root"
timeout_seconds: 60

Field reference

Field | Required | Purpose
mode | yes | commands or script (linux); on Windows: command or script_inline.
commands | commands mode | List of shell strings.
script_body | script mode | Full script text. Template-expanded before execution.
shell | no | Shell to invoke. Default /bin/sh (linux), powershell on Windows.
command_mode | no (linux only) | one-per-ssh opens a session per command; combined joins with && into a single session.
run_as | no (linux only) | User to execute as. root routes through sudo -n.
allowed_commands | no | Regex allowlist. The first token of every command (or every line of a script) must match one regex; non-matches are refused.
timeout_seconds | no | Per-session timeout. Default 30.
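The allowed_commands check can be sketched as follows. The helper names are hypothetical, and two details are assumptions rather than documented behaviour: patterns are applied as full matches against the first token, and an empty allowlist means no restriction.

```python
import re
import shlex


def first_token(command: str) -> str:
    """Return the first shell token of a command line, or '' if it is empty."""
    tokens = shlex.split(command)
    return tokens[0] if tokens else ""


def refused_commands(commands, allowed_patterns):
    """Return the commands whose first token matches none of the patterns."""
    if not allowed_patterns:
        return []  # assumption: no allowlist configured means nothing is refused
    refused = []
    for command in commands:
        token = first_token(command)
        if not any(re.fullmatch(pattern, token) for pattern in allowed_patterns):
            refused.append(command)
    return refused
```

For example, with the patterns nginx and systemctl, the commands "nginx -t" and "systemctl reload nginx" pass, while "rm -rf /tmp/x" is returned as refused.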

Template variables in actions: .cn, .domains, .fingerprint, .pkcs12_password (when a path set has a PKCS#12 entry), .project_variables, and .paths (a map of source → resolved path from the path set, e.g. {{ .paths.fullchain }}).

Privilege escalation on Linux uses sudo -n (non-interactive). The target must allow the exact commands without a password:

# /etc/sudoers.d/certautopilot
Cmnd_Alias CAP_RELOAD = /usr/sbin/nginx -t, /bin/systemctl reload nginx
capdeploy ALL=(ALL) NOPASSWD: CAP_RELOAD

03 Module credentials

A single store for every piece of auth material distribution modules need: SSH private keys, kubeconfigs, WinRM username/password pairs, F5 / NetScaler admin passwords, Vault AppRole or JWT, Huawei AK/SK pairs. All credentials live in module_credentials, sealed with envelope encryption, and are referenced by targets via immutable ID.

Type | Fields | Used by
SSH private key | PEM key + optional passphrase | SSH module
SSH password | Username + password | SSH module
Kubeconfig | Inline kubeconfig YAML | Kubernetes module
WinRM user/pass | Domain\user + password | IIS, WinRM modules
F5 / NetScaler user/pass | Admin user + password | F5 BIG-IP, NetScaler
TLS client cert | Cert PEM + key PEM | F5, NetScaler, Vault (cert auth)
Vault AppRole / JWT | role_id + secret_id / JWT source | HashiCorp Vault
Huawei AK/SK | Access key + secret key | Huawei Cloud

Why separate from targets: one SSH key opens ten hosts (one credential → ten targets, one rotation point); credentials only surface to admins (viewers see target lists without the auth material); credential reads and rotations get a clean audit trail.

Rotate by editing the credential and pasting new material — every future distribution picks it up. There is no version history; in-flight jobs holding a cloned client finish with the old material. Wait for in-flight to drain before considering the old material fully revoked.

Delete is blocked if any target still references the credential. The UI surfaces blockers; repoint or delete those first.

Per-credential allowlists exist for sensitive modules (e.g. Kubernetes allowed namespaces, allowed name prefix) — use them to prevent a single-tenant credential from being aimed at the wrong namespace by mistake.

04 Project variables

Key-value pairs scoped to a project, referenced from path sets, action sets, webhook headers, and module configurations through Go template syntax. Use them to keep per-environment differences (hostnames, ports, service names) out of reusable templates.

Plain vs sensitive:

  • Plain — readable in the UI by any project operator. Use for hostnames, port numbers, service names, feature flags.
  • Sensitive — value is write-only after creation. List views show <sensitive>. Audit log records every read by a worker. Use for shared tokens, secondary API keys, PKCS#12 passphrases (referenced via passphrase_variable on a pfx path-set row).

Schema:

  • name — uppercase identifier, regex ^[A-Z_][A-Z0-9_]{0,127}$. Names are trimmed and uppercased on save.
  • value — string.
  • is_secret — boolean.
  • description — free-text.

Reference with {{ .project_variables.<NAME> }}:

# Path set entry
{
  "source": "fullchain",
  "path": "/etc/nginx/ssl/{{ .project_variables.SERVICE_NAME }}.crt"
}

# Action set command
commands:
  - "rsync /etc/nginx/ssl/ {{ .project_variables.BASTION_HOST }}:/mnt/backup/"

# Webhook target header
{
  "X-Tenant": "{{ .project_variables.TENANT_ID }}"
}

Pattern: one project per environment (for example production, staging, dr). The same path-set + action-set combination references the same variable names; each project determines what those names resolve to.
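To illustrate the pattern, the sketch below resolves one path template against two projects. It is a deliberately simplified regex stand-in for Go template expansion, not the engine the product uses, and it handles only plain project-variable references; the function name expand and the variable values are hypothetical.

```python
import re

# Matches references of the form {{ .project_variables.NAME }} only.
_REFERENCE = re.compile(r"\{\{\s*\.project_variables\.([A-Z_][A-Z0-9_]*)\s*\}\}")


def expand(template: str, project_variables: dict) -> str:
    """Replace project-variable references with the values of one project."""
    def lookup(match):
        name = match.group(1)
        if name not in project_variables:
            raise KeyError(f"undefined project variable: {name}")
        return project_variables[name]

    return _REFERENCE.sub(lookup, template)


PATH_TEMPLATE = "/etc/nginx/ssl/{{ .project_variables.SERVICE_NAME }}.crt"
production = {"SERVICE_NAME": "api"}          # hypothetical values
staging = {"SERVICE_NAME": "api-staging"}     # hypothetical values
```

The same PATH_TEMPLATE expands to /etc/nginx/ssl/api.crt in the production project and to /etc/nginx/ssl/api-staging.crt in staging, so the path set itself never changes between environments.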

Sensitive value redaction is best-effort

If a sensitive value lands on a target host in plaintext (written to a file, echoed into a script), the redaction pipeline stops at that boundary. Don't put irreversible-exposure secrets into action sets that echo them.

Delete is blocked if any path set, action set, or target config still references the variable. The UI surfaces references.

05 Validation endpoints

A distribution that returns success from the module layer might still not be live on the endpoint: the module reload returned 0 but systemd was already running a stale copy; a Kubernetes Secret was updated but the ingress controller didn't hot-reload; an F5 profile was created but the virtual server is pinned to the old one. Post-distribution validation closes the gap. After every target completes, the backend opens an outbound TLS handshake to a configured endpoint, hashes the presented certificate, and compares it to the certificate just deployed. On a mismatch, the target's result is downgraded to partial.

Configure on the target (applies to every distribution using it) or on the distribution (overrides / supplements). To add one, go to Validation endpoints → Add endpoint.

Field | Required | Purpose
host | yes | DNS name or IP of the TLS endpoint.
port | yes | TLS port. 443 for HTTPS, 636 for LDAPS.
sni | no | SNI value. Defaults to the cert's CN. Set explicitly when the same endpoint serves multiple certs keyed by SNI.
method | no | tls_fingerprint (default) or none (skip).
retry_count | no | Default 3. Retries give the reload time to take effect.
retry_delay_seconds | no | Default 2.
timeout_seconds | no | Per-attempt TLS handshake timeout. Default 5 seconds.

How fingerprint check works: dial host:port over TCP, initiate TLS with the configured SNI, read the server cert from the handshake, compute SHA-256 of its DER bytes, compare to the SHA-256 of the cert just distributed. Match → pass; mismatch or connection error → retry then fail.
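The check described above can be sketched with the Python standard library. This is an illustration, not the backend's code: the function names are hypothetical, certificate verification is deliberately disabled because only the presented bytes are compared, and retry_count is assumed to mean retries after the first attempt.

```python
import hashlib
import socket
import ssl
import time


def sha256_fingerprint(cert_der: bytes) -> str:
    """SHA-256 over the DER encoding of a certificate, as lowercase hex."""
    return hashlib.sha256(cert_der).hexdigest()


def fetch_presented_cert(host, port, sni, timeout_seconds=5.0) -> bytes:
    """Dial host:port, handshake with the given SNI, return the server cert (DER)."""
    context = ssl.create_default_context()
    context.check_hostname = False       # trust is not evaluated here;
    context.verify_mode = ssl.CERT_NONE  # only the presented bytes are compared
    with socket.create_connection((host, port), timeout=timeout_seconds) as raw:
        with context.wrap_socket(raw, server_hostname=sni) as tls:
            return tls.getpeercert(binary_form=True)


def validate_endpoint(expected_der, fetch, retry_count=3,
                      retry_delay_seconds=2.0, sleep=time.sleep) -> bool:
    """Return True once the fetched cert matches; retry on mismatch or error."""
    expected = sha256_fingerprint(expected_der)
    for attempt in range(retry_count + 1):
        try:
            if sha256_fingerprint(fetch()) == expected:
                return True
        except OSError:
            pass  # connection and TLS errors (ssl.SSLError is an OSError) are retried
        if attempt < retry_count:
            sleep(retry_delay_seconds)
    return False
```

A caller would pass something like fetch=lambda: fetch_presented_cert(host, 443, sni); injecting the fetch function also makes the retry logic easy to exercise without a network.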

SSRF guards apply: link-local and cloud-metadata IPs are blocked; DNS must resolve to a non-blocked target. Allowlist private endpoints at the network-policy layer if needed.
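A guard of this kind might look like the sketch below. It is an assumption-laden illustration: the function names are hypothetical, the real blocklist is not documented here, and the one extra entry (fd00:ec2::254, the address AWS documents for its IPv6 metadata endpoint) is included only as an example of a non-link-local metadata address.

```python
import ipaddress
import socket

# Hypothetical extra entries. Link-local ranges, which already include the
# common 169.254.169.254 metadata address, are caught by is_link_local.
EXTRA_BLOCKED = {ipaddress.ip_address("fd00:ec2::254")}


def is_blocked(ip_text: str) -> bool:
    """True for link-local addresses and for the extra blocked entries."""
    ip = ipaddress.ip_address(ip_text)
    return ip.is_link_local or ip in EXTRA_BLOCKED


def resolve_allowed(host: str, port: int) -> list:
    """Resolve host and return its addresses, or raise if any one is blocked."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    blocked = [address for address in addresses if is_blocked(address)]
    if blocked:
        raise ValueError(f"{host} resolves to blocked address(es): {blocked}")
    return addresses
```

Checking every resolved address, rather than only the first, avoids a hostname that resolves to one permitted and one blocked address slipping through.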

Partial-failure semantics: if validation fails for any endpoint on an otherwise-successful target, the per-target result moves to partial; the distribution aggregate becomes partial_failure if any child is partial. Each endpoint's outcome is recorded in DistTargetResult.ValidationResults.

06 Fan-out execution

When a distribution aims at a target group larger than a configurable threshold, the backend splits it into batch child jobs, schedules them on a dedicated worker lane, classifies failures, and rolls the results up atomically. The benefits are faster wall-clock time, a bounded blast radius, and queue fairness: ACME / MSCA / notification jobs keep running on the main lane while a 500-target SSH fan-out runs on the dist lane.

Tuning settings (Settings → General → Distribution):

Setting | Default | Purpose
DistributionFanoutThreshold | 10 | Target-count floor at which fan-out engages. Below this, the single-job path runs.
DistributionFanoutBatchSize | 25 | Targets per child job.
DistributionSSHMaxConcurrency | 8 | Inside a batch, how many SSH sessions run in parallel.

Tune batch size up for fast targets (Kubernetes, webhook), down for slow ones (SSH reload on a busy box). Concurrency should stay well below your fleet's sustainable session cap.
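How the threshold and batch size interact can be sketched as below. The function name plan_batches and the "single" / "fanout" labels are hypothetical; the sketch assumes fan-out engages when the target count reaches the threshold, as the table describes.

```python
def plan_batches(target_ids, threshold=10, batch_size=25):
    """Return (mode, batches) for a list of target IDs.

    Below the threshold everything runs as one job; otherwise the targets
    are chunked into batch_size slices, one per child job.
    """
    targets = list(target_ids)
    if len(targets) < threshold:
        return "single", [targets]
    batches = [targets[i:i + batch_size] for i in range(0, len(targets), batch_size)]
    return "fanout", batches
```

With the defaults, 500 targets become 20 child jobs of 25 targets each, while 9 targets stay on the single-job path.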

Execution flow

  1. Enqueue. The backend chunks targets into BatchSize, enqueues one distribution_execute child per batch with mode=batch, then sets FanOutTotal atomically. If any enqueue fails, the entire fan-out is aborted (no partial enqueue).
  2. Schedule. A dedicated distWorker polls only for distribution_execute and distribution_rollback; main-queue workers ignore these.
  3. Execute. Each batch child runs the module against every target in its slice, up to SSHMaxConcurrency in parallel.
  4. Record. RecordChildResult does an atomic $inc + $addToSet per target into FanOutSucceeded / FanOutFailed / FanOutPartial / FanOutCancelled.
  5. Per-target retry. Failed targets classified as network or io_transient are re-enqueued as a retry-mode child with just the retry IDs. Up to a per-target retry cap.
  6. Aggregate completion. TryCompleteAggregate runs a MongoDB aggregation pipeline update with $expr + $switch that flips the distribution to succeeded / partial_failure / failed atomically when the children counters add up to FanOutTotal. No race.

Error classification

  • network — connect refused, timeout, DNS failure. Retryable.
  • io_transient — 5xx, EOF during transfer, temporary deadlock. Retryable.
  • io_permanent — 4xx that won't change, target rejected the artifact. Not retryable.
  • auth — credential invalid / permission denied. Not retryable.
  • validation — post-distribution TLS fingerprint mismatch. Not retryable.
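The classification above drives which failed targets are re-enqueued. A minimal sketch, with hypothetical names and a hypothetical retry_cap parameter (the actual per-target cap is not stated here):

```python
RETRYABLE = {"network", "io_transient"}


def select_retries(failures: dict, attempts: dict, retry_cap: int) -> list:
    """Pick the failed target IDs that should go into a retry-mode child.

    failures maps target ID -> error class; attempts maps target ID -> number
    of retries already used.
    """
    retry_ids = []
    for target_id, error_class in failures.items():
        if error_class in RETRYABLE and attempts.get(target_id, 0) < retry_cap:
            retry_ids.append(target_id)
    return retry_ids
```

Targets that failed with io_permanent, auth, or validation are never selected, and a retryable target drops out once it has used up the cap.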

Aggregate status matrix

Children | Distribution status
All Succeeded == Total | succeeded
Succeeded + Partial > 0 with any Failed | partial_failure
Failed == Total | failed
Any Cancelled with others done | cancelled
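One way to read the matrix is as a function over the counters. This is a sketch, not the MongoDB pipeline itself: the function name is hypothetical, and the precedence (cancelled checked first, any remaining mix treated as partial_failure) is an assumption, since the matrix does not spell out every combination.

```python
def aggregate_status(total, succeeded, failed, partial, cancelled):
    """Return the aggregate status, or None while children are still running."""
    if succeeded + failed + partial + cancelled < total:
        return None  # counters do not yet add up to the total
    if cancelled > 0:
        return "cancelled"
    if succeeded == total:
        return "succeeded"
    if failed == total:
        return "failed"
    return "partial_failure"  # assumption: any other mix of outcomes
```

Returning None until the counters add up mirrors the rule that the status only flips once all children have reported.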

Observe: the distribution detail page shows a progress bar reflecting (FanOutSucceeded + FanOutFailed + FanOutPartial) / FanOutTotal. Each batch child appears in Jobs with its own log stream. Cancel on the parent marks pending children cancelled and flips the aggregate to cancelled on completion.

07 Where to manage

  • Settings → Distribution → Path sets — create / edit / delete path sets.
  • Settings → Distribution → Action sets — create / edit / delete action sets, with a Lint variables button to pre-expand templates against a dummy cert.
  • Settings → Distribution → Credentials — module credentials (admin role).
  • Settings → Variables — project variables, plain or secret.
  • Targets / Distributions — bind path set + action set + credential per target; configure validation endpoints inline.
  • Settings → General → Distribution — fan-out thresholds and concurrency caps.