Distribution primitives

The reusable objects every distribution composes from: Path Sets (where files land), Action Sets (what runs after), Module Credentials (auth material), Project Variables (per-environment values), Validation Endpoints (post-deploy TLS fingerprint check), and the Fan-out execution model. Define each once; reuse across dozens of targets and certificates.

01 Path sets

A path set describes where certificate files land on a target. Linux (SSH) path sets carry a POSIX path, an owner (formatted user:group), a mode (octal), and an output format. Windows (WinRM) path sets use drive-letter (C:\…) or UNC (\\server\share\…) paths; owner / mode do not apply (Windows ACLs are managed on the host).

The target_kind field selects the OS. Bind-time validation prevents cross-platform mismatches, so an SSH target cannot bind a Windows path set or vice versa.

Entry schema:

Field	Required	Purpose
`source`	yes	Which artifact to place: `cert` (leaf), `chain` (intermediates), `fullchain`, `private_key`, `combined` (cert + chain + key), `pfx` (server-built PKCS#12 — works on both Linux and Windows; see below).
`path`	yes	Absolute target path. Linux: POSIX. Windows: drive-letter or UNC.
`owner`	no	`user:group`. Linux only — stripped on Windows.
`mode`	no	Octal perms (e.g. `0644`). Linux only.
`format`	no	`pem` (default) or `der`. DER only valid for `cert` or `private_key`. Rejected for `pfx`.
`passphrase_variable`	iff `source=pfx`	Name of a secret-flagged project variable that supplies the PKCS#12 passphrase at distribute time. Only the variable name lives on the path set; the value never does.
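
The field rules in the table can be read as a small validator. The following Python sketch is illustrative only and is not CertAutoPilot's own code; the function name `validate_entry` and the `target_kind` values `"linux"` and `"windows"` are assumptions made for the example.

```python
import re

SOURCES = {"cert", "chain", "fullchain", "private_key", "combined", "pfx"}
DER_SOURCES = {"cert", "private_key"}  # DER is rejected for every other source


def validate_entry(entry: dict, target_kind: str) -> list:
    """Return a list of problems; an empty list means the entry passes these checks."""
    errors = []
    source = entry.get("source")
    path = entry.get("path", "")
    fmt = entry.get("format", "pem")  # pem is the documented default

    if source not in SOURCES:
        errors.append(f"unknown source: {source!r}")

    # Assumed target_kind values; the real identifiers may differ.
    if target_kind == "linux" and not path.startswith("/"):
        errors.append("linux path must be an absolute POSIX path")
    if target_kind == "windows":
        drive = re.match(r"^[A-Za-z]:\\", path)
        unc = path.startswith("\\\\")
        if not (drive or unc):
            errors.append("windows path must be drive-letter or UNC")

    if fmt not in {"pem", "der"}:
        errors.append(f"unknown format: {fmt!r}")
    elif fmt == "der" and source not in DER_SOURCES:
        errors.append("der is only valid for cert or private_key")

    # passphrase_variable is required if and only if source is pfx.
    has_passphrase = bool(entry.get("passphrase_variable"))
    if (source == "pfx") != has_passphrase:
        errors.append("passphrase_variable is required iff source=pfx")
    return errors
```

An empty result means the entry satisfies the rules listed in the table; it says nothing about bind-time checks against a specific target.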

Example — nginx:

[
  {"source": "fullchain",   "path": "/etc/nginx/ssl/{{ .cn }}.crt", "owner": "root:root", "mode": "0644"},
  {"source": "private_key", "path": "/etc/nginx/ssl/{{ .cn }}.key", "owner": "root:root", "mode": "0600"}
]

Template variables: the path is rendered as a Go template with .cn, .domains, .fingerprint, and .project_variables available. Standard Go template functions (lower, replace, printf) work.

PKCS#12 (.pfx) — server-built bundles

For consumers that expect a single PFX file (Windows Certificate Store via Import-PfxCertificate, IIS legacy bindings, modern Tomcat with keystoreType="PKCS12", Spring Boot server.ssl.key-store, .NET Core Kestrel), CertAutoPilot builds the bundle server-side and chunked-uploads the bytes directly — no openssl on the target. The encoder uses sslmate's pkcs12.Modern set (AES-256, SHA-256 HMAC).

Operator workflow:

Settings → Variables → Add a variable, e.g. APP_PFX_PASSPHRASE, mark Secret: yes, set the chosen passphrase. Encrypted at rest with envelope encryption.
Path Sets → Add. Pick the target OS, add a row with source: pfx, an absolute path, and select APP_PFX_PASSPHRASE from the passphrase-variable dropdown.
Bind to a target, distribute. The PFX is built once per file; the decrypted passphrase only exists inside the build call's scope.

{
  "source": "pfx",
  "path": "/opt/tomcat/conf/keystore.p12",
  "owner": "tomcat:tomcat",
  "mode": "0600",
  "passphrase_variable": "APP_PFX_PASSPHRASE"
}

PKCS#12 idempotency caveat

The encoder injects a fresh IV/salt per encode, so the bytes change every time. The hash-based skip in both SSH and WinRM modules always reports changed for pfx files, and the bound Action Set runs on every redistribute. If you need quiet redistributes, add a thumbprint check inside your Action Set (Windows: (Get-PfxCertificate ...).Thumbprint; Linux: keytool -list -keystore ... -storepass ...) and gate the restart on a stored marker.
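
The marker-gating idea can be sketched as follows. This is Python for illustration only; a real action set would express the same logic in shell or PowerShell. The helper names and the marker-file approach are assumptions, and the thumbprint is taken as SHA-1 over the certificate's DER bytes, which is the value Windows displays as Thumbprint.

```python
import hashlib
from pathlib import Path


def thumbprint(cert_der: bytes) -> str:
    """SHA-1 over the DER bytes, uppercase hex."""
    return hashlib.sha1(cert_der).hexdigest().upper()


def needs_restart(cert_der: bytes, marker: Path) -> bool:
    """True when the certificate differs from the one recorded in the marker file."""
    previous = marker.read_text().strip() if marker.exists() else ""
    return thumbprint(cert_der) != previous


def record_restart(cert_der: bytes, marker: Path) -> None:
    """Store the thumbprint after a successful restart (hypothetical marker format)."""
    marker.write_text(thumbprint(cert_der) + "\n")
```

Recording the marker only after the restart succeeds means a failed restart is attempted again on the next redistribute.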

02 Action sets

An action set is the list of commands the SSH or WinRM module runs on a target after file placement. Linux action sets run shell commands or a script body with sudo (run_as), allowlisting, and timeouts. Windows action sets run PowerShell commands or an inline script via powershell.exe / pwsh.exe; run_as and command_mode do not apply on Windows. target_kind selects the OS; bind-time validation enforces the match.

Commands mode (simple)

mode: commands
commands:
  - "nginx -t"
  - "systemctl reload nginx"
run_as: "root"
timeout_seconds: 30

Commands run sequentially; a non-zero exit from any command aborts the remaining commands and marks the target failed.
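
A minimal local sketch of that abort-on-first-failure behaviour, in Python. It runs commands through the local shell rather than over SSH, applies the timeout per command rather than per session, and the result dictionary shape is an assumption made for the example.

```python
import subprocess


def run_commands(commands, timeout_seconds=30):
    """Run commands in order and stop at the first non-zero exit code."""
    for index, command in enumerate(commands):
        # shell=True mirrors "list of shell strings"; only use with trusted input.
        result = subprocess.run(command, shell=True, timeout=timeout_seconds)
        if result.returncode != 0:
            return {
                "status": "failed",
                "failed_index": index,
                "exit_code": result.returncode,
            }
    return {"status": "succeeded", "failed_index": None, "exit_code": 0}
```

With the nginx example, a failing `nginx -t` would prevent `systemctl reload nginx` from running at all.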

Script mode (complex)

mode: script
script_body: |
  #!/bin/bash
  set -euo pipefail
  nginx -t
  systemctl reload nginx
  FP=$(echo | openssl s_client -connect localhost:443 2>/dev/null | openssl x509 -fingerprint -sha256 -noout)
  [[ "$FP" == *"{{ .fingerprint }}"* ]] || { echo "Active cert mismatch" >&2; exit 1; }
shell: "/bin/bash"
run_as: "root"
timeout_seconds: 60

Field reference

Field	Required	Purpose
`mode`	yes	`commands` or `script` (Linux); `command` or `script_inline` (Windows).
`commands`	commands mode	List of shell strings.
`script_body`	script mode	Full script text. Template-expanded before execution.
`shell`	no	Shell to invoke. Default `/bin/sh` on Linux, `powershell` on Windows.
`command_mode`	no (Linux only)	`one-per-ssh` opens a session per command; `combined` joins with `&&` into a single session.
`run_as`	no (Linux only)	User to execute as. `root` routes through `sudo -n`.
`allowed_commands`	no	Regex allowlist. The first token of every command (or every line of a script) must match one regex; non-matches are refused.
`timeout_seconds`	no	Per-session timeout. Default 30.
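
The `allowed_commands` rule can be sketched in Python as below. Whether the real matcher anchors the regex to the whole token is not stated here, so the use of `fullmatch` is an assumption, as is the function name `refused_commands`.

```python
import re
import shlex


def refused_commands(commands, allowed_patterns):
    """Return the commands whose first token matches none of the allowlist regexes."""
    compiled = [re.compile(pattern) for pattern in allowed_patterns]
    refused = []
    for command in commands:
        tokens = shlex.split(command)
        first = tokens[0] if tokens else ""
        # Assumption: the whole first token must match, not just a substring.
        if not any(rx.fullmatch(first) for rx in compiled):
            refused.append(command)
    return refused
```

For a script body, the same check would be applied to each line instead of each list item.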

Template variables in actions: .cn, .domains, .fingerprint, .pkcs12_password (when a path set has a PKCS#12 entry), .project_variables, and .paths (a map of source → resolved path from the path set, e.g. {{ .paths.fullchain }}).

Privilege escalation on Linux uses sudo -n (non-interactive). The target must allow the exact commands without a password:

# /etc/sudoers.d/certautopilot
Cmnd_Alias CAP_RELOAD = /usr/sbin/nginx -t, /bin/systemctl reload nginx
capdeploy ALL=(ALL) NOPASSWD: CAP_RELOAD

03 Module credentials

A single store for every piece of auth material distribution modules need: SSH private keys, kubeconfigs, WinRM username/password pairs, F5 / NetScaler admin passwords, Vault AppRole or JWT, Huawei AK/SK pairs. All credentials live in module_credentials, sealed with envelope encryption, and are referenced by targets via immutable ID.

Type	Fields	Used by
SSH private key	PEM key + optional passphrase	SSH module
SSH password	Username + password	SSH module
Kubeconfig	Inline kubeconfig YAML	Kubernetes module
WinRM user/pass	Domain\user + password	IIS, WinRM modules
F5 / NetScaler user/pass	Admin user + password	F5 BIG-IP, NetScaler
TLS client cert	Cert PEM + key PEM	F5, NetScaler, Vault (cert auth)
Vault AppRole / JWT	`role_id` + `secret_id` / JWT source	HashiCorp Vault
Huawei AK/SK	Access key + secret key	Huawei Cloud

Why separate from targets: one SSH key opens ten hosts (one credential → ten targets, one rotation point); credentials only surface to admins (viewers see target lists without the auth material); credential reads and rotations get a clean audit trail.

Rotate by editing the credential and pasting new material; every future distribution picks it up. There is no version history, and in-flight jobs holding a cloned client finish with the old material. Wait for in-flight jobs to drain before considering the old material fully revoked.

Delete is blocked if any target still references the credential. The UI surfaces blockers; repoint or delete those first.

Per-credential allowlists exist for sensitive modules (e.g. Kubernetes allowed namespaces, allowed name prefix) — use them to prevent a single-tenant credential from being aimed at the wrong namespace by mistake.

04 Project variables

Key-value pairs scoped to a project, referenced from path sets, action sets, webhook headers, and module configurations through Go template syntax. Use them to keep per-environment differences (hostnames, ports, service names) out of reusable templates.

Plain vs sensitive:

Plain — readable in the UI by any project operator. Use for hostnames, port numbers, service names, feature flags.
Sensitive — value is write-only after creation. List views show <sensitive>. Audit log records every read by a worker. Use for shared tokens, secondary API keys, PKCS#12 passphrases (referenced via passphrase_variable on a pfx path-set row).

Schema:

name — uppercase identifier, regex ^[A-Z_][A-Z0-9_]{0,127}$. Names are trimmed and uppercased on save.
value — string.
is_secret — boolean.
description — free-text.
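
The name rule (trim, uppercase, then match the regex) can be sketched in Python. The function name `normalize_name` and the error type are assumptions; the regex is the one given in the schema.

```python
import re

NAME_RE = re.compile(r"^[A-Z_][A-Z0-9_]{0,127}$")


def normalize_name(raw: str) -> str:
    """Trim and uppercase a variable name, then enforce the documented pattern."""
    name = raw.strip().upper()
    if not NAME_RE.match(name):
        raise ValueError(f"invalid project variable name: {raw!r}")
    return name
```

The pattern allows 1 to 128 characters in total, so `service_name` is saved as `SERVICE_NAME`, while a name starting with a digit or containing a hyphen is rejected.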

Reference with {{ .project_variables.<NAME> }}:

# Path set entry
{
  "source": "fullchain",
  "path": "/etc/nginx/ssl/{{ .project_variables.SERVICE_NAME }}.crt"
}

# Action set command
commands:
  - "rsync /etc/nginx/ssl/ {{ .project_variables.BASTION_HOST }}:/mnt/backup/"

# Webhook target header
{
  "X-Tenant": "{{ .project_variables.TENANT_ID }}"
}

Pattern: one project per environment (for example production, staging, and dr). The same path-set + action-set combination references the same variable names in every project; each project determines what those names resolve to.

Sensitive value redaction is best-effort

If a sensitive value lands on a target host in plaintext (written to a file, echoed into a script), the redaction pipeline stops at that boundary. Don't put irreversible-exposure secrets into action sets that echo them.

Delete is blocked if any path set, action set, or target config still references the variable. The UI surfaces references.

05 Validation endpoints

A distribution that returns success from the module layer might still not be live on the endpoint. The module reload may have returned 0 while systemd was already running a stale copy; a Kubernetes Secret may have been updated without the ingress controller hot-reloading; an F5 profile may have been created while the virtual server is still pinned to the old one. Post-distribution validation closes the gap: after every target completes, the backend opens an outbound TLS handshake to a configured endpoint, hashes the presented cert, and compares it to the cert just deployed. On a mismatch, the target's result is downgraded to partial.

Configure on the target (applies to every distribution using it) or on the distribution (overrides / supplements). Validation endpoints → Add endpoint.

Field	Required	Purpose
`host`	yes	DNS name or IP of the TLS endpoint.
`port`	yes	TLS port. `443` for HTTPS, `636` for LDAPS.
`sni`	no	SNI value. Defaults to the cert's CN. Set explicitly when the same endpoint serves multiple certs keyed by SNI.
`method`	no	`tls_fingerprint` (default) or `none` (skip).
`retry_count`	no	Default `3`. Reload takes a moment.
`retry_delay_seconds`	no	Default `2`.
`timeout_seconds`	no	Per-attempt TLS handshake timeout. Default `5` seconds.

How fingerprint check works: dial host:port over TCP, initiate TLS with the configured SNI, read the server cert from the handshake, compute SHA-256 of its DER bytes, compare to the SHA-256 of the cert just distributed. Match → pass; mismatch or connection error → retry then fail.
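
The steps above can be sketched with Python's standard library. This is an illustration, not the backend's implementation: the function names are hypothetical, and `retry_count` is assumed to mean retries after the first attempt. Chain verification is disabled on purpose because the check compares fingerprints rather than trust.

```python
import hashlib
import socket
import ssl
import time


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 over the certificate's DER bytes, lowercase hex."""
    return hashlib.sha256(cert_der).hexdigest()


def presented_cert(host, port, sni, timeout_seconds=5.0):
    """Dial host:port, handshake with the given SNI, return the leaf cert as DER."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout_seconds) as raw:
        with context.wrap_socket(raw, server_hostname=sni) as tls:
            return tls.getpeercert(binary_form=True)


def validate_endpoint(host, port, sni, expected_sha256, retry_count=3,
                      retry_delay_seconds=2.0, timeout_seconds=5.0,
                      fetch=presented_cert):
    """Return True on a fingerprint match; retry on mismatch or connection error."""
    for attempt in range(retry_count + 1):
        try:
            der = fetch(host, port, sni, timeout_seconds)
            if der is not None and fingerprint(der) == expected_sha256.lower():
                return True
        except OSError:  # covers refused connections, timeouts, DNS and TLS errors
            pass
        if attempt < retry_count:
            time.sleep(retry_delay_seconds)
    return False
```

The `fetch` parameter exists only so the retry logic can be exercised without a live endpoint.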

SSRF guards apply: link-local and cloud-metadata IPs are blocked; DNS must resolve to a non-blocked target. Allowlist private endpoints at the network-policy layer if needed.
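
A guard of this kind might look like the Python sketch below. The exact blocklist the backend uses is not given here, so the addresses in `METADATA_ADDRESSES` and both function names are assumptions for illustration.

```python
import ipaddress
import socket

# Assumed examples of cloud-metadata addresses; the real blocklist may differ.
METADATA_ADDRESSES = {
    ipaddress.ip_address("169.254.169.254"),
    ipaddress.ip_address("fd00:ec2::254"),
}


def is_blocked(address: str) -> bool:
    """True for link-local addresses and the assumed metadata addresses."""
    ip = ipaddress.ip_address(address)
    return ip.is_link_local or ip in METADATA_ADDRESSES


def resolve_checked(host: str, port: int) -> list:
    """Resolve host and refuse if any resolved address is blocked."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    blocked = [address for address in addresses if is_blocked(address)]
    if blocked:
        raise ValueError(f"blocked address for {host}: {blocked}")
    return addresses
```

Checking every resolved address, rather than only the first, avoids a hostname that resolves to both an allowed and a blocked address slipping through.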

Partial-failure semantics: if validation fails for any endpoint on an otherwise-successful target, the per-target result moves to partial; the distribution aggregate becomes partial_failure if any child is partial. Each endpoint's outcome is recorded in DistTargetResult.ValidationResults.

06 Fan-out execution

When a distribution aims at a target group larger than a configurable threshold, the backend splits it into batch child jobs, schedules them on a dedicated worker lane, classifies failures, and rolls the results up atomically. The benefits are faster wall-clock time, a bounded blast radius, and queue fairness: ACME / MSCA / notification jobs keep running on the main lane while a 500-target SSH fan-out runs on the dist lane.

Tuning settings (Settings → General → Distribution):

Setting	Default	Purpose
`DistributionFanoutThreshold`	10	Target-count floor at which fan-out engages. Below this, the single-job path runs.
`DistributionFanoutBatchSize`	25	Targets per child job.
`DistributionSSHMaxConcurrency`	8	Inside a batch, how many SSH sessions run in parallel.

Tune batch size up for fast targets (Kubernetes, webhook), down for slow ones (SSH reload on a busy box). Concurrency should stay well below your fleet's sustainable session cap.

Execution flow

Enqueue. The backend chunks targets into batches of DistributionFanoutBatchSize, enqueues one distribution_execute child per batch with mode=batch, then sets FanOutTotal atomically. If any enqueue fails, the entire fan-out is aborted (no partial enqueue).
Schedule. A dedicated distWorker polls only for distribution_execute and distribution_rollback; main-queue workers ignore these.
Execute. Each batch child runs the module against every target in its slice, up to DistributionSSHMaxConcurrency in parallel.
Record. RecordChildResult does an atomic $inc + $addToSet per target into FanOutSucceeded / FanOutFailed / FanOutPartial / FanOutCancelled.
Per-target retry. Failed targets classified as network or io_transient are re-enqueued as a retry-mode child with just the retry IDs. Up to a per-target retry cap.
Aggregate completion. TryCompleteAggregate runs a MongoDB aggregation pipeline update with $expr + $switch that flips the distribution to succeeded / partial_failure / failed atomically when the children counters add up to FanOutTotal. No race.
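
The Enqueue step's chunking can be sketched in Python using the default threshold and batch size from the tuning table. The function name, the `"single"` mode label, and the returned dictionary are assumptions; `fan_out_total` is assumed to count targets, since the per-target counters are compared against it.

```python
def plan_fanout(target_ids, threshold=10, batch_size=25):
    """Split targets into batch children once the threshold is reached."""
    targets = list(target_ids)
    if len(targets) < threshold:
        # Below the floor, the single-job path runs (mode label is hypothetical).
        children = [{"mode": "single", "targets": targets}]
    else:
        children = [
            {"mode": "batch", "targets": targets[start:start + batch_size]}
            for start in range(0, len(targets), batch_size)
        ]
    return {"fan_out_total": len(targets), "children": children}
```

With the defaults, 60 targets yield three children of 25, 25, and 10 targets.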

Error classification

network — connect refused, timeout, DNS failure. Retryable.
io_transient — 5xx, EOF during transfer, temporary deadlock. Retryable.
io_permanent — 4xx that won't change, target rejected the artifact. Not retryable.
auth — credential invalid / permission denied. Not retryable.
validation — post-distribution TLS fingerprint mismatch. Not retryable.

Aggregate status matrix

Children	Distribution status
`Succeeded == Total`	`succeeded`
Any `Partial`, or `Succeeded + Partial > 0` with any `Failed`	`partial_failure`
`Failed == Total`	`failed`
Any `Cancelled` with others done	`cancelled`
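
The matrix can be read as a decision function. The Python sketch below is not the MongoDB pipeline itself; it assumes the status is only decided once the counters add up to the total, that cancellation takes precedence over the other outcomes, and that any partial child yields `partial_failure` (as described under partial-failure semantics).

```python
def aggregate_status(total, succeeded, failed, partial, cancelled):
    """Return the distribution status, or None while children are still running."""
    if succeeded + failed + partial + cancelled < total:
        return None  # counters have not reached the total yet
    if cancelled > 0:
        return "cancelled"  # assumed precedence; not stated in the matrix
    if succeeded == total:
        return "succeeded"
    if failed == total:
        return "failed"
    return "partial_failure"  # mixed outcomes, or at least one partial child
```

For example, 7 succeeded and 3 failed out of 10 gives `partial_failure`, while 10 failed out of 10 gives `failed`.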

Observe: the distribution detail page shows a progress bar reflecting (FanOutSucceeded + FanOutFailed + FanOutPartial) / FanOutTotal. Each batch child appears in Jobs with its own log stream. Cancel on the parent marks pending children cancelled and flips the aggregate to cancelled on completion.

07 Where to manage

Settings → Distribution → Path sets — create / edit / delete path sets.
Settings → Distribution → Action sets — create / edit / delete action sets, with a Lint variables button to pre-expand templates against a dummy cert.
Settings → Distribution → Credentials — module credentials (admin role).
Settings → Variables — project variables, plain or secret.
Targets / Distributions — bind path set + action set + credential per target; configure validation endpoints inline.
Settings → General → Distribution — fan-out thresholds and concurrency caps.