Auth & RBAC

Four roles, two scopes, three identities — local password + TOTP, LDAP/AD federation, and API keys. Browser sessions are JWT in httpOnly cookies with double-submit CSRF and refresh-token reuse detection.

01 Roles & scope

owner  ⊃  admin  ⊃  operator  ⊃  viewer

Roles are hierarchical — a higher role implies every capability of lower roles. Scopes are orthogonal:

Org role applies to every project in the organisation by default.
Project role overrides on a per-project basis. Most-permissive wins — a user who is org viewer + project operator is operator on that project, viewer elsewhere.
Org-level actions (user management, LDAP, API keys, license) ignore project roles — only the org role counts.
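
The scope rules above can be sketched as a small resolver. This is a minimal illustration, not CertAutoPilot's actual code: the function name `effective_role` and the use of `project_id=None` to model an org-level action are assumptions made for the example.

```python
from typing import Dict, Optional

# Roles ordered from least to most privileged (owner ⊃ admin ⊃ operator ⊃ viewer).
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2, "owner": 3}


def effective_role(org_role: str, project_roles: Dict[str, str],
                   project_id: Optional[str]) -> str:
    """Return the role that applies to a request.

    project_id=None models an org-level action (assumption of this sketch),
    where project roles are ignored and only the org role counts.
    """
    if project_id is None:
        return org_role
    project_role = project_roles.get(project_id)
    if project_role is None:
        return org_role  # no override: the org role applies by default
    # Most-permissive wins between the org role and the project override.
    return max(org_role, project_role, key=ROLE_RANK.__getitem__)
```

With `org_role="viewer"` and `project_roles={"p1": "operator"}`, the call returns `operator` for project `p1` and `viewer` for any other project or for an org-level action.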

Role	Description
Owner	Full control. Only role that can transfer ownership, delete the organisation, or manage the license.
Admin	Manages users, projects, settings (LDAP, syslog, API keys, general policy). Cannot delete the org.
Operator	Day-to-day cert operations — issue, renew, revoke, distribute, configure ACME / DNS / modules. Cannot manage users or org settings.
Viewer	Read-only. Dashboards, cert listings, job status, audit logs. No mutations.

02 Action → minimum role

The lowest role that can perform each action. Project-scoped actions check the user's project role; org-scoped actions ignore project overrides.

Action	Minimum role	Scope
Issue / renew / reissue certificate	operator	project
Revoke certificate	operator (admin if policy gates it)	project
Download private key	operator (policy-gated)	project
Delete certificate	admin	project
Manage ACME / MSCA / DNS / module config	operator	project
Certificate policy / approval policy	admin	project
Notification channels & rules	admin	project
Create / manage users & LDAP & API keys	admin	org
Create / delete projects · syslog · org-level settings	admin	org
Transfer ownership · delete org · license	owner	org

Escalation guard: a user can assign a role at or below their own. An operator cannot promote anyone to admin; an admin cannot promote anyone to owner.
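
As a rough sketch of how the table and the escalation guard fit together: the action keys below (`certificate.issue` and so on) are hypothetical identifiers invented for this example, and only four rows of the table are reproduced. The real permission check may be structured differently.

```python
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2, "owner": 3}

# (minimum role, scope) for a few rows of the table above.
# The action keys are hypothetical names chosen for this sketch.
MIN_ROLE = {
    "certificate.issue": ("operator", "project"),
    "certificate.delete": ("admin", "project"),
    "user.manage": ("admin", "org"),
    "org.delete": ("owner", "org"),
}


def is_allowed(action, org_role, project_role=None):
    """Check an action against its minimum role in the right scope."""
    minimum, scope = MIN_ROLE[action]
    role = org_role
    if scope == "project" and project_role is not None:
        # Project-scoped: the more permissive of org role and project role.
        role = max(org_role, project_role, key=ROLE_RANK.__getitem__)
    # Org-scoped actions fall through with the org role only.
    return ROLE_RANK[role] >= ROLE_RANK[minimum]


def can_assign(actor_role, target_role):
    """Escalation guard: assignable roles are at or below the actor's own."""
    return ROLE_RANK[target_role] <= ROLE_RANK[actor_role]
```

In this sketch, a role change would need both checks to pass: `is_allowed("user.manage", ...)` for the right to manage users at all, and `can_assign(...)` for the specific role being granted.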

03 Local users

Each user record carries:

Field	Purpose
`email`	Unique per organisation. Used for login, notifications, audit correlation.
`display_name`	Shown in UI lists and audit logs.
`org_role`	`owner` / `admin` / `operator` / `viewer`.
`project_roles`	Map of project_id → role; per-project overrides.
`auth_source`	`local` (password + optional TOTP) or `ldap`.
`otp_required`	Per-user override of the org-level OTP policy.
`status`	`active` / `disabled` / `pending` (invited, not yet accepted).

Create: Settings → Users → New user. Email + display name + org role + (optional) project roles. Either email an invite token (user sets password on first login, valid 72 h) or set a temporary password.

Disable vs delete: prefer disable. Disable revokes every refresh token, blocks new logins, and is reversible. Delete is permanent: audit records keep the user's id and display name at the time of each action for forensics, but the user record itself is gone.

Password reset: self-service via the login page or admin-forced via Settings → Users → menu → Force password reset. The admin never sees the new password.

04 LDAP / AD integration

Federate against an LDAP v3 directory or Active Directory. Passwords for federated users never land in CertAutoPilot's database.

Two modes:

Simple bind: the backend binds as the user using a templated DN (uid={{ .username }},ou=users,dc=corp,dc=example,dc=com).
Search-then-bind: bind as a service account, search for the user, then rebind as the user. Required when DN patterns vary or AD's userPrincipalName is preferred.

Setup: Settings → LDAP. Set server URL (LDAPS preferred), bind mode, bind DN pattern or service-account + search base/filter, default role on auto-provision (usually viewer), and an optional Group → role mapping table. Use the Test Connection button (verifies bind works) and the Test User Filter / Test Group Filter dialogs (search-icon next to each filter field) to verify the directory configuration before enabling.

Auto-provisioning: the first successful login creates the user record with auth_source: ldap, the role from the first matching group rule (or the configured default), and display name from the directory's cn / displayName. Group rules are re-evaluated on every login — promotion in the directory propagates on next login.
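
The "first matching group rule, else default" behaviour can be pictured as below. This is only an illustration: the function name, the representation of rules as an ordered list of (group DN, role) pairs, and the case-insensitive DN comparison are all assumptions of the sketch, not documented behaviour.

```python
def resolve_ldap_role(user_groups, group_rules, default_role="viewer"):
    """Pick the role for an LDAP user at login.

    user_groups: group DNs the directory reports for the user.
    group_rules: ordered list of (group DN, role); the first match wins.
    """
    # Assumption: DNs are compared case-insensitively.
    member_of = {dn.lower() for dn in user_groups}
    for group_dn, role in group_rules:
        if group_dn.lower() in member_of:
            return role
    # No rule matched: fall back to the configured default role.
    return default_role
```

Because the rules are evaluated on every login, calling this again after a directory change yields the updated role without any manual edit to the user record.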

TLS: LDAPS (port 636) is preferred; STARTTLS on 389 is supported. Plain LDAP is refused unless explicitly allowed (allow_plain_ldap: true) — fix the directory instead. Paste a private CA's PEM into TLS CA certificate when needed; avoid skip verify.

Active Directory specifics: prefer (userPrincipalName={{ .username }}) as the user filter; group membership search uses (member={{ .userDN }}).
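
When testing filters by hand, it helps to see what a rendered filter looks like once the placeholder is substituted. The sketch below assumes placeholders are written as `{{ .username }}` and escapes the value per RFC 4515 so that characters such as `*` or `(` in a login name cannot alter the filter. `render_filter` is a hypothetical helper; whether the backend applies the same escaping internally is not stated in this document.

```python
def escape_filter_value(value: str) -> str:
    """Escape a value for use inside an LDAP search filter (RFC 4515)."""
    out = []
    for ch in value:
        if ch in "\\*()\x00":
            # Special characters become a backslash plus two hex digits.
            out.append("\\%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def render_filter(template: str, **values: str) -> str:
    """Hypothetical renderer: fill '{{ .name }}' placeholders with escaped values."""
    for name, value in values.items():
        template = template.replace("{{ .%s }}" % name, escape_filter_value(value))
    return template
```

For example, `render_filter("(userPrincipalName={{ .username }})", username="alice@corp.example.com")` yields `(userPrincipalName=alice@corp.example.com)`.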

05 Two-factor authentication (TOTP)

RFC 6238 time-based codes. Seeds are envelope-encrypted; validation accepts ±1 30-second window for clock skew. Works with both local and LDAP users — the directory never sees the OTP code.
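
For readers who want to see what "±1 30-second window" means in practice, here is a minimal RFC 6238 check using only the standard library. It assumes HMAC-SHA1 and 6-digit codes (the common authenticator-app defaults); the function names are illustrative and the backend's actual implementation is not shown in this document.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Code for one time-step counter (RFC 4226 truncation, HMAC-SHA1)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, code: str, now: float,
                step: int = 30, window: int = 1) -> bool:
    """Accept the code for the current step or `window` steps either side."""
    counter = int(now) // step
    return any(
        hmac.compare_digest(totp(secret, counter + drift).encode(), code.encode())
        for drift in range(-window, window + 1)
    )
```

With `window=1`, a code stays acceptable for up to one step before and after the step it was generated in, which is what absorbs modest clock skew between the phone and the backend host.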

Org policy (Settings → Users → OTP Policy card):

Required for all — every user must enroll.
Required for admin+ — admin and owner roles must enroll; operators / viewers may opt in.
Optional — users can enroll if they want.
Disabled — 2FA cannot be enabled at all.

Enrollment: scan a QR code with any RFC 6238 authenticator app (Google Authenticator, Authy, 1Password, Bitwarden), enter the current code to confirm sync, then save the 10 one-shot recovery codes shown — they are the backstop for a lost device and the backend only stores hashes.

Recovery codes are shown once

If you didn't save them and have lost your device, an org admin can reset OTP from Settings → Users → menu → Reset OTP (audit-logged). If you can still sign in, regenerate them yourself from Profile → Regenerate recovery codes, which invalidates all previous codes.

Login failure returns generic "invalid credentials" — no distinction between wrong password and wrong code. Rate limit: 10 failed codes per 5 minutes triggers a 30-minute account lockout.
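
The lockout rule can be modelled as a sliding window. The sketch below is an assumption about the mechanics (for instance, that the tenth failure inside the window is the one that triggers the lock); only the numbers 10, 5 minutes, and 30 minutes come from the text above.

```python
from collections import deque


class OtpLockout:
    """Sliding-window lockout: too many failures in a window locks the account."""

    def __init__(self, max_failures=10, window_s=300, lockout_s=1800):
        self.max_failures = max_failures
        self.window_s = window_s
        self.lockout_s = lockout_s
        self.failures = deque()   # timestamps of recent failed codes
        self.locked_until = 0.0

    def is_locked(self, now: float) -> bool:
        return now < self.locked_until

    def record_failure(self, now: float) -> None:
        # Forget failures that have aged out of the window.
        while self.failures and now - self.failures[0] >= self.window_s:
            self.failures.popleft()
        self.failures.append(now)
        if len(self.failures) >= self.max_failures:
            # Assumption: reaching the limit starts the lockout immediately.
            self.locked_until = now + self.lockout_s
            self.failures.clear()
```

Failures spread further apart than the window never accumulate, so an occasional mistyped code does not lock anyone out.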

06 API keys

Long-lived bearer tokens for CI/CD, service accounts, Terraform providers, etc. They carry an explicit role (never owner), can be project-scoped, and are stored as SHA-256(raw_key || pepper) — a DB leak alone is insufficient to impersonate a key.

Format: cap_ + 52 base32 chars. Display prefix (first 14 chars) is shown in the UI; the rest is never displayed again.
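
A sketch of how a key in this format could be generated and hashed. The format (`cap_` + 52 base32 characters), the 14-character display prefix, and SHA-256(raw_key || pepper) come from the text above; the choice of 32 random bytes, the uppercase RFC 4648 alphabet, and treating the pepper as raw bytes are assumptions made for the example.

```python
import base64
import hashlib
import secrets


def generate_api_key() -> str:
    # Assumption: 32 random bytes, which encode to exactly 52 base32
    # characters once the trailing '=' padding is stripped.
    body = base64.b32encode(secrets.token_bytes(32)).decode("ascii").rstrip("=")
    return "cap_" + body


def hash_api_key(raw_key: str, pepper: bytes) -> str:
    # SHA-256(raw_key || pepper): only this digest would be stored.
    return hashlib.sha256(raw_key.encode("ascii") + pepper).hexdigest()


def display_prefix(raw_key: str) -> str:
    # First 14 characters: "cap_" plus 10 characters of the body.
    return raw_key[:14]
```

Because the pepper lives in configuration rather than in the database, a dumped table of digests cannot be checked against guessed keys without it.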

Create: Settings → API Keys → Create API Key. Name + role (viewer / operator / admin) + optional project scope + optional expiry. The raw key is shown once — copy it, it is not retrievable.

curl -H "Authorization: Bearer cap_XXXXXXXXXXXX..." \
  https://cap.example.com/api/v1/projects/$PID/certificates

Project-scoped key can only call /projects/:projectId/... for its bound project, plus a small read-only org allowlist (GET /projects, GET /auth/me, GET /license/status). Everything else → 403.
Org-wide key can call any endpoint matching its role on any project.
API key requests bypass CSRF — there are no browser cookies involved. JWT cookie sessions still require it.
Audit metadata is enriched on key-authenticated requests: auth_method=api_key, api_key_id, api_key_prefix; the actor name shows as apikey:cap_XXXXXXXXXX.

Pepper rotation: set CERTAUTOPILOT_API_KEY_PEPPER_PREVIOUS to the current pepper, set CERTAUTOPILOT_API_KEY_PEPPER to a fresh value (for example from openssl rand -hex 32), then restart the fleet. Verification tries pepper first, then previous; new keys hash with pepper only. Once every key has rotated (or expired), drop previous on a subsequent restart.
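
The "pepper first, then previous" order can be sketched as follows. The function names are illustrative, and a real backend would more likely look the key up by digest than compare against a single stored hash; the point here is only the fallback order during a rotation.

```python
import hashlib
import hmac
from typing import Optional


def hash_api_key(raw_key: str, pepper: bytes) -> str:
    return hashlib.sha256(raw_key.encode("ascii") + pepper).hexdigest()


def verify_api_key(raw_key: str, stored_hash: str,
                   pepper: bytes, previous: Optional[bytes] = None) -> bool:
    """Try the current pepper first, then the previous one if configured."""
    for candidate in (pepper, previous):
        if candidate is None:
            continue
        digest = hash_api_key(raw_key, candidate)
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(digest, stored_hash):
            return True
    return False
```

Once `previous` is dropped from the configuration, any key still hashed with the old pepper stops verifying, which is why that step waits until those keys have rotated or expired.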

Rate limit on failed API-key auth: 10 per IP per minute → 429. Successful authentications don't count.

07 Sessions, JWT & CSRF

Browser logins use JWT in httpOnly cookies. There are three tokens in play:

Access token — JWT signed HS256. httpOnly, SameSite=Strict. Default TTL 15 min (jwt.access_token_ttl).
Refresh token — opaque 32-byte random, SHA-256-hashed in the DB. httpOnly, scoped to /api/v1/auth/refresh. Default TTL 7 days.
CSRF token — random 32 bytes in a JS-readable cookie, required in X-CSRF-Token on every mutation (double-submit). GETs are exempt.

Refresh flow: the frontend's HTTP client intercepts a 401, calls POST /auth/refresh, gets a new pair, retries the original request. Refresh tokens rotate — every successful refresh mints a new refresh and marks the old one used.

Reuse detection is the firewall

If an attacker steals your refresh token and races you, one of you refreshes first, which marks the token used. When the other tries, the server sees "already used" and revokes every refresh token in the user's family. From that point neither party can refresh: the attacker's stolen token is dead, and you sign in again to start a fresh family.
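
Rotation with reuse detection can be illustrated with an in-memory store. This is a sketch under stated assumptions: the class name, the record layout, and the use of a `family` string to group tokens descended from one login are invented for the example, and a real backend would persist these records in its database.

```python
import hashlib
import secrets


class RefreshStore:
    """In-memory model of rotating refresh tokens with reuse detection."""

    def __init__(self):
        # token hash -> {"family": str, "used": bool, "revoked": bool}
        self.tokens = {}

    @staticmethod
    def _hash(token: str) -> str:
        return hashlib.sha256(token.encode("ascii")).hexdigest()

    def issue(self, family: str) -> str:
        token = secrets.token_hex(32)  # 32 random bytes, hex-encoded
        self.tokens[self._hash(token)] = {
            "family": family, "used": False, "revoked": False,
        }
        return token

    def refresh(self, token: str):
        """Return a new refresh token, or None if the request is rejected."""
        record = self.tokens.get(self._hash(token))
        if record is None or record["revoked"]:
            return None
        if record["used"]:
            # Reuse detected: revoke every token in the same family.
            for other in self.tokens.values():
                if other["family"] == record["family"]:
                    other["revoked"] = True
            return None
        record["used"] = True
        return self.issue(record["family"])
```

Replaying an already-used token returns `None` and also kills the newest token in the family, so whoever holds it is forced back to the login page.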

CSRF compares header vs cookie on every mutation; mismatch → 403. Scripts running on other origins can't read the cookie, so only same-origin JS can produce a matching pair.
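
The double-submit comparison reduces to a few lines. The function below is a hedged sketch: its name and parameters are made up for illustration, and it exempts only GET and API-key requests because those are the only exemptions this document mentions.

```python
import hmac
from typing import Optional


def csrf_ok(method: str, cookie_token: Optional[str],
            header_token: Optional[str], uses_api_key: bool = False) -> bool:
    """Double-submit check: the X-CSRF-Token header must equal the CSRF cookie."""
    if method.upper() == "GET" or uses_api_key:
        return True  # GETs are exempt; API-key requests carry no cookies
    if not cookie_token or not header_token:
        return False  # a missing value on either side fails the check
    # Constant-time comparison of the two submitted values.
    return hmac.compare_digest(cookie_token.encode(), header_token.encode())
```

A caller would translate a `False` result into the 403 described above.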

Admin disable invalidates every refresh token for the user immediately; the user's browser fails on its next request after the access token expires (≤ 15 min). For instant eviction, revoke at the load balancer. There is currently no per-session list / per-device revoke surface in the UI — disable-the-user is the wholesale action.

Tuning:

jwt:
  access_token_ttl: 15m
  refresh_token_ttl: 168h   # 7 days

Don't push access TTL above an hour — it defeats the quick-revocation pattern. Long refresh TTL is fine because rotation + reuse-detection bound the damage; pin shorter (e.g. 24 h) if compliance requires it.

08 Audit trail

Every auth-relevant action is captured in the audit log: login (success / failure), logout, role changes, password resets (initiated and completed), OTP enrollments and resets, refresh-reuse detection, API-key create / revoke / use, LDAP auto-provisions. Failed-login bursts trigger a notification rule if you've configured one (Notifications).

09 Troubleshooting

"Forbidden" on an action you expected to work

Check your role for that project (not just org). The page header shows Project role: <x>. The action may require admin while you are only operator on that project.

"Account locked"

Too many failed logins. The backend applies a default lockout (10 failures in 15 minutes → 30-minute cool-down) — values are server-side defaults and not currently exposed in the UI. Wait out the window or restart the user's session via Settings → Users → menu.

LDAP service bind fails with "invalid credentials"

Service DN or password is wrong, or the account is disabled in the directory. The Test Connection button in Settings → LDAP isolates the issue.

"Code invalid" right after enrollment

Clock skew. The authenticator device must be on NTP; so must the backend host. Most phones are fine; desktop authenticators sometimes drift.

API key 401 with a key you just created

Triple-check the header: Authorization: Bearer cap_... (note the space). Copy-paste hygiene on tokens — leading / trailing whitespace, a stray newline — is the most common culprit.
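
A small guard in client scripts catches most of these copy-paste problems before the request is sent. The helper below is hypothetical; it assumes the RFC 4648 base32 alphabet (A-Z, 2-7) and accepts either letter case, since the exact alphabet is not specified here.

```python
import re

# "cap_" followed by 52 base32 characters (alphabet and case are assumptions).
KEY_PATTERN = re.compile(r"cap_[A-Za-z2-7]{52}")


def authorization_header(raw_key: str) -> dict:
    """Build the Authorization header, rejecting keys that look malformed."""
    key = raw_key.strip()  # drop leading/trailing whitespace and newlines
    if not KEY_PATTERN.fullmatch(key):
        raise ValueError("key does not look like cap_ + 52 base32 characters")
    return {"Authorization": "Bearer " + key}
```

Reading the key from a file or environment variable and passing it through this function removes a trailing newline silently and fails loudly on a truncated paste.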

403 "CSRF validation failed" from the UI

Missing or stale X-CSRF-Token. The frontend HTTP client sets it automatically; if you're calling the API directly from a script, prefer an API key (no CSRF) over JWT cookies.

"I was suddenly logged out everywhere"

Refresh-reuse detection fired. Either a real compromise (check your other devices and the audit log) or a browser replaying an old refresh after sleep. Log in again; the event is audit-logged.