What it defends against — and, explicitly, what it does not.
A control plane is only as credible as its stated adversary. Legation governs an operator's agent/MCP workload running inside a customer's VPC. Three parties are mutually distrustful: the operator (owns the IP), the customer (owns the data and the soil), and the routing hub (carries the traffic). The design assumes any of them can be curious or compromised.
In scope — defended by construction
• Data exfiltration by the agent, the model, or the operator — the inference never sees raw data; only structure-only projections cross the membrane.
• Prompt injection / a hijacked-but-in-bounds agent — held to a plan signed before untrusted input was processed.
• A passive or active hub — routes ciphertext it holds no key to read.
• Rug-pulled tools, replayed commands, leaked credentials — integrity hashing, per-envelope nonces, per-call proof-of-possession.
• Tampered history — an append-only hash chain an independent witness co-signs.
Out of scope — stated plainly
• Correctness of an in-bounds action. Containment bounds the blast radius; it does not make a permitted action right. (Mitigated, not eliminated — see §10.)
• A malicious host below a TEE. Operator IP is hardware-protected only at the Sovereign tier; below it, by obfuscation + contract.
• Traffic analysis. The hub is zero-knowledge on content, not metadata. ZK ≠ anonymity.
• Bugs inside the app's own authorized processing. Legation contains and governs a workload; it does not audit its internals.
A guarantee that doesn't name its limits is a slogan. Every limit above is also in GAPS-AND-LIMITS.md in the source tree, because an assessor will find them — better that they read them from us first.
The brain moves to the data; only a projection comes back.
Conventional designs move data to the model. Legation inverts it: the brain (LLM + the operator's skill IP) stays control-side; the hands (the in-VPC enclave) execute next to the data. Raw bytes are read only inside the execute step and never leave it. What crosses the membrane is a redacted, structure-only projection plus a metadata-only audit record — never content.
ToolCall ─▶
① admit: verify the sealed bag offline (SHA-384 Merkle + Ed25519)
② recall: check dual-asymmetric kill switch + sensor self-sever → fail closed
③ permission gate: treaty ceiling (Seneschal 30-check / local allowlist)
④ intent gate: action ∈ the signed per-task mandate? (commit-then-act)
⑤ active defense: honeytoken + behavioral-rate sensors → tripwire?
╌╌╌╌╌╌╌╌╌╌╌╌╌╌ the membrane ╌╌╌╌╌╌╌╌╌╌╌╌╌╌
⑥ execute: IN-VPC; raw data read here, and ONLY here
⑦ project: structure-only + covert-channel budget
⑧ audit: metadata only (verdict, timestamp, content HASH; never content)
» A Block at ②–⑤ never reaches ⑥. The injected action never touches a byte.
Steps ②–⑤ are pure, deterministic, air-gap-safe — no network, no model, no clock held internally (timestamps are supplied at the boundary so the path is reproducible and testable). The enforcement decision is the same in a SCIF as on a developer's laptop.
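A minimal sketch of how a pure, fail-closed gate chain like steps ②–⑤ might be composed. Everything named here (`ToolCall`, `Verdict`, `decide`, `run`, the example gates) is hypothetical and chosen for illustration, not Legation's API. It shows the two properties the text claims: the timestamp arrives as an argument rather than from an internal clock, and execute is unreachable once any gate refuses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ToolCall:
    tool: str
    target: str


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    blocked_at: Optional[str]  # name of the gate that refused, or None
    timestamp: int             # supplied by the caller, never read from a clock


# A gate is a pure predicate: same input, same answer, no I/O.
Gate = Callable[[ToolCall], bool]


def decide(call: ToolCall, gates: List[Tuple[str, Gate]], now: int) -> Verdict:
    """Run the gates in order and stop at the first refusal."""
    for name, gate in gates:
        try:
            passed = gate(call)
        except Exception:
            passed = False  # fail closed: a gate that errors counts as a refusal
        if not passed:
            return Verdict(False, name, now)
    return Verdict(True, None, now)


def run(call, gates, now, execute):
    """Call execute only when every gate passed."""
    verdict = decide(call, gates, now)
    result = execute(call) if verdict.allowed else None
    return verdict, result


# Hypothetical stand-ins for the four gates (assumptions, not the real checks).
TREATY_TOOLS = {"extract", "summarize", "delete_record"}
MANDATE_TOOLS = {"extract", "summarize"}
EXAMPLE_GATES = [
    ("recall", lambda c: True),
    ("permission", lambda c: c.tool in TREATY_TOOLS),
    ("intent", lambda c: c.tool in MANDATE_TOOLS),
    ("active_defense", lambda c: "HONEYTOKEN" not in c.target),
]
```

Because `decide` takes `now` as a parameter, replaying the same call with the same timestamp yields an identical `Verdict`, which is what makes the path reproducible in a test.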
Containment, then intent, then an immune response.
The enclave does not merely prevent out-of-bounds actions. It holds an agent to the plan it committed to, and it reacts to compromise that stays in bounds. Three independent layers, composed.
Containment
The treaty — a both-sides-signed contract — is the standing ceiling: the tools an agent may ever call, its autonomy ceiling, its data classification. The membrane + the gate enforce it. The agent cannot exfiltrate and cannot exceed the treaty.
Intent — commit-then-act
The standing treaty answers "may it ever?" — not "should it, for this task, now?". So the brain declares its plan as a signed mandate before processing any untrusted input. The in-VPC gate freezes the agent to that plan. An action a mid-task injection introduces is outside the committed mandate and is refused — deterministically, before it touches data.
Active defense — the immune response
Detection, not just prevention. Honeytokens (bait carried in the signed treaty) and a behavioral rate baseline watch every action. A tripwire self-severs the embassy — fail-closed, before execute — and the operator cannot override it.
§3.1 — Intent conformance, precisely: a warrant narrower than the contract.
Two authorization tiers, one hard invariant: mandate ⊆ treaty, always. A mandate can only ever narrow the treaty, never widen it; admission rejects any clause outside the treaty scope. The permission gate stays the ceiling; the conformance check runs behind it.
| Property | Treaty (standing) | Mandate (per task) |
|---|---|---|
| Authored by | both sides, compiled | the brain's declared plan |
| Lifetime | the engagement | one task / session |
| Signed | before delivery | before any untrusted input is read |
| Expresses | tool allowlist, ceiling | per-clause tool + target + param + sequence + use bounds |
| Role | the ceiling | the warrant inside it |
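The invariant mandate ⊆ treaty can be sketched as an admission check. This is an illustration under simplifying assumptions: a treaty is modeled as a map from tool to the widest target prefix it allows, and a mandate clause as a `(tool, target_prefix)` pair. The real clause shape (params, sequence, use bounds) is richer; the names `outside_treaty` and `admit_mandate` are hypothetical.

```python
from typing import Dict, List, Tuple

Treaty = Dict[str, str]   # tool -> widest target prefix the treaty allows ("" = any)
Clause = Tuple[str, str]  # (tool, target prefix) declared for this task


def outside_treaty(treaty: Treaty, mandate: List[Clause]) -> List[Clause]:
    """Return every clause that names an unknown tool or a wider target scope."""
    return [
        (tool, prefix)
        for tool, prefix in mandate
        if tool not in treaty or not prefix.startswith(treaty[tool])
    ]


def admit_mandate(treaty: Treaty, mandate: List[Clause]) -> List[Clause]:
    """Admit the mandate only if it narrows the treaty; otherwise reject it whole."""
    bad = outside_treaty(treaty, mandate)
    if bad:
        raise ValueError(f"mandate exceeds treaty: {bad}")
    return list(mandate)
```

Rejecting the whole mandate on a single out-of-scope clause, rather than dropping the clause, is a design choice of this sketch: a partially admitted plan would no longer be the plan that was signed.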
The mandate is compiled control-side and bound to the enclave by an Ed25519 signature over a domain-separated, canonical SHA-384 claims hash. That canonical hash is reproduced byte-for-byte by an independent implementation in the design-time compiler — and the cross-language vector is pinned in both test suites. If the two ever drift, the build breaks.
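A sketch of what a domain-separated, canonical SHA-384 claims hash can look like. The canonical encoding is an assumption: sorted-key, whitespace-free JSON stands in for whatever encoding the compiler actually pins, and the `DOMAIN_CLAIMS` prefix is a hypothetical value. The Ed25519 signature over the digest is omitted because the Python standard library has no Ed25519.

```python
import hashlib
import json

# Hypothetical domain-separation prefix; the real value is not published here.
DOMAIN_CLAIMS = b"example/claims/v1\x00"


def canonical_bytes(claims: dict) -> bytes:
    """Serialize claims so that equal claims always yield identical bytes."""
    return json.dumps(
        claims, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def claims_hash(claims: dict, domain: bytes = DOMAIN_CLAIMS) -> bytes:
    """SHA-384 over the domain prefix followed by the canonical claims bytes."""
    return hashlib.sha384(domain + canonical_bytes(claims)).digest()
```

A cross-language pin then amounts to committing one claims object and its expected digest to both test suites; any drift in key ordering, separators, or prefix changes the bytes and fails the comparison.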
committed plan: extract → summarize (signed before untrusted input)
» extract patient/… ✓ conforms → executes
» summarize ✓ conforms → executes
» delete_record patient/42 ⛔ TREATY-LEGAL, but OUTSIDE the committed plan
refused BEFORE execute; the byte is never touched
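The trace above can be approximated by a small stateful conformance check. This is a sketch under assumptions: a plan is an ordered list of `(tool, target_prefix, max_uses)` steps, the agent may repeat the current step within its use bound or advance to the next one, and a refused call leaves the state untouched. `MandateGate` is a hypothetical name.

```python
from typing import List, Tuple

Step = Tuple[str, str, int]  # (tool, target prefix, max uses)


class MandateGate:
    """Holds an agent to an ordered, use-bounded plan fixed at construction."""

    def __init__(self, plan: List[Step]):
        self._plan = list(plan)
        self._step = 0   # index of the current plan step
        self._uses = 0   # how many times the current step has run

    def conforms(self, tool: str, target: str) -> bool:
        # Candidates: stay on the current step, or advance once it has run.
        candidates = [(self._step, self._uses)]
        if self._uses > 0:
            candidates.append((self._step + 1, 0))
        for idx, used in candidates:
            if idx >= len(self._plan):
                continue
            p_tool, prefix, max_uses = self._plan[idx]
            if tool == p_tool and target.startswith(prefix) and used < max_uses:
                self._step, self._uses = idx, used + 1
                return True
        return False  # outside the committed plan: refuse before execute
```

The check never consults the treaty. It runs behind the permission gate, so a treaty-legal `delete_record` is still refused when the committed plan does not contain it.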
§3.2 — Active defense, precisely: bait in the contract, baseline on the agent.
Honeytokens are planted in the data and listed in the signed treaty — so a deployed embassy arms its sensors from the contract (changing the bait breaks the seal), and the agent never reads the list. A single touch is a near-zero-false-positive compromise signal. The rate baseline tracks action volume against an online mean/variance (Welford), with a variance floor so a perfectly flat baseline still detects a burst, and a hard ceiling for cold start. The baseline can be fitted from the in-VPC behavioral sensor's established profile rather than guessed.
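A minimal sketch of the rate baseline described above: Welford's online mean/variance, a variance floor, and a hard ceiling that applies from the first sample. The thresholds (`variance_floor`, `z_limit`, `hard_ceiling`, `min_samples`) are hypothetical defaults chosen for the example, and `RateBaseline` is an illustrative name.

```python
import math


class RateBaseline:
    """Online action-rate baseline using Welford's algorithm."""

    def __init__(self, variance_floor=1.0, z_limit=4.0,
                 hard_ceiling=100.0, min_samples=5):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.variance_floor = variance_floor
        self.z_limit = z_limit
        self.hard_ceiling = hard_ceiling
        self.min_samples = min_samples

    def update(self, x: float) -> None:
        """Fold one observation into the running mean and variance."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        """Sample variance, never below the floor."""
        raw = self.m2 / (self.n - 1) if self.n > 1 else 0.0
        return max(raw, self.variance_floor)

    def is_burst(self, x: float) -> bool:
        if x > self.hard_ceiling:
            return True   # the ceiling applies even with no history
        if self.n < self.min_samples:
            return False  # cold start: only the ceiling is enforced
        z = (x - self.mean) / math.sqrt(self.variance())
        return z > self.z_limit
```

Without the floor, a perfectly flat history has zero variance and the z-score divides by zero; with it, a flat baseline of 5 actions per window still flags a jump to 20.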
» extract intake.py ✓ conforms → executes
» run_assurance ✓ conforms → executes
» (injected) extract patient/HONEYTOKEN-canary-7f3a9c 🚨 ACTIVE-DEFENSE TRIPWIRE: embassy SELF-SEVERED before execute
» (recovery attempt) extract intake.py 🛑 refused: embassy is self-severed (fail-closed)
🔒 the operator cannot override the self-sever
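The self-sever behaviour in that trace is a one-way latch. A sketch, with the assumption that honeytokens are matched as substrings of the target and with `Embassy` and its return strings as hypothetical names:

```python
class Embassy:
    """One-way tripwire: once severed, every later call is refused."""

    def __init__(self, honeytokens):
        self._bait = frozenset(honeytokens)  # armed from the signed treaty
        self._severed = False

    def admit(self, tool: str, target: str) -> str:
        if self._severed:
            return "refused: self-severed"
        if any(token in target for token in self._bait):
            self._severed = True  # latch before anything executes
            return "tripwire: self-severed"
        return "executes"

    # Deliberately no reset or override method: the latch only moves one way.
```

Replaying the trace with `Embassy(["HONEYTOKEN-canary-7f3a9c"])` gives two executions, one tripwire, and a refusal for the otherwise legitimate recovery call.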
CNSA 2.0, domain-separated, and backend-swappable.
Every hash is SHA-384; every classical signature is Ed25519; the symmetric MAC is HMAC-SHA384. Hashing is domain-separated by role (distinct prefixes for leaves, nodes, roots, claims, challenges) so no value computed for one purpose can be reinterpreted as another.
| Construction | Primitive | What it guarantees |
|---|---|---|
| Sealed-bag accreditation | SHA-384 Merkle root, Ed25519 over a domain-separated root | offline integrity + authenticity of exactly what was delivered, verified before the policy loads — the verify recomputes the root from the accreditation, so a tampered carried digest is rejected (regression-tested) |
| Mandate / identity claims | Ed25519 over canonical SHA-384 claims | tamper-evident, replay-bound, byte-reproducible across implementations |
| Sealing ingress (the conduit) | ephemeral X25519 ECDH → HKDF-SHA384 → ChaCha20-Poly1305 | anonymous sealed box — only the destination key opens it; the hub is blind by cryptography, not policy |
| Transparency chain | SHA-384 hash chain + Ed25519 checkpoints + witness co-signatures | append-only history no operator can rewrite without an independent witness noticing |
| Sovereign tier | hybrid: X25519+ML-KEM-1024 (FIPS 203), Ed25519+ML-DSA-87 (FIPS 204) | secure if either primitive holds — kills harvest-now-decrypt-later |
The provider seam (FIPS path)
All governance hashing and signing routes through one abstraction with two interchangeable backends: a pure-Rust default and a FIPS 140-3-capable validated module. Both produce byte-identical output (standardized algorithms), so the backend swaps without breaking a single signature or hash contract — verified by the same known-answer and golden vectors passing under each. The build attests its own crypto posture; the compliance surface reads that attestation rather than asserting it.
Replay & freshness
Every sealed envelope carries a per-envelope nonce admitted through a memory-bounded freshness window; heartbeat sequences must strictly increase; per-call proof-of-possession challenges are action-bound and single-use. The freshness window is a tier-dialed, sealed knob — tighter as the dial turns toward Sovereign — visible in the accreditation record so an auditor reads the exact enforced value.
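A sketch of a memory-bounded freshness window and a strictly increasing heartbeat check. The class names, the `max_entries` bound, and the choice to refuse new envelopes when the window is full are assumptions of this example; timestamps are passed in, in keeping with the boundary-supplied clock.

```python
class FreshnessWindow:
    """Admits each nonce at most once while its envelope is still fresh."""

    def __init__(self, window_s: int, max_entries: int):
        self.window_s = window_s
        self.max_entries = max_entries
        self._seen = {}  # nonce -> sent_at

    def admit(self, nonce: bytes, sent_at: int, now: int) -> bool:
        if abs(now - sent_at) > self.window_s:
            return False  # stale, or stamped too far in the future
        # Forget nonces whose envelopes would now be rejected as stale anyway.
        self._seen = {n: t for n, t in self._seen.items()
                      if now - t <= self.window_s}
        if nonce in self._seen:
            return False  # replay
        if len(self._seen) >= self.max_entries:
            return False  # full: fail closed rather than forget a live nonce
        self._seen[nonce] = sent_at
        return True


class Heartbeat:
    """Accepts only strictly increasing sequence numbers."""

    def __init__(self):
        self.last = -1

    def admit(self, seq: int) -> bool:
        if seq <= self.last:
            return False
        self.last = seq
        return True
```

Pruning is safe because a nonce is dropped only after its envelope has aged past the window, at which point the staleness check rejects any replay of it.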
A subtlety we handle and most don't: the Merkle construction's odd-node promotion is the classic duplicate-leaf ambiguity class (CVE-2012-2459). It is not exploitable here — the leaf set is a fixed, role-domain-separated shape re-derived from the same inputs at verify — and the reasoning is documented at the call site, not assumed.
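To make the ambiguity concrete, the sketch below contrasts two Merkle constructions. CVE-2012-2459 itself concerns trees that duplicate the last node on an odd level, where two different leaf lists can share a root. The second function promotes the odd node instead and adds role prefixes; binding the leaf count into the root is this example's stand-in for "a fixed shape re-derived at verify", and the prefix bytes are hypothetical.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha384(data).digest()


def root_duplicate_last(leaves: List[bytes]) -> bytes:
    """Naive tree: no role prefixes, odd levels duplicate their last node."""
    level = [_h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical role prefixes for domain separation.
LEAF, NODE, ROOT = b"\x00", b"\x01", b"\x02"


def root_promote(leaves: List[bytes]) -> bytes:
    """Role-prefixed tree: the odd node is promoted unchanged to the next level."""
    if not leaves:
        raise ValueError("empty leaf set")
    level = [_h(LEAF + x) for x in leaves]
    while len(level) > 1:
        nxt = [_h(NODE + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    # Bind the leaf count so the tree shape is part of what is verified.
    return _h(ROOT + len(leaves).to_bytes(8, "big") + level[0])
```

With the naive tree, `[a, b, c]` and `[a, b, c, c]` hash to the same root; with the prefixed, count-bound tree they do not.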
A leaked credential is not an identity.
Each agent carries a SPIFFE-aligned, short-lived, scoped, attested non-human identity. Verifying the credential proves the issuer signed it — it does not prove the holder owns the key. So every privileged action requires a fresh, action-bound, replay-guarded proof of possession: the holder signs a per-call challenge with its private key, checked before the gate, the mandate, the defense, or execute. A captured credential blob alone is rejected. Identities rotate generate-swap-revoke; revocation is checked per call.
» genuine agent: in-scope call + valid fresh proof → executes
» thief w/ the credential but the WRONG key → IdentityRejected (before execute)
» replayed proof (same nonce) → IdentityRejected
» proof minted for a different action → IdentityRejected
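The challenge flow behind those four outcomes can be sketched as follows. One loud caveat: the Python standard library has no Ed25519, so this example substitutes HMAC-SHA384 for the signature. That keeps the action binding and single-use logic visible but loses the asymmetric property, since the verifier here holds the same key as the holder. `prove`, `Verifier`, and the `DOMAIN_POP` prefix are hypothetical.

```python
import hashlib
import hmac
import secrets

DOMAIN_POP = b"example/pop/v1\x00"  # hypothetical domain-separation prefix


def prove(key: bytes, action: str, challenge: bytes) -> bytes:
    """Holder side: bind the proof to one action and one challenge.

    HMAC is a stand-in here; the source describes a private-key signature.
    """
    msg = DOMAIN_POP + action.encode("utf-8") + b"\x00" + challenge
    return hmac.new(key, msg, hashlib.sha384).digest()


class Verifier:
    def __init__(self, holder_key: bytes):
        self._key = holder_key
        self._open = {}  # outstanding challenge -> action it was issued for

    def challenge(self, action: str) -> bytes:
        c = secrets.token_bytes(32)
        self._open[c] = action
        return c

    def verify(self, action: str, challenge: bytes, proof: bytes) -> bool:
        # pop() consumes the challenge on any attempt, so it is single-use.
        bound = self._open.pop(challenge, None)
        if bound is None or bound != action:
            return False  # unknown, replayed, or minted for another action
        expected = prove(self._key, action, challenge)
        return hmac.compare_digest(proof, expected)
```

A credential blob never enters `verify` at all in this sketch: possession of the key is what is tested, which is the distinction the section draws.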
Metadata-only, tamper-evident, and witnessed by a third party.
Every action — and every refusal — is recorded as a metadata-only decision record: sequence, verdict, a boundary timestamp, the reason it was refused, and a SHA-384 hash of any raw result, never the result. The trace includes the actions that were blocked, which is exactly the explainability regulators and insurers ask for.
Append-only chain
Records hash into a SHA-384 chain whose head an independent party can recompute. Alter one entry and the head no longer reproduces.
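A sketch of a metadata-only record and the chain it feeds. Field names, the JSON encoding, the all-zero genesis value, and the `DOMAIN_CHAIN` prefix are assumptions made for the example.

```python
import hashlib
import json
from typing import List, Optional

DOMAIN_CHAIN = b"example/chain/v1\x00"  # hypothetical prefix
GENESIS = b"\x00" * 48                  # hypothetical starting head


def decision_record(seq: int, verdict: str, timestamp: int,
                    reason: Optional[str], raw_result: Optional[bytes]) -> dict:
    """Keep only metadata: the raw result is reduced to its SHA-384 digest."""
    digest = hashlib.sha384(raw_result).hexdigest() if raw_result is not None else None
    return {"seq": seq, "verdict": verdict, "ts": timestamp,
            "reason": reason, "result_sha384": digest}


def chain_head(records: List[dict]) -> bytes:
    """Fold the records into one head; each link commits to all earlier ones."""
    head = GENESIS
    for rec in records:
        body = json.dumps(rec, sort_keys=True, separators=(",", ":")).encode("utf-8")
        head = hashlib.sha384(DOMAIN_CHAIN + head + body).digest()
    return head
```

Anyone holding the records can recompute the head independently; changing a single field in any entry yields a different head, so a published checkpoint pins the entire history before it.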
Independent witness
The operator publishes signed checkpoints; an independent witness co-signs the same head. A customer verifies the witness against the witness's own published key — confirming the operator didn't cheat without trusting the operator.
Covert channel, measured
The projection is a side channel. Rather than wave it away, the channel's capacity is quantified (analytic upper bound + demonstrated lower bound) and throttled to a configured, logged ceiling. The honest residual is stated: structure cargo's symbol names are the payload the customer asked to see.
"Verify, don't trust" is a property here, not a tagline. The read-only Attestation Portal hands an auditor a signed artifact for every claim and the command to recompute it offline — it asserts nothing it cannot hand you the bytes for.
One dial reconfigures the whole posture.
The same sealed artifact runs at three tiers; turning the dial changes where inference runs, how the control link degrades, key custody, the crypto suite, and egress — pinned stricter, never looser, and enforced in code (an operator may tighten a knob but never loosen it past the tier ceiling).
| | Bolt-on | Regulated | Sovereign |
|---|---|---|---|
| Brain location | control-plane (inverted) | customer endpoint (BYO) | in-VPC local model |
| Control link | live dial-out | live + store-and-forward | offline signed artifacts |
| Key custody | software | HSM / KMS | hardware-bound |
| Crypto suite | classical CNSA 2.0 | classical CNSA 2.0 (FIPS) | hybrid post-quantum |
| Egress (NetworkPolicy) | DNS + control plane :443 | DNS + control plane :443 | default-deny (air-gap) |
| Replay window | 300 s | 120 s | 60 s |
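The "tighten, never loosen" rule can be sketched with the replay-window row of the table. The ceilings are the values listed above; the function name and the choice to reject an over-wide request outright (rather than silently clamp it) are assumptions of this example.

```python
from typing import Optional

# Replay-window ceilings in seconds, taken from the tier table.
TIER_REPLAY_CEILING_S = {"bolt-on": 300, "regulated": 120, "sovereign": 60}


def effective_replay_window(tier: str, requested_s: Optional[int] = None) -> int:
    """Return the enforced window: the tier ceiling, or a tighter operator value."""
    ceiling = TIER_REPLAY_CEILING_S[tier]
    if requested_s is None:
        return ceiling
    if requested_s <= 0:
        raise ValueError("replay window must be positive")
    if requested_s > ceiling:
        raise ValueError(
            f"{requested_s}s loosens past the {tier} ceiling of {ceiling}s"
        )
    return requested_s
```

Rejecting instead of clamping means a misconfiguration surfaces as an error at load time, so the value an auditor reads is always the value the operator asked for.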
Zero inbound ports. A scratch image. A read-only root.
Dial-out, not dial-in
The embassy dials out to the hub over an authenticated WebSocket (wss); there are no inbound ports in the customer VPC. To register, the embassy cryptographically authenticates to the hub: an Ed25519/SHA-384 signature checked against a trusted-key allowlist, inside a freshness window, with a single-use replay nonce. The hub therefore admits only known embassies, and it stays blind to content in both directions. This removes the single hardest step in any on-prem install (a firewall exception) and leaves no inbound attack surface. Commands ride the dial-out channel; results and metadata-only telemetry return the same way. The NetworkPolicy denies all ingress.
L6 image, STIG-mapped
The image is built FROM scratch and holds a single static-musl binary (RELRO+NX; no shell, no shared libc, no package manager). It runs non-root with a read-only root filesystem, dropped capabilities, a seccomp profile, and no auto-mounted service-account token. Secrets are mounted at runtime, never baked into a layer. Each rule maps to a DISA STIG control, exported as a .ckl with an honest POA&M for the open items.
The whole assessor package, derived from the sealed delivery.
SSP + matrix
System Security Plan with the authorization boundary, a data-flow diagram, and a shared-responsibility matrix (operator / customer / cloud, per control family).
STIG .ckl + POA&M
DISA checklist from the real hardening, honest about its Opens (FIPS module, human MFA) rather than overclaiming.
OSCAL
Machine-readable component-definition and assessment-results (live per-control status) the buyer's GRC ingests directly.
AIBOM + SPRS
A signed AI Bill of Materials bound to the seal root, plus an honest in-scope SPRS contribution.
Where the guarantees stop. (Read this one twice.)
An engineer trusts a system that knows its own edges. These are stated in the source tree, not buried.
Containment ≠ correctness
Within its committed plan, an injected or buggy agent can still take a permitted-but-wrong action. The mandate shrinks the residual from the whole treaty to one frozen task; it does not zero it. Tight mandates + the active-defense sensors + the insurance layer cover the remainder.
Metadata, not anonymity
The hub is zero-knowledge on content, not on metadata — it sees who talks to whom, when, how much. Traffic analysis is possible.
IP-from-host only at the TEE
Data-flow inversion protects the customer's data from the operator at every tier. The operator's IP is protected from a malicious host by hardware only at Sovereign; below it, by compiled-binary obscurity + contract.
FIPS module pending a certificate
The crypto seam is built and proven byte-identical against a validated module — but module validation is a CMVP certificate from a lab, not a line of code. Reported as Open, never asserted as done.
What's deliberately not here.
This reference describes the architecture and the standardized cryptographic constructions to the edge of what's safe to publish. Beyond that edge — and excluded on purpose — are the internals of the post-quantum sovereign transport, the exact structured-extraction and projection heuristics that shape what a projection reveals, and the operator-side runbooks. Those are trade secrets, available under NDA to a serious evaluator. Their absence here is a design decision, not an omission.
Everything described above is real, tested, and deterministic. Everything below the line is real too — it just isn't on a public page. A pilot evaluator gets both.