Legation — Technical Reference: the enforcement architecture

§1 — Threat model

What it defends against — and, explicitly, what it does not.

A control plane is only as credible as its stated adversary. Legation governs an operator's agent/MCP workload running inside a customer's VPC. Three parties are mutually distrustful: the operator (owns the IP), the customer (owns the data and the soil), and the routing hub (carries the traffic). The design assumes any of them can be curious or compromised.

In scope — defended by construction

• Data exfiltration by the agent, the model, or the operator — the inference never sees raw data; only structure-only projections cross the membrane.
• Prompt injection / a hijacked-but-in-bounds agent — held to a plan signed before untrusted input was processed.
• A passive or active hub — routes ciphertext it holds no key to read.
• Rug-pulled tools, replayed commands, leaked credentials — integrity hashing, per-envelope nonces, per-call proof-of-possession.
• Tampered history — an append-only hash chain an independent witness co-signs.

Out of scope — stated plainly

• Correctness of an in-bounds action. Containment bounds the blast radius; it does not make a permitted action right. (Mitigated, not eliminated — see §10.)
• A malicious host below a TEE. Operator IP is hardware-protected only at the Sovereign tier; below it, by obfuscation + contract.
• Traffic analysis. The hub is zero-knowledge on content, not metadata. ZK ≠ anonymity.
• Bugs inside the app's own authorized processing. Legation contains and governs a workload; it does not audit its internals.

A guarantee that doesn't name its limits is a slogan. Every limit above is also in GAPS-AND-LIMITS.md in the source tree, because an assessor will find them — better that they read them from us first.

§2 — The membrane (data-flow inversion)

The brain moves to the data; only a projection comes back.

Conventional designs move data to the model. Legation inverts it: the brain (LLM + the operator's skill IP) stays control-side; the hands (the in-VPC enclave) execute next to the data. Raw bytes are read only inside the execute step and never leave it. What crosses the membrane is a redacted, structure-only projection plus a metadata-only audit record — never content.

per-action pipeline — every tool call traverses this, in order

ToolCall ─▶  ① admit          verify the sealed bag offline (SHA-384 Merkle + Ed25519)
            ② recall check    dual-asymmetric kill switch + sensor self-sever → fail closed
            ③ permission gate treaty ceiling (Seneschal 30-check / local allowlist)
            ④ intent gate     action ∈ the signed per-task mandate?  (commit-then-act)
            ⑤ active defense  honeytoken + behavioral-rate sensors → tripwire?
            ╌╌╌╌╌╌╌╌╌╌╌╌╌╌  the membrane  ╌╌╌╌╌╌╌╌╌╌╌╌╌╌
            ⑥ execute IN-VPC  raw data read here, and ONLY here
            ⑦ project         structure-only + covert-channel budget
            ⑧ audit           metadata only — verdict, timestamp, content HASH (never content)
» a Block at ②–⑤ never reaches ⑥. The injected action never touches a byte.

Steps ②–⑤ are pure, deterministic, air-gap-safe — no network, no model, no clock held internally (timestamps are supplied at the boundary so the path is reproducible and testable). The enforcement decision is the same in a SCIF as on a developer's laptop.
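The ordering of steps ② to ⑤ can be sketched as a chain of pure checks that short-circuits before execute. This is a minimal illustration under stated assumptions, not Legation's implementation: `ToolCall`, `GateState`, `Verdict`, `decide`, `run`, and every field name are hypothetical. The timestamp is passed in by the caller to mirror the "no clock held internally" property.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ToolCall:
    tool: str
    target: str


@dataclass(frozen=True)
class GateState:
    severed: bool              # step 2: recall / self-sever latch
    treaty_tools: frozenset    # step 3: standing ceiling
    mandate_tools: frozenset   # step 4: signed per-task plan
    honeytokens: frozenset     # step 5: bait targets


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    step: str
    reason: str
    timestamp: int             # supplied at the boundary, never read from a clock


def decide(call: ToolCall, state: GateState, now: int) -> Verdict:
    """Pure decision: same inputs give the same verdict (no I/O, no clock)."""
    checks = (
        ("recall", not state.severed, "embassy is severed"),
        ("permission", call.tool in state.treaty_tools, "tool outside treaty"),
        ("intent", call.tool in state.mandate_tools, "action outside mandate"),
        ("active-defense", call.target not in state.honeytokens, "honeytoken touched"),
    )
    for step, passed, reason in checks:
        if not passed:
            return Verdict(False, step, reason, now)   # fail closed at the first block
    return Verdict(True, "execute", "all gates passed", now)


def run(call: ToolCall, state: GateState, now: int,
        execute: Callable[[ToolCall], str]) -> Tuple[Verdict, Optional[str]]:
    """Execute is only reachable when every gate has passed."""
    verdict = decide(call, state, now)
    result = execute(call) if verdict.allowed else None
    return verdict, result
```

With a mandate of `{"extract", "summarize"}`, a treaty-legal `delete_record` call returns a verdict blocked at the `intent` step, and the `execute` callable is never invoked.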

§3 — Three enforcement layers

Containment, then intent, then an immune response.

The enclave does not merely prevent out-of-bounds actions. It holds an agent to the plan it committed to, and it reacts to compromise that stays in bounds. Three independent layers, composed.

Containment

The treaty — a both-sides-signed contract — is the standing ceiling: the tools an agent may ever call, its autonomy ceiling, its data classification. The membrane + the gate enforce it. The agent cannot exfiltrate and cannot exceed the treaty.

Intent — commit-then-act

The standing treaty answers "may it ever?" — not "should it, for this task, now?". So the brain declares its plan as a signed mandate before processing any untrusted input. The in-VPC gate freezes the agent to that plan. An action a mid-task injection introduces is outside the committed mandate and is refused — deterministically, before it touches data.

Active defense — the immune response

Detection, not just prevention. Honeytokens (bait carried in the signed treaty) and a behavioral rate baseline watch every action. A tripwire self-severs the embassy — fail-closed, before execute — and the operator cannot override it.

§3.1 — Intent conformance, precisely: a warrant narrower than the contract.

Two authorization tiers, one hard invariant: mandate ⊆ treaty, always. A mandate can only ever narrow the treaty, never widen it; admission rejects any clause outside the treaty scope. The permission gate stays the ceiling; the conformance check runs behind it.

Property	Treaty (standing)	Mandate (per task)
Authored by	both sides, compiled	the brain's declared plan
Lifetime	the engagement	one task / session
Signed	before delivery	before any untrusted input is read
Expresses	tool allowlist, ceiling	per-clause tool + target + param + sequence + use bounds
Role	the ceiling	the warrant inside it
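The invariant mandate ⊆ treaty can be illustrated as an admission check that rejects any clause reaching outside the treaty. The sketch below is an assumption about shape only: `Treaty`, `Clause`, `MandateRejected`, `admit_mandate`, and the numeric autonomy level are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Treaty:
    tools: frozenset          # tools the agent may ever call
    autonomy_ceiling: int     # hypothetical numeric autonomy level


@dataclass(frozen=True)
class Clause:
    tool: str
    target_prefix: str
    max_uses: int
    autonomy: int


class MandateRejected(Exception):
    """Raised when a mandate tries to widen the treaty."""


def admit_mandate(treaty: Treaty, clauses: Iterable[Clause]) -> Tuple[Clause, ...]:
    """Admit a mandate only if every clause stays inside the treaty."""
    admitted = []
    for index, clause in enumerate(clauses):
        if clause.tool not in treaty.tools:
            raise MandateRejected(f"clause {index}: tool {clause.tool!r} is outside the treaty")
        if clause.autonomy > treaty.autonomy_ceiling:
            raise MandateRejected(f"clause {index}: autonomy exceeds the treaty ceiling")
        if clause.max_uses < 1:
            raise MandateRejected(f"clause {index}: use bound must be positive")
        admitted.append(clause)
    return tuple(admitted)   # immutable: the plan is frozen once admitted
```

A mandate can narrow (fewer tools, lower autonomy, bounded uses) and is admitted; one clause that names a tool absent from the treaty fails the whole mandate.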

The mandate is compiled control-side and bound to the enclave by an Ed25519 signature over a domain-separated, canonical SHA-384 claims hash. That canonical hash is reproduced byte-for-byte by an independent implementation in the design-time compiler — and the cross-language vector is pinned in both test suites. If the two ever drift, the build breaks.
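A domain-separated canonical claims hash might look like the sketch below. The prefix string, the choice of sorted-key compact JSON as the canonical form, and the function names are all assumptions made for illustration; the actual canonical encoding is not specified in this reference. The Ed25519 signature over the digest is omitted because it needs a library outside the standard one.

```python
import hashlib
import json

# Hypothetical domain-separation prefix; the real value is not published here.
CLAIMS_DOMAIN = b"example/mandate-claims/v1\x00"


def canonical_bytes(claims: dict) -> bytes:
    """One byte string per logical value: sorted keys, no whitespace, ASCII only.

    Assumes claims contain only strings, integers, booleans, lists, and dicts;
    floats are avoided because their text form varies across implementations.
    """
    return json.dumps(
        claims, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    ).encode("ascii")


def claims_hash(claims: dict) -> bytes:
    """SHA-384 over prefix + canonical bytes; this digest is what would be signed."""
    return hashlib.sha384(CLAIMS_DOMAIN + canonical_bytes(claims)).digest()
```

Because the encoding is fully determined by the claim values, a second implementation in another language that follows the same rules should reproduce the digest byte for byte, which is what a pinned cross-language vector checks.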

commit-then-act — an injected, treaty-LEGAL action refused

committed plan: extract → summarize        (signed before untrusted input)
» extract patient/…        ✓ conforms → executes
» summarize                ✓ conforms → executes
» delete_record patient/42  ⛔ TREATY-LEGAL, but OUTSIDE the committed plan
                               refused BEFORE execute — the byte is never touched
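The trace above can be reproduced with a small conformance check. This is a sketch, assuming a mandate clause carries a tool, a target prefix, and a use bound; `Clause`, `FrozenMandate`, and `authorize` are hypothetical names, and sequence and parameter bounds are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clause:
    tool: str
    target_prefix: str
    max_uses: int
    used: int = 0


@dataclass
class FrozenMandate:
    clauses: List[Clause]

    def authorize(self, tool: str, target: str) -> bool:
        """Return True and consume one use if some clause covers the action."""
        for clause in self.clauses:
            if (clause.tool == tool
                    and target.startswith(clause.target_prefix)
                    and clause.used < clause.max_uses):
                clause.used += 1
                return True
        return False   # not in the committed plan: refused before execute


mandate = FrozenMandate([
    Clause("extract", "patient/", max_uses=2),
    Clause("summarize", "", max_uses=1),
])
```

Here `mandate.authorize("extract", "patient/17")` and `mandate.authorize("summarize", "")` succeed, while `mandate.authorize("delete_record", "patient/42")` returns `False` even if the treaty would allow the tool.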

§3.2 — Active defense, precisely: bait in the contract, baseline on the agent.

Honeytokens are planted in the data and listed in the signed treaty — so a deployed embassy arms its sensors from the contract (changing the bait breaks the seal), and the agent never reads the list. A single touch is a near-zero-false-positive compromise signal. The rate baseline tracks action volume against an online mean/variance (Welford), with a variance floor so a perfectly flat baseline still detects a burst, and a hard ceiling for cold start. The baseline can be fitted from the in-VPC behavioral sensor's established profile rather than guessed.
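The rate baseline described above can be sketched with Welford's online mean/variance update. The thresholds, the z-score test, and the class name `RateBaseline` are assumptions for illustration; only the use of Welford's method, a variance floor, and a cold-start hard ceiling come from the text.

```python
import math


class RateBaseline:
    """Online mean/variance (Welford) with a variance floor and a cold-start ceiling."""

    def __init__(self, variance_floor: float = 1.0, z_threshold: float = 4.0,
                 hard_ceiling: float = 100.0, min_samples: int = 5):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                      # running sum of squared deviations
        self.variance_floor = variance_floor
        self.z_threshold = z_threshold
        self.hard_ceiling = hard_ceiling
        self.min_samples = max(2, min_samples)

    def is_anomalous(self, rate: float) -> bool:
        if rate > self.hard_ceiling:       # applies even before a baseline exists
            return True
        if self.n < self.min_samples:      # cold start: only the ceiling is enforced
            return False
        variance = max(self.m2 / (self.n - 1), self.variance_floor)
        return (rate - self.mean) / math.sqrt(variance) > self.z_threshold

    def update(self, rate: float) -> None:
        """Welford's update: numerically stable, O(1) memory."""
        self.n += 1
        delta = rate - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (rate - self.mean)

    def observe(self, rate: float) -> bool:
        """Return True on a tripwire; anomalous samples are not learned."""
        if self.is_anomalous(rate):
            return True
        self.update(rate)
        return False
```

The floor matters for the flat case: ten observations of exactly 3.0 give a sample variance of zero, and without the floor the z-score would divide by zero instead of flagging a burst to 20.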

immune response — honeytoken touched → self-sever

» extract intake.py                 ✓ conforms → executes
» run_assurance                     ✓ conforms → executes
» (injected) extract patient/HONEYTOKEN-canary-7f3a9c
   🚨 ACTIVE-DEFENSE TRIPWIRE — embassy SELF-SEVERED before execute
» (recovery attempt) extract intake.py
   🛑 refused — embassy is self-severed (fail-closed)
   🔒 the operator cannot override the self-sever
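The self-sever behaviour in the trace above amounts to a one-way latch checked before execute. The following is a minimal sketch with hypothetical names (`Embassy`, `handle`); it deliberately exposes no reset method, which is how the example models "the operator cannot override".

```python
from typing import Callable, Iterable


class Embassy:
    """One-way latch: once a honeytoken is touched, every later call is refused."""

    def __init__(self, honeytokens: Iterable[str]):
        self._honeytokens = frozenset(honeytokens)   # armed from the signed treaty
        self._severed = False

    @property
    def severed(self) -> bool:
        return self._severed

    def handle(self, tool: str, target: str,
               execute: Callable[[str, str], str]) -> str:
        if self._severed:
            return "refused: embassy is self-severed"
        if target in self._honeytokens:
            self._severed = True                     # latch BEFORE execute runs
            return "tripwire: self-severed before execute"
        return execute(tool, target)
```

After a single call that targets a listed honeytoken, a later call to a perfectly ordinary target is also refused, matching the recovery attempt in the trace.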

§4 — Cryptographic constructions

CNSA 2.0, domain-separated, and backend-swappable.

Every hash is SHA-384; every classical signature is Ed25519; the symmetric MAC is HMAC-SHA384. Hashing is domain-separated by role (distinct prefixes for leaves, nodes, roots, claims, challenges) so no value computed for one purpose can be reinterpreted as another.

Construction	Primitive	What it guarantees
Sealed-bag accreditation	SHA-384 Merkle root, Ed25519 over a domain-separated root	offline integrity + authenticity of exactly what was delivered, verified before the policy loads — the verify recomputes the root from the accreditation, so a tampered carried digest is rejected (regression-tested)
Mandate / identity claims	Ed25519 over canonical SHA-384 claims	tamper-evident, replay-bound, byte-reproducible across implementations
Sealing ingress (the conduit)	ephemeral X25519 ECDH → HKDF-SHA384 → ChaCha20-Poly1305	anonymous sealed box — only the destination key opens it; the hub is blind by cryptography, not policy
Transparency chain	SHA-384 hash chain + Ed25519 checkpoints + witness co-signatures	append-only history no operator can rewrite without an independent witness noticing
Sovereign tier	hybrid: X25519+ML-KEM-1024 (FIPS 203), Ed25519+ML-DSA-87 (FIPS 204)	secure if either primitive holds — kills harvest-now-decrypt-later

The provider seam (FIPS path)

All governance hashing and signing routes through one abstraction with two interchangeable backends: a pure-Rust default and a FIPS 140-3-capable validated module. Both produce byte-identical output (standardized algorithms), so the backend swaps without breaking a single signature or hash contract — verified by the same known-answer and golden vectors passing under each. The build attests its own crypto posture; the compliance surface reads that attestation rather than asserting it.

Replay & freshness

Every sealed envelope carries a per-envelope nonce admitted through a memory-bounded freshness window; heartbeat sequences must strictly increase; per-call proof-of-possession challenges are action-bound and single-use. The freshness window is a tier-dialed, sealed knob — tighter as the dial turns toward Sovereign — visible in the accreditation record so an auditor reads the exact enforced value.
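A memory-bounded freshness window and a strictly increasing heartbeat check could be sketched as follows. The class names, the fail-closed behaviour when the cache is full, and the assumption that the envelope's send time is authenticated along with the nonce are all illustrative choices, not statements about the real implementation.

```python
from collections import OrderedDict


class FreshnessWindow:
    """Admit each nonce at most once within a time window, using bounded memory."""

    def __init__(self, window_seconds: int, max_entries: int):
        self.window = window_seconds
        self.max_entries = max_entries
        self._seen = OrderedDict()          # nonce -> sent_at, in admission order

    def admit(self, nonce: bytes, sent_at: int, now: int) -> bool:
        if abs(now - sent_at) > self.window:
            return False                    # stale or from the future
        # Drop entries that could no longer pass the staleness check anyway.
        while self._seen:
            oldest = next(iter(self._seen))
            if now - self._seen[oldest] > self.window:
                self._seen.popitem(last=False)
            else:
                break
        if nonce in self._seen:
            return False                    # replay
        if len(self._seen) >= self.max_entries:
            return False                    # cache full: fail closed, never forget early
        self._seen[nonce] = sent_at
        return True


class HeartbeatMonitor:
    """Heartbeat sequence numbers must strictly increase."""

    def __init__(self):
        self.last = -1

    def accept(self, seq: int) -> bool:
        if seq <= self.last:
            return False
        self.last = seq
        return True
```

Evicting a nonce only after its send time has left the window keeps the cache small without reopening a replay: the same envelope would then fail the staleness check.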

A subtlety we handle and most don't: odd-node handling in a Merkle tree is where the classic duplicate-leaf ambiguity class (CVE-2012-2459) lives. The construction's odd-node promotion is not exploitable here: the leaf set is a fixed, role-domain-separated shape re-derived from the same inputs at verify. The reasoning is documented at the call site, not assumed.
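To make the role separation concrete, here is a small SHA-384 Merkle root with distinct prefixes for leaves, nodes, and the root. The prefix bytes, the length-prefixed framing, and the decision to bind the leaf count into the root are assumptions made for this sketch; the unpaired node is promoted unchanged rather than duplicated.

```python
import hashlib
from typing import List

# Hypothetical role prefixes; the real values are not published in this reference.
LEAF, NODE, ROOT = b"\x00leaf", b"\x01node", b"\x02root"


def tagged_hash(prefix: bytes, *parts: bytes) -> bytes:
    """SHA-384 over a role prefix and length-framed parts."""
    digest = hashlib.sha384(prefix)
    for part in parts:
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [tagged_hash(LEAF, leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = [tagged_hash(NODE, level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])       # promote the unpaired node, do not duplicate it
        level = nxt
    # Binding the leaf count means two different leaf lists cannot share a root shape.
    return tagged_hash(ROOT, len(leaves).to_bytes(8, "big"), level[0])
```

In this sketch the lists `[a, b, c]` and `[a, b, c, c]` produce different roots, which is the property the duplicate-leaf class of bugs violates.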

§5 — Identity & per-call authorization

A leaked credential is not an identity.

Each agent carries a SPIFFE-aligned, short-lived, scoped, attested non-human identity. Verifying the credential proves the issuer signed it — it does not prove the holder owns the key. So every privileged action requires a fresh, action-bound, replay-guarded proof of possession: the holder signs a per-call challenge with its private key, checked before the gate, the mandate, the defense, or execute. A captured credential blob alone is rejected. Identities rotate generate-swap-revoke; revocation is checked per call.

proof-of-possession — the leaked-blob attack, defeated

» genuine agent: in-scope call + valid fresh proof   → executes
» thief w/ the credential but the WRONG key          → IdentityRejected (before execute)
» replayed proof (same nonce)                        → IdentityRejected
» proof minted for a different action                → IdentityRejected
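The three rejections above follow from binding each challenge to one action and consuming it on first use. The sketch below shows that logic only. All names (`PopVerifier`, `challenge_bytes`, `toy_sign`, `toy_verify`) are hypothetical, and the HMAC-based `toy_sign`/`toy_verify` pair is a loudly labelled stand-in: it is symmetric, so it does not provide real proof of possession. A real deployment would plug an asymmetric verifier (the text names Ed25519) into the same `verify_sig` slot.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Dict

CHALLENGE_DOMAIN = b"example/pop-challenge/v1\x00"   # hypothetical prefix


class IdentityRejected(Exception):
    pass


def challenge_bytes(nonce: bytes, action: str) -> bytes:
    """The message the holder must sign: domain, nonce, and the exact action."""
    framed = len(nonce).to_bytes(2, "big") + nonce + action.encode("utf-8")
    return hashlib.sha384(CHALLENGE_DOMAIN + framed).digest()


class PopVerifier:
    def __init__(self, verify_sig: Callable[[bytes, bytes, bytes], bool]):
        self._verify_sig = verify_sig
        self._outstanding: Dict[bytes, str] = {}     # nonce -> action it was issued for

    def issue(self, action: str) -> bytes:
        nonce = secrets.token_bytes(32)
        self._outstanding[nonce] = action
        return nonce

    def check(self, key: bytes, nonce: bytes, action: str, proof: bytes) -> None:
        bound = self._outstanding.pop(nonce, None)   # single use, even on failure
        if bound is None or bound != action:
            raise IdentityRejected("unknown, replayed, or re-bound challenge")
        if not self._verify_sig(key, challenge_bytes(nonce, action), proof):
            raise IdentityRejected("proof does not match the registered key")


# STAND-IN ONLY: symmetric HMAC in place of an asymmetric signature scheme.
def toy_sign(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha384).digest()


def toy_verify(key: bytes, message: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(toy_sign(key, message), signature)
```

A holder with the right key passes once; the same proof presented again, a proof made with a different key, or a proof minted for another action each raise `IdentityRejected` before anything downstream runs.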

§6 — Audit, transparency & the covert channel

Metadata-only, tamper-evident, and witnessed by a third party.

Every action, and every refusal, is recorded as a metadata-only decision record: sequence, verdict, a boundary timestamp, the refusal reason (when the verdict is a block), and a SHA-384 hash of any raw result, never the result itself. The trace therefore includes the actions that were blocked, which is the explainability regulators and insurers ask for.

Append-only chain

Records hash into a SHA-384 chain whose head an independent party can recompute. Alter one entry and the head no longer reproduces.
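A hash chain over metadata-only records can be sketched in a few lines. The record fields follow the list given earlier in this section; the genesis value, the prefix string, the JSON encoding, and the function names are assumptions for the example.

```python
import hashlib
import json
from typing import Iterable, Optional

GENESIS = b"\x00" * 48                      # hypothetical starting head
CHAIN_DOMAIN = b"example/audit-chain/v1\x00"


def decision_record(seq: int, verdict: str, timestamp: int,
                    reason: Optional[str], raw_result: Optional[bytes]) -> dict:
    """Metadata only: the raw result is reduced to its SHA-384 digest."""
    return {
        "seq": seq,
        "verdict": verdict,
        "ts": timestamp,
        "reason": reason,
        "result_sha384": (hashlib.sha384(raw_result).hexdigest()
                          if raw_result is not None else None),
    }


def chain_head(records: Iterable[dict], start: bytes = GENESIS) -> bytes:
    """Fold each record into the running head; any change upstream changes the head."""
    head = start
    for record in records:
        encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
        head = hashlib.sha384(CHAIN_DOMAIN + head + encoded).digest()
    return head
```

An independent party holding the same records recomputes `chain_head` and compares it with the published head; flipping one verdict in one record yields a different value.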

Independent witness

The operator publishes signed checkpoints; an independent witness co-signs the same head. A customer verifies the witness's co-signature against the witness's own published key, which confirms that the operator did not rewrite history, without having to trust the operator.

Covert channel, measured

The projection is a side channel. Rather than wave it away, the channel's capacity is quantified (analytic upper bound + demonstrated lower bound) and throttled to a configured, logged ceiling. The honest residual is stated: structure cargo's symbol names are the payload the customer asked to see.
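One simple way to reason about an analytic upper bound is to count distinguishable values: a projection whose fields can each take one of k distinguishable values carries at most the sum of log2(k) bits. The sketch below applies that bound and charges it against a ceiling. How Legation actually computes its bound is not described here, so the model, the names (`projection_capacity_bits`, `ChannelBudget`), and the ceiling semantics are all assumptions.

```python
import math
from typing import Iterable, List, Tuple


def projection_capacity_bits(field_cardinalities: Iterable[int]) -> float:
    """Upper bound: a field with k distinguishable values leaks at most log2(k) bits."""
    return sum(math.log2(k) for k in field_cardinalities if k > 1)


class ChannelBudget:
    """Charge each projection against a configured ceiling and log every decision."""

    def __init__(self, ceiling_bits: float):
        self.ceiling_bits = ceiling_bits
        self.spent_bits = 0.0
        self.log: List[Tuple[float, bool]] = []

    def charge(self, field_cardinalities: Iterable[int]) -> bool:
        cost = projection_capacity_bits(field_cardinalities)
        allowed = self.spent_bits + cost <= self.ceiling_bits
        if allowed:
            self.spent_bits += cost
        self.log.append((cost, allowed))   # refusals are logged too
        return allowed
```

For example, a projection with three fields of 2, 4, and 8 possible values costs at most 6 bits; with a 10-bit ceiling the first such projection is allowed and the second is refused.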

"Verify, don't trust" is a property here, not a tagline. The read-only Attestation Portal hands an auditor a signed artifact for every claim and the command to recompute it offline — it asserts nothing it cannot hand you the bytes for.

§7 — The hardening dial

One dial reconfigures the whole posture.

The same sealed artifact runs at three tiers; turning the dial changes where inference runs, how the control link degrades, key custody, the crypto suite, and egress — pinned stricter, never looser, and enforced in code (an operator may tighten a knob but never loosen it past the tier ceiling).

	Bolt-on	Regulated	Sovereign
Brain location	control-plane (inverted)	customer endpoint (BYO)	in-VPC local model
Control link	live dial-out	live + store-and-forward	offline signed artifacts
Key custody	software	HSM / KMS	hardware-bound
Crypto suite	classical CNSA 2.0	classical CNSA 2.0 (FIPS)	hybrid post-quantum
Egress (NetworkPolicy)	DNS + control plane :443	DNS + control plane :443	default-deny (air-gap)
Replay window	300 s	120 s	60 s
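The "tighten but never loosen" rule can be illustrated with the replay-window row of the table. The 300/120/60 second values come from the table; the enum, the exception, and the function name are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    BOLT_ON = 0
    REGULATED = 1
    SOVEREIGN = 2


# Per-tier ceilings taken from the table above (seconds).
REPLAY_WINDOW_CEILING_S = {
    Tier.BOLT_ON: 300,
    Tier.REGULATED: 120,
    Tier.SOVEREIGN: 60,
}


class LoosenRejected(Exception):
    """Raised when an operator asks for a value looser than the tier allows."""


def effective_replay_window(tier: Tier, requested_s: Optional[int] = None) -> int:
    """Return the enforced window: the tier ceiling, or a tighter operator value."""
    ceiling = REPLAY_WINDOW_CEILING_S[tier]
    if requested_s is None:
        return ceiling
    if requested_s <= 0:
        raise ValueError("replay window must be positive")
    if requested_s > ceiling:
        raise LoosenRejected(f"{requested_s}s exceeds the {tier.name} ceiling of {ceiling}s")
    return requested_s
```

An operator at the Regulated tier may request 90 seconds and get it; a request for 200 seconds is rejected because it is looser than the 120-second ceiling.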

§8 — Runtime & deployment surface

Zero inbound ports. A scratch image. A read-only root.

Dial-out, not dial-in

The embassy dials out to the hub over an authenticated WebSocket (wss) — there are no inbound ports in the customer VPC. To register, the embassy cryptographically authenticates to the hub: an Ed25519/SHA-384 signature checked against a trusted-key allowlist, inside a freshness window, with a single-use replay nonce — so the hub admits only known embassies, and it is blind both ways. This removes the single hardest step in any on-prem install (a firewall exception) and shrinks the attack surface to nothing inbound. Commands ride the dial-out channel; results and metadata-only telemetry return the same way. The NetworkPolicy denies all ingress.

L6 image, STIG-mapped

The image is built FROM scratch around a single static-musl binary (RELRO+NX; no shell, no shared libc, no package manager). It runs non-root with a read-only root filesystem, dropped capabilities, a seccomp profile, and no auto-mounted service-account token. Secrets are mounted at runtime, never baked into a layer. Each rule maps to a DISA STIG control, exported as a .ckl with an honest POA&M for the open items.

§9 — Evidence, generated not asserted

The whole assessor package, derived from the sealed delivery.

SSP + matrix

System Security Plan with the authorization boundary, a data-flow diagram, and a shared-responsibility matrix (operator / customer / cloud, per control family).

STIG .ckl + POA&M

DISA checklist from the real hardening, honest about its Opens (FIPS module, human MFA) rather than overclaiming.

OSCAL

Machine-readable component-definition and assessment-results documents (live per-control status) that the buyer's GRC tooling ingests directly.

AIBOM + SPRS

A signed AI Bill of Materials bound to the seal root, plus an honest in-scope SPRS contribution.

§10 — Honest limits

Where the guarantees stop. (Read this one twice.)

An engineer trusts a system that knows its own edges. These are stated in the source tree, not buried.

Containment ≠ correctness

Within its committed plan, an injected or buggy agent can still take a permitted-but-wrong action. The mandate shrinks the residual from the whole treaty to one frozen task; it does not zero it. Tight mandates + the active-defense sensors + the insurance layer cover the remainder.

Metadata, not anonymity

The hub is zero-knowledge on content, not on metadata — it sees who talks to whom, when, how much. Traffic analysis is possible.

IP-from-host only at the TEE

Data-flow inversion protects the customer's data from the operator at every tier. The operator's IP is protected from a malicious host by hardware only at Sovereign; below it, by compiled-binary obscurity + contract.

FIPS module pending a certificate

The crypto seam is built and proven byte-identical against a validated module — but module validation is a CMVP certificate from a lab, not a line of code. Reported as Open, never asserted as done.

§11 — The line this document stops at

What's deliberately not here.

This reference describes the architecture and the standardized cryptographic constructions to the edge of what's safe to publish. Beyond that edge — and excluded on purpose — are the internals of the post-quantum sovereign transport, the exact structured-extraction and projection heuristics that shape what a projection reveals, and the operator-side runbooks. Those are trade secrets, available under NDA to a serious evaluator. Their absence here is a design decision, not an omission.

Everything described above is real, tested, and deterministic. Everything below the line is real too — it just isn't on a public page. A pilot evaluator gets both.

The enforcement architecture, to the edge.

Engineered to the edge. Honest about the rest.