A 97% detection rate means 3% of attacks land. Forever.
Prompt-injection filtering is an arms race: the space of adversarial inputs is unbounded, while the space of trained defenses is finite. Every model trained to detect injection becomes a model an attacker can probe. As of 2026, there is no reliable filtering defense, which is exactly why the durable controls are privilege containment and provenance, not detection. Memory poisoning makes it worse: one planted instruction can execute weeks later, against an unrelated query.
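The arithmetic behind that 3% can be sketched in a few lines. This is a minimal illustration, assuming the 97% figure means each injection attempt is caught with probability 0.97 and that attempts are independent; both are simplifying assumptions, and the function name is hypothetical.

```python
def p_at_least_one_lands(miss_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` injections gets past the filter.

    Assumes every attempt is independent and is missed with probability
    `miss_rate` (0.03 for the 97% filter discussed above).
    """
    return 1.0 - (1.0 - miss_rate) ** attempts


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        p = p_at_least_one_lands(0.03, n)
        print(f"{n:>5} attempts: {p:.1%} chance at least one lands")
```

Under those assumptions the chance of at least one success is roughly 26% after 10 attempts and above 95% after 100, which is why a per-attempt filter does not bound the risk from a persistent attacker.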
"The only reliable defence is structural — cut one of the three legs as an architectural decision, not a configuration setting." Meta formalized it as the Rule of Two: never all three.
Two legs cut. Four hard boundaries. By construction.
Leg cut: private data → model
Data-flow inversion. The model never sees raw data — only redacted, structure-only projections. An injection can't leak what the inference never received.
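One way to picture a structure-only projection is a function that keeps keys and nesting but replaces every value with its type name. This is a hedged sketch, not Legation's implementation: `structure_only` and the sample record are hypothetical, and it assumes field names themselves are not sensitive.

```python
from typing import Any


def structure_only(value: Any) -> Any:
    """Return the shape of `value` with every leaf replaced by its type name.

    Keys and nesting are kept (assumption: the schema is not sensitive);
    the raw values never appear in the result.
    """
    if isinstance(value, dict):
        return {key: structure_only(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [structure_only(item) for item in value]
    return type(value).__name__


# Hypothetical record that stays on the private side.
record = {
    "name": "Ada Lovelace",
    "ssn": "123-45-6789",
    "visits": [{"date": "2026-01-04", "cost": 120.5}],
}

# Only this projection would be handed to the model.
projection = structure_only(record)
```

Because the projection contains no raw values, an injected instruction that dumps the model's context can reveal field names and types at most.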
Leg cut: external communication
The membrane. The only thing that leaves the VPC is a governed projection over a sealed, hub-blind channel. The agent has no way to improvise an exfiltration path of its own.
| Hard boundary (the field's requirement) | Legation's structural mechanism |
|---|---|
| Identity & authorization | Sealed-bag admission (Ed25519) + the treaty bound to the intersection of user and agent permissions |
| Data-flow enforcement | Data-flow inversion — sensitive data cannot reach an external channel |
| Isolation primitive | The membrane physically constrains the agent; the routing hub is zero-knowledge |
| Human authorization gate | Seneschal escalate-to-human + per-task Mandate (commit-then-act) + dual Recall (the customer severs unilaterally; the operator cannot override) |
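The "intersection of user and agent permissions" in the first row can be sketched with plain set arithmetic. The permission strings and function names below are hypothetical, and the Ed25519 admission step is deliberately left out of this sketch.

```python
from __future__ import annotations


def effective_permissions(user: frozenset[str], agent: frozenset[str]) -> frozenset[str]:
    """Permissions valid for a session: only what BOTH the user and the agent hold."""
    return user & agent


def is_allowed(action: str, user: frozenset[str], agent: frozenset[str]) -> bool:
    """An action is allowed only if it falls inside the intersection."""
    return action in effective_permissions(user, agent)


# Hypothetical permission sets for illustration.
user_perms = frozenset({"read:crm", "write:crm"})
agent_perms = frozenset({"read:crm", "send:email"})
```

With these sets, only `read:crm` survives: the agent cannot borrow the user's write access, and the user cannot reach the agent's email capability.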
This isn't a feature we added. It's a claim we can substantiate — because the path doesn't exist. An injected instruction has no route to exfiltrate, independent of model behavior.
Cutting the path is structural; a second, active layer watches behavior. Honeytokens seeded in the workspace and a behavioral rate baseline trigger an immune-response self-sever — the embassy cuts its own outbound link before a drifting or hijacked agent can act on what it touched. Defense in depth, not a single control.
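The honeytoken and rate-baseline triggers described above can be sketched as a small monitor that latches into a severed state. All names, the per-minute window, and the 3x tolerance multiplier are assumptions made for illustration, not Legation's actual thresholds.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ImmuneMonitor:
    """Toy monitor: severs on any honeytoken touch or a rate spike."""

    honeytokens: frozenset[str]
    baseline_per_minute: float
    tolerance: float = 3.0  # assumed multiplier over baseline before severing
    severed: bool = False

    def observe(self, touched: set[str], actions_last_minute: int) -> bool:
        """Record one observation; return True once the link is severed."""
        if self.severed:
            return True  # severing is latched: it never re-opens on its own
        touched_honeytoken = bool(touched & self.honeytokens)
        rate_spike = actions_last_minute > self.baseline_per_minute * self.tolerance
        if touched_honeytoken or rate_spike:
            self.severed = True
        return self.severed
```

The latch is the design point of this sketch: once either signal fires, later quiet observations do not restore the outbound link.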
Proof an auditor, a regulator, and an insurer can each ingest.
Machine-readable evidence
Signed SSP, POA&M, and CycloneDX SBOM artifacts, aligned to NIST OSCAL, that the buyer's GRC tooling and the regulator's systems consume directly. Not a PDF.
Verifiable AIBOM
A signed AI Bill of Materials per delivery — every component bound into the sealed Merkle root. Verifiable, not self-reported.
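Binding components into a Merkle root can be illustrated as below. This is a generic sketch, assuming SHA-256, domain-separated leaf and node hashes, and promotion of an unpaired node to the next level; the real tree layout and the signing step are not described in the source and are not shown here.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root over `leaves` (order matters).

    Leaves and interior nodes use different prefixes so a leaf can never be
    mistaken for an interior node. An unpaired node is carried up unchanged.
    """
    if not leaves:
        raise ValueError("at least one component is required")
    level = [_sha256(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:  # odd count: promote the last node as-is
            nxt.append(level[-1])
        level = nxt
    return level[0]


# Hypothetical AIBOM entries, serialized to bytes.
components = [b"model:example-llm@1.0", b"dataset:example-corpus@3", b"tool:example-retriever@0.4"]
root = merkle_root(components)
```

Changing, removing, or reordering any component changes the root, which is what lets a verifier check a delivery against the sealed value without trusting a self-reported list.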
Insurance-ready
Maps 1:1 to the underwriter checklist; the attestation proves who controlled the guardrail — clean liability allocation. Provable controls = insurable.
Decision replay
Step-by-step replay of every agent decision — including the ones it blocked. Exactly the explainability regulators and insurers now require.
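A replayable decision record can be sketched as an append-only log that stores blocked decisions alongside allowed ones. The class and field names are hypothetical; a production log would also need tamper evidence, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Decision:
    step: int
    action: str
    allowed: bool
    reason: str


class DecisionLog:
    """Append-only record of agent decisions, including the blocked ones."""

    def __init__(self) -> None:
        self._entries: List[Decision] = []

    def record(self, action: str, allowed: bool, reason: str) -> Decision:
        decision = Decision(len(self._entries) + 1, action, allowed, reason)
        self._entries.append(decision)
        return decision

    def replay(self) -> Iterator[str]:
        """Yield one human-readable line per decision, in original order."""
        for d in self._entries:
            verdict = "ALLOWED" if d.allowed else "BLOCKED"
            yield f"step {d.step}: {verdict} {d.action} ({d.reason})"
```

Recording the blocked branch is the point: a reviewer sees not only what the agent did, but what it attempted and why it was stopped.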
Five different forces converged on what Legation already is.
U.S. Dept. of War
Agentic-AI guidance (Apr 2026): just-in-time credentials, fresh cryptographic proofs before privileged calls, signed commands. Your gate + treaty + seal.
EU AI Act
Override must be "practically accessible — not buried in an admin interface." That's the customer Recall button.
Cyber insurers
From best-effort guardrails to provable controls; won't cover untraceable systems. You emit cryptographic proof.
NIST OSCAL / COSAiS
Machine-readable continuous compliance + agentic 800-53 overlays landing in 2026. You emit signed, machine-readable evidence today.
Zero Trust 2026
Boundary moved to "ID and silicon" — TEE confidential computing. Your hardware-rooted embassy at the Sovereign tier.
The market
It didn't predict you. It walked toward you. The world-class move is to ship what you already are, in the forms five markets now demand.