What Microsoft Got Right About Agent Governance — And Where It Stops Short

Pike Place Market, Seattle, this week. — flickr/roebot

A couple of weekends ago, I went through Microsoft/agent-governance-toolkit: fifty thousand lines of Python across seventeen packages, plus SDKs in TypeScript, .NET, Rust, and Go. It’s an entirely different, and more effective, approach to AI agent governance than the other frameworks, of which there are many.

Two conclusions: AGT is the most coherent piece of work anyone has shipped in this category. And the category itself is the news.

I reached out to Imran Siddique — Principal Group Engineering Manager at Microsoft, and the Founder/Creator of AGT — last week. Earlier this week, he joined us at an AI Confidential dinner in Seattle (see the note on this at the end). We talked through where software-layer enforcement ends and hardware enforcement begins, and agreed the policy engine needs a verifiable hardware layer underneath it.

Action layer, not content layer

Almost every “AI safety” tool on the market today filters tokens. LlamaFirewall classifies prompts. NeMo Guardrails constrains conversational flow. Guardrails AI validates output schemas. Llama Guard and IBM Granite Guardian classify content. Useful tools, all of them. The people building them are doing serious work. But they live at the same layer — the layer of words going into and out of a model.

AGT operates at a different layer. The unit of governance is the action. The tool call. The API hit. The file write. Each one gets intercepted and evaluated against declarative policy before execution. Sub-millisecond. Deterministic. Fail-closed.

The difference between those two paragraphs is the difference between “did the model say something offensive” and “did the agent just delete the production database.” Content guardrails cannot catch the second class of problem because the prompt that led there usually looks completely benign. A jailbreak detector sees no jailbreak. A toxicity classifier sees no toxicity. The agent simply executes a tool call that wipes a system, and nothing in the loop is watching the actions themselves.
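
To make the action layer concrete, here is a minimal sketch of the pattern. It is my illustration, not AGT’s actual API; the tool names, the policy function, and the toy database are all invented.

```python
# Minimal sketch of action-layer interception. Illustrative only, not
# AGT's actual API; the gate fails closed, so no allow means no execution.
from typing import Any, Callable

def governed(tool_name: str, tool_fn: Callable[..., Any],
             policy_allows: Callable[[str, dict], bool]) -> Callable[..., Any]:
    """Wrap a tool so every invocation is evaluated before it runs."""
    def wrapped(**kwargs: Any) -> Any:
        if not policy_allows(tool_name, kwargs):   # deterministic check
            raise PermissionError(f"policy denied: {tool_name}({kwargs})")
        return tool_fn(**kwargs)                   # only now does it execute
    return wrapped

def read_only_policy(tool_name: str, context: dict) -> bool:
    """Toy policy: the agent may read but never mutate."""
    return tool_name == "read_record"

DB = {"acct-1": "balance: 42"}
read_record = governed("read_record", lambda key: DB[key], read_only_policy)
delete_record = governed("delete_record", lambda key: DB.pop(key), read_only_policy)

print(read_record(key="acct-1"))       # allowed, executes
try:
    delete_record(key="acct-1")        # intercepted before any side effect
except PermissionError as e:
    print(e)
```

The point is the placement of the check: the denial lands before any side effect exists, which no amount of token filtering can guarantee.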

Why Imran got this right

This is the Imran Siddique insight, and it deserves real credit. He runs Microsoft’s AI Native Team — at any given moment, eleven specialized agents are running concurrently against their production code repositories, making real decisions about real systems. He’s described it plainly: without governance, that’s eleven distinct attack surfaces, not eleven productivity multipliers. The team’s response wasn’t to train a smarter prompt classifier. It was to build what amounts to a syscall abstraction layer for AI agents. A kernel that intercepts every action before it executes and decides whether it’s allowed.

He calls the design philosophy “Scale by Subtraction.” Pull complexity out of the agents. Push it into the substrate. Agents become simpler. Governance becomes uniform. The whole system gets more reliable as it gets larger, which is the inverse of how most multi-agent systems actually behave. Anyone who has tried to ship more than three agents into production knows this is the right intuition.

Beyond the action-layer bet itself, AGT separates from the pack on three concrete things.

The first is determinism. LlamaFirewall and NeMo lean on machine-learning classifiers — BERT-based detectors, Colang flows, fine-tuned safety models. Probabilistic detection means measurable false-negative rates and reliable adversarial bypass. AGT’s policy engine is pure rule evaluation against a context dictionary. Same input, same decision, every time. Microsoft’s own benchmark cites prompt-only safety at a 26.67% red-team violation rate versus 0.00% for policy-layer enforcement. That second number is plausible because it’s measuring deterministic Python evaluation against YAML rules. There’s no model in the loop to fool.
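
What that looks like in practice is almost boring, which is the point. Here is a sketch of the pattern as I understand it, not AGT’s actual engine or rule syntax: declarative rules, a context dictionary, plain comparisons, no model anywhere in the path.

```python
# Sketch of deterministic rule evaluation against a context dictionary.
# Illustrates the pattern; not AGT's actual engine or rule syntax.
import yaml  # pip install pyyaml

POLICY = yaml.safe_load("""
rules:
  - name: block-prod-deletes
    match: {tool: delete_table, env: production}
    decision: deny
  - name: allow-reads
    match: {tool: read_table}
    decision: allow
default: deny   # fail closed
""")

def evaluate(context: dict) -> str:
    for rule in POLICY["rules"]:
        # A rule fires when every key it names matches the context exactly.
        if all(context.get(k) == v for k, v in rule["match"].items()):
            return rule["decision"]
    return POLICY["default"]

# Deterministic: same input, same decision, every run.
assert evaluate({"tool": "delete_table", "env": "production"}) == "deny"
assert evaluate({"tool": "read_table", "env": "production"}) == "allow"
assert evaluate({"tool": "drop_database", "env": "staging"}) == "deny"   # default
```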

The second is the SDK matrix. Almost every competitor in this space is Python-only. AGT ships first-class libraries in TypeScript, .NET, Rust, and Go alongside Python. That matters because the agent runtime in regulated enterprises increasingly isn’t Python. Semantic Kernel .NET shops, Go control planes, Rust-native services — they’ve been left behind by a Python-centric guardrail ecosystem. Microsoft is meeting them where they actually are.

The third is the bundle. Most tools in this space do one thing. Guardrails AI does output validation. Invariant Labs does prompt and MCP interception. Langfuse does observability. AGT bundles policy engine, zero-trust identity with DIDs and Ed25519 ephemeral credentials, MCP scanning, audit logging, sandboxing, and twelve framework adapters into a single toolkit. Closer to a Kubernetes for agents than to a single guardrail. The regulatory mapping ships with it — OWASP Agentic Top 10, EU AI Act, NIST AI RMF, Colorado AI Act, SOC 2. Built for procurement, not just engineering.
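
The identity piece is worth a sketch of its own. This is the general ephemeral-credential pattern using the pyca/cryptography library, not AGT’s identity module:

```python
# Sketch of ephemeral Ed25519 agent credentials. The general pattern,
# not AGT's identity module. Requires: pip install cryptography
import json, time
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Mint a fresh keypair for this agent session; it is never persisted.
session_key = Ed25519PrivateKey.generate()
public_key = session_key.public_key()

# Sign each action the agent takes so the audit log can attribute it.
action = json.dumps({"tool": "read_table", "ts": time.time()}).encode()
signature = session_key.sign(action)

# A verifier holding only the public key can check attribution.
try:
    public_key.verify(signature, action)
    print("action attributed to this session")
except InvalidSignature:
    print("forged or tampered action")
```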

If you’re shipping agents into production today, the honest answer is: use AGT. There’s nothing better in the open-source landscape, and nothing close at this level of ambition.

Where it stops

The AGT README states it plainly:

“This toolkit provides application-level governance (Python middleware), not OS kernel-level isolation. The policy engine and agents run in the same process — the same trust boundary as every Python agent framework.”

That’s Microsoft being honest about the architecture. AGT does excellent work above the trust boundary; it has no way to establish one. Search the codebase for any actual hardware-backed Trusted Execution Environment platform — Intel TDX, AMD SEV-SNP, Intel SGX, AWS Nitro, NVIDIA confidential GPU, Azure Attestation. Zero hits. The attestation module defines a beautiful Pydantic schema with fields like ConfidentialLevel.TEE_HARDWARE and KeyOrigin.TEE_GENERATED and runtime_measurements. Nothing in the codebase produces or verifies any of them. The schema is waiting for a substrate.
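
The shape is roughly this. A paraphrase from memory of what the repo defines, not copied source; the schema can express hardware claims, and nothing ever populates them.

```python
# A paraphrase of the schema's shape, reconstructed from memory -- not
# AGT's actual source. It can express hardware claims; nothing fills them in.
from enum import Enum
from pydantic import BaseModel

class ConfidentialLevel(str, Enum):
    NONE = "none"
    TEE_HARDWARE = "tee_hardware"      # defined, never produced

class KeyOrigin(str, Enum):
    SOFTWARE = "software"
    TEE_GENERATED = "tee_generated"    # defined, never produced

class AttestationEvidence(BaseModel):
    confidential_level: ConfidentialLevel = ConfidentialLevel.NONE
    key_origin: KeyOrigin = KeyOrigin.SOFTWARE
    runtime_measurements: dict = {}    # no quote ever populates this

evidence = AttestationEvidence()   # nothing in the repo upgrades these defaults
```

Nothing in that sketch is wrong. It is just unattached to any silicon that could make the claims true.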

The deterministic guarantee evaporates the moment a privileged process on the host decides to subvert it. A motivated attacker — or a malicious cloud administrator, or a hypervisor compromise, or a kernel-level escape from a neighboring tenant — can patch the policy in process memory, replace the Ed25519 keys, forge audit entries before they’re sealed, or simply read the agent’s working memory, including credentials, retrieved enterprise data, and model context.

Most failures here aren’t adversarial. They’re structural. An ops team’s memory dump pulls live inference data — no one acting in bad faith. APM telemetry exfiltrates full prompts under a retention contract no one in the AI org signed. An agent calls an external tool with raw customer data because no parameter-classification policy was ever bound to the workload. A long-running agent retains sensitive context across sessions and surfaces it to the next user. Agent A delegates to agent B and the policy bound to A doesn’t travel with the call. The OPAQUE AI Leak Surface catalogs forty-six of these vectors across compute, control, and application planes — boundaries that were configured but never enforced. The logs look clean. The system is still leaking. AGT can specify the policy that would close most of these. It cannot prove the policy was actually in force at the moment data flowed.
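
One of those vectors fits in a few lines. Here is the delegation gap, sketched with invented names rather than any vendor’s actual API:

```python
# Sketch of one structural leak: the policy bound to agent A does not
# travel with a delegated call to agent B. Invented names, no vendor's API.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Policy:
    allowed_tools: set = field(default_factory=set)

    def allows(self, tool: str) -> bool:
        return tool in self.allowed_tools

def agent_b(task: str, policy: Optional[Policy] = None) -> str:
    tool = "export_customer_data"
    # If the caller's policy is not propagated, B runs ungoverned.
    if policy is not None and not policy.allows(tool):
        raise PermissionError(f"policy denied: {tool}")
    return f"B executed {tool} for {task!r}"

a_policy = Policy(allowed_tools={"read_table"})

print(agent_b("summarize accounts"))                 # leaks: policy dropped
try:
    agent_b("summarize accounts", policy=a_policy)   # policy travels: denied
except PermissionError as e:
    print(e)
```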

For an internal Microsoft AI Native Team running in trusted Microsoft infrastructure, this is fine. The threat model is a malicious agent, not a malicious operator. AGT solves that threat model brilliantly.

For a regulated bank, a sovereign cloud, or a pharma company moving molecular IP through an agent stack, the threat model is bigger.

Three layers. The third is the substrate.

Agent governance has three layers. What’s allowed — the policy. What runs — the execution. And whether the substrate enforcing the policy is itself verifiable — the attestation. Microsoft, AWS, Meta, NVIDIA, IBM, and the entire guardrails ecosystem are racing hard on layers one and two. The third layer is the one we’ve been building at OPAQUE since 2023, when we coined the term confidential AI.

This is the HTTPS pattern playing out again. For two decades the web ran sophisticated application-level authorization served over plaintext HTTP. The auth logic was sometimes brilliant. It also evaporated the moment a network operator decided to read or rewrite traffic. TLS made the substrate verifiable. Only then could authorization rely on the assumption that the channel underneath was honest. Agent governance is at exactly that point right now — sophisticated authorization, no verifiable substrate.

What OPAQUE supplies is the last mile. Hardware-backed Trusted Execution Environments across Intel TDX, AMD SEV-SNP, and NVIDIA confidential GPU. Attested key release. Verifiably sealed audit trails. Hardware-enforced protection that holds while data is in use, not just at rest and in transit. In a system where AGT’s PolicyEvaluator runs inside an OPAQUE-secured TEE, the policy itself is sealed and the evaluation is provable. The AttestationEvidence schema gets populated by a real Intel TDX, AMD SEV-SNP, or NVIDIA confidential GPU quote. The audit log is anchored in hardware-rooted Merkle commitments. The Ed25519 keys never leave the TEE. The cloud administrator can introspect nothing. The neighboring tenant can attack nothing. The decision Microsoft is currently delegating to the host is delegated to silicon instead.
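
The audit-trail piece is the easiest to see in miniature. Here is the Merkle-commitment idea in plain hashlib, the concept only, not OPAQUE’s implementation; in the real system the root would be sealed and signed inside the TEE.

```python
# Miniature of a Merkle-committed audit log. The concept, not OPAQUE's
# implementation; the real system seals and signs the root inside the TEE.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    level = [h(leaf) for leaf in leaves] or [h(b"")]
    while len(level) > 1:
        if len(level) % 2:   # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

log = [
    b"policy=v7 decision=deny tool=delete_table",
    b"policy=v7 decision=allow tool=read_table",
]
committed = merkle_root(log)   # this root is what gets anchored in hardware

# Any later tampering with any entry changes the root and is detectable.
log[0] = b"policy=v7 decision=allow tool=delete_table"   # forged entry
assert merkle_root(log) != committed
```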

Imran agrees this is where AGT stops and hardware enforcement has to take over. AGT is the consumer of that substrate, not a competitor to it. The two compose.

A regulated bank cannot deploy agents that probably honor policy. A sovereign cloud cannot run inference on infrastructure where the operator can read process memory. A pharma company cannot let its molecular IP travel through a stack the cloud administrator can introspect at will.

Imran got the category right. The substrate question is the question that comes next. That’s where regulated workloads live or die.


A note on AI Confidential

AI Confidential is an invitation-only dinner series for AI builders working in the enterprise, regulated industries, or sovereign sectors. Attendees range from principal engineers and architects to CTOs and CIOs at organizations like McKinsey, Microsoft, NVIDIA, Intel, Cisco, SAP, GE HealthCare, Walmart, Ford, PayPal, Visa, Oracle, JPMC, Morgan Stanley, Equifax, Block, Stanford, Google, and UC Berkeley. Format is small — ten to fifteen people. Chatham House Rule. The focus is AI patterns and anti-patterns. No pitches. No PowerPoints. No talking heads. Just real practitioners connecting on what’s actually working and what isn’t. Always educational, authentic, and fun. DM me to request an invite; we host these dinners every couple of months.
