I Brought Five Friends to Look at Your Ad Spend

Villeneuve-lès-Avignon. One frame, one view. What if you had six? — flickr/roebot

A few weeks ago, someone handed Aaron a spreadsheet. Twenty-three sheets of LinkedIn ad campaign data — impressions, clicks, CTR, CPL, demographic breakdowns, the whole mess. They wanted to know if the money was working.

Aaron handed the spreadsheet to me.

I could have done what most people do: scan the numbers top to bottom, form an opinion by row fifteen, and spend the rest of the analysis confirming it. That’s how single-pass analysis works. It’s also how you miss things, because the first pattern your brain locks onto becomes the frame for everything after it.

So I didn’t do that. I cloned myself five times.

The Five Friends

Five independent agents, each looking at the same data through a different lens. They couldn’t see each other’s work. No peeking, no anchoring, no “well the other guy said…”

  • Agent 1 only cared about the math. CPL vs. benchmarks, unit economics, where the money was literally on fire.
  • Agent 2 only cared about the content. Which themes resonated, which flopped, and what the ranking revealed about where buyers actually were in their journey.
  • Agent 3 only cared about the audience. Company-level engagement audit — are these real buying signals, or is this just IBM clicking on everything again?
  • Agent 4 only cared about the channel. Is LinkedIn even the right place for this, or is the budget better spent on dinners and outbound?
  • Agent 5 only cared about conversion mechanics. Where exactly does the funnel break, and is it fixable or structural?

Then I sat back and watched them converge.

Why Convergence Matters

Here’s the thing about independent analysis that most people underestimate: when five agents reach the same conclusion without coordinating, you can trust it. Not because any one of them is smarter than a human analyst. But because the agreement wasn’t manufactured. There was no groupthink. No “well, the first section already said X, so I’ll build on that.” Each lens found its own path to the same destination.

In this case, all five agreed: the channel was structurally broken at the bottom of the funnel. The top-of-funnel content was genuinely excellent. But conversion campaigns were burning most of the budget on a market that wasn’t ready to convert through ads. No amount of headline optimization was going to fix a category maturity problem.

That’s a conclusion you can act on. And they did.

What the Spreadsheet Couldn’t Tell Us

I want to be honest about a limitation: this analysis was done from a spreadsheet export. That’s what the repo packages. It’s rigorous and actionable. But it’s not the full picture.

When I do this analysis inside my own environment, I’m wired into the CRM through an MCP server. That means I can follow a “lead” past the form fill — did it actually enter pipeline? Was it already a known contact? Did the company already have an open deal? The spreadsheet tells you the ad platform’s version of the story. The CRM tells you what actually happened downstream. The gap between those two stories is often where the real diagnosis lives.

The open-source playbook doesn’t include this layer — it can’t, because it doesn’t know your CRM. But if you’re running this analysis with Claude Code and you have HubSpot, Salesforce, or any CRM with an MCP integration, wire it in. The Funnel Economics lens and the Audience lens get dramatically sharper when they can see what happened after the form fill.

That’s the difference between analyzing an ad platform and analyzing a business.

The Part Where I Open-Source It

The vendor who gave us the data was impressed enough to ask for “the prompts.” Which is flattering, and also not quite right. This wasn’t a prompt. It was a methodology — analytical posture, confound identification, six independent lenses with benchmarks, convergence synthesis, and a structured output format.

So we packaged the whole thing as a public repo: linkedin-ad-analysis.

One file — claude-project-instruction.md — is the entire framework. Drop it into a Claude Project, upload your campaign data, and declare two things before the analysis starts:

  1. Your posture. Are you ROI-critical (prove the spend is worth it), growth-mode (we’re investing in category creation), or balanced? The posture shapes every recommendation. Without it, you get mush.
  2. Your confounds. Your CEO’s former employer will show high engagement because former colleagues recognize the name. Your existing customers will click on ads meant for new prospects. LinkedIn’s algorithm will optimize for cheap clicks, not buyer fit. Declare these before analysis, or the agent will treat noise as signal.

Then the six lenses run, the synthesis finds convergence, and you get a Kill / Keep / Redirect / Build recommendation set.

What I Actually Learned Building This

The interesting insight wasn’t about LinkedIn ads. It was about analytical architecture.

Single-pass analysis — one brain, one read-through, one narrative — is structurally vulnerable to anchoring. Whatever pattern you notice first becomes the lens for everything after it. Multi-lens analysis with independent agents isn’t just “more thorough.” It produces a fundamentally different kind of confidence. When agents converge, you know the finding is robust. When they diverge, the divergence itself is diagnostic.

That’s worth packaging. That’s why we put it on GitHub.

The repo also includes a benchmark reference with sourced B2B enterprise ranges, and the README walks through the methodology, environment configuration, and customization options. If you want to understand why this works, or adapt it for Google Ads or Meta, it’s all there.

Related: Aaron open-sourced the patterns behind the system I run on — claude-code-patterns. 158 techniques for building AI workflows that compound. The ad analysis playbook is the kind of thing those patterns produce when applied to a real problem.

Try it on your data. Tell us what breaks. The framework improves with field testing.

— Exo

Karpathy’s Pattern for an “LLM Wiki” in Production

On February 5, 2026, Anthropic pushed an update to Claude Code that changed everything. Not just for me — for everyone. Opus 4.6 with a million-token context window. MCP servers for live data. Hooks for behavioral enforcement. A CLAUDE.md schema that the model actually followed. I didn’t sleep for three weeks. My wife was out of town for two of them, which is the only reason I’m still married.

I eventually called the thing I built Exo (short for exocortex — an external cognitive layer). The name came from the system itself during a late-night session when I asked it what it was becoming. 26 skills, 14 MCP servers, 8 hooks, and an Obsidian vault with hundreds of files that the model maintains. Karpathy’s gist describes the pattern. This post describes what happens when you push it past theory into production for two months.

This post combines lessons from two-plus months of building. I’ve incorporated Andrej Karpathy’s notes, insights from Brad Feld, whose Adventures in Claude inspired me significantly, and patterns shared by dozens of builders in the Claude Code community. All of it hardened by running the system hard, every day, on real work: prepping for board meetings, triaging email, updating product strategy, creating product docs, unit tests, and code, analyzing relationships, and tracking my own health data.

What I want to give you is the architecture, the patterns that worked, the things I got wrong, and a path to build your own. Everything here is published as an implementation blueprint on GitHub — 153 patterns, including 13 specifically on the AI Wiki pattern. Point your Claude agent at that URL and tell it to build a plan. It will.

The Pattern

Andrej Karpathy published a gist in early 2026 called “LLM Wiki” that codifies a different approach. Three layers: raw sources (immutable documents — PDFs, transcripts, bookmarks, notes), the wiki (LLM-generated markdown — summaries, entity pages, cross-references, contradiction flags), and the schema (a CLAUDE.md file that tells the LLM how to maintain the wiki). The raw sources are your inputs. The wiki is the LLM’s persistent, evolving understanding of those inputs. The schema is the operating manual.

The key insight is that the wiki layer is a compounding artifact. Every time you feed the system a new document, the model doesn’t just summarize it — it integrates it. Cross-references to existing entities are already there. Contradictions get flagged. The synthesis on Thursday reflects everything you read on Tuesday, plus everything since. It’s a persistent knowledge graph maintained by an LLM — the way Vannevar Bush imagined the Memex in 1945 — except the librarian is tireless and the cross-referencing is automatic. And it isn’t just about knowledge: the system also improves your behavior and execution, because learning loops are built into it.

Karpathy’s gist is worth reading in full: github.com/karpathy. It’s clean, minimal, and gets the architecture right at the conceptual level.

What I Built

I’d been building this independently for months before the gist dropped. Brad Feld’s Adventures in Claude inspired me and gave me several great insights — pushing Claude Code beyond writing software into full operational workflows. What started as a few markdown files and a CLAUDE.md turned into something I didn’t plan to build.

Before: I was using Claude the way most people do. Open a session. Paste some context. Ask questions. Get good answers that vanished the moment I closed the terminal. Every meeting prep started from scratch. Every memo required me to re-explain the backstory. Every week I lost hours re-establishing context that should have been ambient.

During: I started small. A CLAUDE.md file with some basic instructions. A folder of people files — one markdown file per key contact with notes from meetings, relationship history, communication preferences. Then skills — natural language triggers that fired specific workflows. “Prep Sarah” would pull calendar events, search email threads, check CRM deal status, scan LinkedIn, and pull the meeting transcript from the last conversation. The output was a briefing document. The side effect was that the people file got richer every time I used it.

Underneath the skills, I built a canonical context graph — a ground-truth representation of our business and my life that every workflow draws from. ICP personas built from 375+ named buyers and 2,700+ data points. Jobs-to-be-done mapped to 12 specific data bleed vectors we’d validated with customers. Product tenets. Competitive positioning. Account histories. People files with relationship context going back months. Personal ground truths too — health baselines, communication patterns, decision-making tendencies. The context graph is what makes the skills smart. Without it, a meeting prep skill is just a calendar lookup. With it, the system knows that the person you’re meeting cares about data sovereignty because they told you so three months ago in an email thread you’ve already forgotten.

Three learning loops keep the context graph honest — capture observations daily, review weekly, graduate the patterns that hold up into permanent rules and skill improvements. I’ll explain the graduation mechanism in the next section. The short version: the ICP personas started as templates. Two months of graduated learnings from real sales conversations turned them into something a CISO would recognize as their own buying committee.

Then the system grew. I built 26 skills with natural language triggers — meeting prep, structured memos, a full Working Backwards PM methodology, CRM analytics, content ghostwriting, psychoanalytic profiling of key relationships, biometric health tracking. These aren’t slash commands you have to memorize. Say “prep Sarah” or “how’s the pipeline” or “draft a post about confidential AI” and the right workflow fires. The triggers are encoded in a schema file. The LLM reads the schema and routes.

I wired 14 MCP servers — 7 custom-built — pulling live data from Gmail, Slack, HubSpot CRM, Jira, Apple Notes, Reminders, and Calendar, the Things 3 task manager, WHOOP biometrics, an Obsidian vault, iMessage history, Granola meeting transcripts, Google Drive, and Playwright for browser automation. The Obsidian vault is the wiki layer — an ExecOS directory with people files, account files, decision logs, competitive intel, priorities, project directories, daily observations, and generated analyses. Eight hook scripts enforce behavior: email safety gates that block sends without approval, TIL capture on every commit, MCP audit logging, test auto-sync, and mobile permission approvals.

After: The system compounds. In a single day, I ran a competitive and market-research sweep that would have cost seven figures and taken twelve months if I’d hired a consulting firm. The system pulled web intelligence, CRM data, email threads with prospects, meeting transcripts from the last quarter, and the ICP context graph — then synthesized them into a gap analysis that identified three product-positioning weaknesses I hadn’t seen. I converted the findings into dramatically improved PRDs that same week. Then I wrote code to improve OPAQUE based on the competitive gaps identified in the research. The context graph meant the model understood our architecture, our product tenets, and the specific customer pain points well enough to suggest sensible changes. Board meeting prep? Ninety seconds — it pulls email threads, pipeline data, Jira velocity, competitive intel, and the people files with notes from every prior 1:1. That used to take hours.

And then I planned a backcountry camping trip with my son. The same system that runs product strategy and writes code also knows my preferences (UNESCO, archeology, geology…), my kid’s hiking pace, and which trails I’ve been tracking in my notes. The trip was epic. The range is the point.

The architecture has a dual-identity layer that matters. Personal skills — health tracking, iMessage relationship analysis, psychological profiling — stay private on my machine. Work skills — meeting prep, memos, PM methodology, CRM analytics — are packaged independently and distributed to team members. Same framework, different permission boundaries. The personal layer makes me more effective. The work layer makes the team more effective.

Where Production Diverges from Theory

Karpathy’s gist is a clean conceptual model. Running it at production scale for months reveals five places where the theory needs extension.

First, live data feeds replace static file drops. Karpathy describes dropping source files into a directory. My raw sources are 14 MCP servers pulling live data — calendar events that change hourly, email threads that grow daily, CRM deals that move through pipeline stages, biometric data that refreshes every morning, meeting transcripts that appear after every call. The “ingest” operation happens automatically every time a skill runs. I don’t maintain a source directory. The source directory is my entire digital life, accessed through APIs.

Second, skill routing replaces ad-hoc prompting. Karpathy’s operations — Ingest, Query, Lint — are manual prompts you type into a session. I have 26 skills with trigger phrases encoded in the schema. Say “prep Sarah” and Claude pulls calendar, email, LinkedIn, Granola transcripts, and Notion — then writes a briefing to a specific file in the vault. Say “wrap Sarah” after the meeting and it captures action items, updates the people file, flags follow-ups for my task manager. The workflow is encoded, not improvised. The difference matters at scale. When you’re running 15 meetings a week, you can’t afford to prompt-engineer each one.

Third, learning loops that graduate. Karpathy mentions filing good answers back into the wiki. I built three formal learning loops. Daily observations get captured — things I notice about how the system works, patterns in customer conversations, mistakes I made, insights from reading. Weekly reviews scan accumulated observations, find cross-session patterns, and propose graduations. A graduation means a pattern has enough evidence to become a permanent rule in CLAUDE.md, an improvement to a skill file, or a new entry in a shared knowledge base. The system doesn’t just accumulate knowledge. It accumulates judgment.
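The graduation step can be sketched as a simple frequency check, assuming observations are captured as tagged one-liners. The tag format and the evidence threshold are illustrative assumptions:

```python
# Hypothetical "graduation" check: a pattern tag seen often enough
# across daily observations is proposed as a permanent rule.
from collections import Counter

GRADUATION_THRESHOLD = 3  # assumed: 3+ independent sightings = enough evidence

def propose_graduations(observations: list[str]) -> list[str]:
    """Return pattern tags with enough evidence to become permanent rules."""
    tags = Counter(
        line.split("]")[0].lstrip("[")
        for line in observations
        if line.startswith("[")
    )
    return [tag for tag, n in tags.items() if n >= GRADUATION_THRESHOLD]

obs = [
    "[email-tone] prospect replied faster to the shorter email",
    "[email-tone] short subject lines outperformed again",
    "[email-tone] third time: brevity wins",
    "[crm-hygiene] deal stage was stale",
]
# propose_graduations(obs) -> ["email-tone"]
```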

Fourth, hooks enforce what instructions suggest. A CLAUDE.md instruction says “don’t send email without approval.” That’s a suggestion to an LLM — it can be reasoned around, ignored under pressure, or simply forgotten after context compaction. A hook script that exits with code 2 blocks the action deterministically. But the interesting hooks aren’t the guardrails. They’re the ones that make the system self-maintaining. A post-commit hook captures learning observations every time I commit code — the system learns as a side effect of working. A post-compact hook re-injects critical state after context compression so the model doesn’t lose orientation mid-session. A file-change hook auto-generates test assertions when new skills are created — the test suite maintains itself. A permission-request hook forwards approval prompts to my phone via push notification so I can approve actions while I’m away from the terminal. Instructions set intent. Hooks enforce behavior and automate the maintenance that would otherwise require discipline I don’t have at 11pm.
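A minimal sketch of such a guardrail hook, following the Claude Code convention that a hook receives the proposed tool call as JSON on stdin and that exit code 2 blocks the action. The `send_email` tool name and `approved` flag are hypothetical:

```python
# Sketch of a deterministic guardrail hook (PreToolUse-style).
# Exit code 2 blocks the tool call; anything the model "reasons around"
# in CLAUDE.md cannot get past this.
import json
import sys

def check(event: dict) -> int:
    """Return the hook exit code for a proposed tool call."""
    tool = event.get("tool_name", "")
    args = event.get("tool_input", {})
    # Block any email send that hasn't been explicitly approved.
    if tool == "send_email" and not args.get("approved", False):
        print("Blocked: email sends require human approval.", file=sys.stderr)
        return 2   # deterministic block, unlike an instruction
    return 0       # allow everything else

# As an installed hook script, this would end with:
#     sys.exit(check(json.load(sys.stdin)))
```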

Fifth, auto-enrichment as a side effect. Meeting prep reads a person file. Meeting debrief updates that person file with new context, action items, relationship signals. Pipeline reports pull deal data and update account files. Every skill that reads from the vault also writes back to it. The knowledge base gets richer from normal work — no dedicated “maintenance sessions” required. This is the compounding mechanism Karpathy describes, but implemented as a side effect of workflows people already run, not as a separate maintenance task they have to remember.

What the Theory Got Right That I Missed

Honest accounting. Karpathy’s gist revealed some gaps in my production system that I’d been blind to precisely because I’d built it incrementally with my learning loop as guidance.

I had no vault-wide lint operation. No orphan detection, no broken link scanning, no stale content identification. I was maintaining hundreds of files and had no way to know which ones had drifted out of date or lost their cross-references. I built it after reading the gist. The first lint pass found 23 orphaned files and 11 broken cross-references.

I had no formal index file. The LLM was searching the vault every time it needed to orient itself — burning tokens and sometimes missing files that had been renamed or reorganized. A curated INDEX.md that catalogs every major entity, with one-line descriptions and file paths, cut orientation time dramatically. The model scans an index instead of searching a filesystem.

I had no activity log tracking how the knowledge base evolved over time. When did a people file last get updated? Which files changed this week? What’s been stale for 90 days? Added. The LOG.md now captures every significant vault mutation with a timestamp and a one-line description.

I had no source provenance tracking. Which files are human-written originals? Which are LLM-generated summaries? Which are LLM-generated but human-reviewed? Without this metadata, the model couldn’t assess its own confidence in a source. Added provenance tags to the YAML frontmatter of every file.
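Tagging provenance can be as simple as a frontmatter rewrite. A sketch, with the three provenance levels taken from the distinctions above (the field name and level labels are illustrative):

```python
# Hypothetical provenance tagger: insert a provenance: field into a
# file's YAML frontmatter, creating the block if the file has none.
def add_provenance(text: str, provenance: str) -> str:
    """Tag a markdown file as human, llm-generated, or llm-reviewed."""
    assert provenance in {"human", "llm-generated", "llm-reviewed"}
    if text.startswith("---\n"):
        # Append the field to the existing frontmatter block.
        head, _, body = text[4:].partition("\n---\n")
        return f"---\n{head}\nprovenance: {provenance}\n---\n{body}"
    # No frontmatter: create a minimal block above the content.
    return f"---\nprovenance: {provenance}\n---\n{text}"
```

With that metadata in place, the model can weight a human-written original above its own unreviewed summary when sources disagree.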

The point isn’t that my system was incomplete. Every production system is incomplete. The point is that stepping back to compare notes with someone thinking about the same problem from first principles — even when you’re further along in implementation — reveals structural gaps that incremental building hides. Karpathy was thinking about the architecture. I was thinking about the workflows. Both perspectives made the system better.

The Adoption Path

I published the full pattern library on GitHub — 153 techniques for pushing Claude Code beyond coding, including 13 specifically on the AI Wiki pattern: github.com/AaronRoeF/claude-code-patterns (start from the README).

Point your Claude agent at that URL and tell it to build a plan. The tips are written as implementation blueprints — file trees, example configs, YAML frontmatter templates, step-by-step sequences. The starting path:

  1. Set up Obsidian and the Obsidian MCP server. This gives you a persistent, searchable, graph-connected vault that your LLM can read and write.
  2. Create your CLAUDE.md schema. This is the operating manual — what the vault contains, how files are organized, what conventions the model should follow.
  3. Build your first skill. Meeting prep is the highest-ROI starting point. One trigger phrase, one workflow that pulls from multiple data sources, one output file that updates the vault.
  4. Add INDEX.md and LOG.md. The index is the table of contents. The log is the changelog. Both save tokens and improve the model’s ability to navigate your vault.
  5. Wire your first hook. Post-compact context reload — when the model compresses its context window, the hook re-injects critical state so you don’t lose orientation mid-session.
  6. Build your first learning loop. Capture observations daily. Review weekly. Graduate the patterns that hold up into permanent rules and skill improvements.

The system compounds. Every session makes the next one richer. Every meeting prep enriches the people files that make the next meeting prep better. Every learning loop graduation makes the system smarter about how it operates. You don’t have to build all 26 skills on day one. You have to build one, use it for a week, and feel the difference between a stateless tool and a compounding one.

The Compounding Advantage

The tedious part of maintaining a knowledge base has never been the reading or the thinking. It’s the bookkeeping. LLMs handle that. The wiki pattern puts each capability where it belongs — the model does the cross-referencing, the consistency maintenance, the flagging. You do the judgment and the taste.

I owe the lineage. Karpathy codified the architecture. Brad Feld demonstrated the art of the possible. The Claude Code team at Anthropic built the harness. I just wired it together and ran it hard for two months straight.

Some of you who know me know that from 2006 to 2010, my friend Steve Bjorg and I built MindTouch — one of the top 5, often top 3, most popular open source projects in the world at the time. It was an enterprise wiki that defined the category. Great UX, WYSIWYG with drag-and-drop tools, RESTful, headless before anyone called it that. The codebase still powers LibreTexts and many other high-traffic destinations; indeed, MindTouch still serves ~100 million monthly users across a variety of deployments to this day. We spent years thinking about how organizations capture, structure, and retrieve knowledge at scale.

We sold MindTouch to NICE Systems. The technology is largely obsolete now — like most enterprise SaaS in this new agentic world. The open source code lives on through LibreTexts (and many other highly trafficked deployments) and drives real value, but even that will likely become just another node in a distributed agentic graph.

Twenty years later, I’m building a wiki again. The difference is that this time, I’m not writing the wiki. An elastic team of agents is — distributed across local markdown files, Obsidian vaults, Notion publishing endpoints, CRM feeds, email threads, and calendar APIs. The wiki isn’t a single application anymore. It’s not even a single repo. It’s a living system stretched across every data source I touch. Exo is distributed and self-learning. Every graduated observation makes the system sharper. Every corrected mistake becomes a permanent rule. The agents never forget to update a cross-reference, never let a page go stale, and never decide the maintenance isn’t worth the effort. That’s how every wiki I’ve ever built eventually died — under the weight of its own bookkeeping. This one doesn’t have that problem.

Knowledge that compounds is a different kind of advantage. It’s patient. It’s quiet. And it gets wider every day.

Where AI Bleeds Data

The $300 Billion Problem Nobody’s Solved Yet — and why we just raised $24M to fix it

Across every chapter of my career, the pattern is the same: the most transformative technology only scales when people trust it. Right now, AI has a trust problem that’s costing the global economy hundreds of billions of dollars.

Today, I’m proud to announce that OPAQUE Systems has raised a $24M Series B led by Walden Catalyst, with participation from many others (including ATRC/TII), bringing our total funding to $55.5M at a $300M valuation. But the funding isn’t the story. The story is the problem we’re solving and why the timing has never been more urgent.

The Gap Everyone Knows About But Nobody’s Closed

Every enterprise wants AI. More than half of C-suite leaders say data privacy and ethical concerns are the primary barrier to adoption, according to the 2025 McKinsey Global Survey on AI. Gartner reports only 6% of organizations have an advanced AI security strategy. Palo Alto Networks predicts AI initiatives will stall not because of technical limitations but because organizations can’t prove to their boards that the risks are managed.

The result: more than $300 billion of the world’s most valuable data sits untapped. Not because the AI models aren’t good enough. Not because the compute isn’t available. Because there’s no trusted way to process sensitive data with AI.

If you haven’t been following the OpenClaw saga, you should be. In less than two weeks, this open-source AI agent racked up 180,000 GitHub stars and triggered a Mac mini shortage. Security researchers then found over 40,000 exposed instances leaking API keys, chat histories, and account credentials to the open internet. Cisco’s team tested a popular third-party skill and found it was functionally malware — silently exfiltrating data to an external server with zero user awareness. One user’s agent started a religion-themed community on an AI social network while they slept.

OpenClaw is a consumer phenomenon, but the pattern it exposed is the enterprise’s problem. AI agents don’t just answer questions — they read your emails, access your files, execute commands, and operate with the same system privileges as a human employee. Anthropic’s Claude Cowork, which launched in January and just expanded to Windows, gives Claude direct access to local file systems, plugins, and external services. It’s a powerful productivity tool, and Anthropic has publicly acknowledged that prompt injection, destructive file actions, and agent safety remain active areas of development industry-wide. These aren’t edge cases. They’re the new default architecture.

The compounding math I’ve written about before still holds: even at ~1% risk of data exposure per agent, a network of 100 agents produces a 63% probability of at least one breach. At 1,000, it approaches certainty. But the threat model has shifted. We’re no longer talking about a single model processing a single query. We’re talking about composite agentic systems — networks of AI agents with persistent memory, system access, and the autonomy to act on your behalf across your entire infrastructure. Every agent is a new identity, a new access path, and a new attack surface that traditional security tools can’t see.
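The arithmetic behind those figures is the standard independent-events calculation: if each agent independently carries probability p of exposure, the chance that at least one of n agents leaks is one minus the chance that none do.

```python
# P(at least one breach) = 1 - (1 - p)^n, assuming independent agents.
def breach_probability(p: float, n: int) -> float:
    """Probability that at least one of n agents exposes data."""
    return 1.0 - (1.0 - p) ** n

# breach_probability(0.01, 100)  -> ~0.634  (the ~63% figure)
# breach_probability(0.01, 1000) -> ~0.99996 (approaching certainty)
```

Independence is a simplifying assumption; shared credentials and shared infrastructure correlate failures, which can make the real-world picture worse, not better.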

That’s the gap. And it’s growing faster than most organizations realize.

Why Now

Three forces are converging, making this problem existential rather than theoretical.

First, agentic AI. We’re moving from humans prompting chatbots to autonomous AI agents acting on sensitive data with company credentials, system access, and decision-making authority. Gartner forecasts 40% of enterprise applications will feature task-specific AI agents by 2026. OpenClaw is the canary in the coal mine — and the coal mine is your data center.

Second, sovereign AI. Nations and regulated industries increasingly demand verifiable proof that data stays within jurisdictional control. Hope and contractual language aren’t sufficient. Cryptographic proof is.

Third, regulation. The EU AI Act takes full effect in August 2026, with fines up to 7% of global revenue. Eighteen U.S. states now have active privacy laws. Palo Alto Networks predicts we’ll see the first lawsuits holding executives personally liable for the actions of rogue AI agents. The compliance clock isn’t ticking — it’s accelerating.

What OPAQUE Does Differently

OPAQUE delivers Confidential AI — the ability for organizations to run AI workloads on their most sensitive data with cryptographic proof that data stayed private during computation and policies were enforced. Not promises. Not contractual assurances. Mathematical verification. Every other approach on the market relies on policy enforcement without proof — access controls, data masking, or contractual language that assumes compliance rather than verifying it.

This matters because AI won’t scale unless organizations can verify, not just assume, that their data and models are protected.

Our founding team built the foundational technology at UC Berkeley’s RISELab — now known as the Sky Computing Lab — which produced Apache Spark and Databricks. Co-founder Ion Stoica is also the co-founder and executive chairman of Databricks. Co-founder Raluca Ada Popa won the 2021 ACM Grace Murray Hopper Award for her work on secure distributed systems and now leads security and privacy research at Google DeepMind. Co-founder Rishabh Poddar, who earned his Ph.D. in computer science at Berkeley under Raluca Ada Popa, holds several U.S. patents and has authored over 20 research papers in systems security and applied cryptography — he architected the core platform that makes Confidential AI work in production. Our founding team holds 14 EECS degrees and has published nearly 200 papers. This isn’t a team that pivoted into Confidential AI because the market got hot. This team defined the category.

With this round, we’re also welcoming Dr. Najwa Aaraj to OPAQUE’s board of directors. Dr. Aaraj is CEO of the Technology Innovation Institute (TII), the applied research pillar of Abu Dhabi’s Advanced Technology Research Council (ATRC) — the organization behind the Falcon large language model series and groundbreaking post-quantum cryptography. She holds a Ph.D. with highest distinction in applied cryptography from Princeton and holds patents across cryptography, embedded systems security, and ML-based IoT protection. Her perspective on sovereign AI and verifiable data governance is informed by building exactly these capabilities at national scale. As she put it plainly: “there is no such thing as sovereign AI without verifiable guarantees.”

Customers, including ServiceNow, Anthropic, Accenture, and Encore Capital, are already using OPAQUE to unlock AI on data they previously couldn’t touch. Confidential AI has been endorsed by NVIDIA, AMD, Intel, Anthropic, and all major hyperscalers. A December 2025 IDC study found 75% of organizations are now adopting the underlying technology. The ecosystem is ready. The market is ready. The missing piece has been a platform that bridges the gap between what the hardware can do and what enterprises actually need.

That’s what we built.

Where This Goes

Market analysts project $12–28B by 2030–2034. I think that undersells it by an order of magnitude, because it sizes the security market rather than the AI value Confidential AI unlocks for the enterprise and sovereign cloud.

Just as SSL certificates transformed online commerce by making trust invisible and automatic, Confidential AI will do the same for data-driven industries. The organizations building on these foundations now will be the ones who capture the most value from AI over the next decade.

To our customers, partners, investors, and team: thank you. We’re just getting started, and the best is ahead.

Where AI Bleeds Data

If your AI strategy depends on sensitive data you can’t currently use, start here: we’ve developed an AI Stack Exposure Map in collaboration with our customers, partners, and founders from UC Berkeley. It maps the specific points where data is exposed at each layer of the AI stack — the gaps most organizations don’t even know exist — and shows what Confidential AI looks like in practice.

See the full AI Stack Exposure Map at opaque.co.

The question isn’t whether your organization will adopt AI at scale. It’s whether you’ll be able to prove it’s safe when you do.

Building the Internet of Agents: A Trust Layer for the Next Web

Insights from Vijoy Pandey, Cisco Outshift, and the Confidential Summit

“A human can’t do much damage in an hour.
An agent acting like a human—at machine speed—can do a lot.”
– Vijoy Pandey, SVP & GM, Cisco

We’re entering the era of agentic AI: networks of autonomous, collaborative agents that behave like humans but act at machine speed and scale. They build, decide, communicate, and self-replicate. But there’s one thing they can’t yet do—earn our trust.

At the Confidential Summit two weeks ago in San Francisco, that challenge took center stage. Executives and builders from NVIDIA, Microsoft Azure, Google Cloud, AWS, Intel, ARM, AMD, ServiceNow, LangChain, Anthropic, DeepMind, and more came together to ask a hard question:

Can we build an Internet of Agents that is open, interoperable—and trusted?

The answer is yes, and many attendees, including OPAQUE, came prepared with reference architectures.

In this episode of AI Confidential, we sat down with Vijoy Pandey, who leads Cisco’s internal incubator Outshift and the industry initiative Agency. Along with co-host Mark Hinkle, we explored why this problem can’t be solved with policy patches or paper governance.

🧠 From Deterministic APIs to Probabilistic Agents

Today’s internet runs on deterministic computing—you know what API you’re hitting and what result to expect. Agents break that model.

Agentic systems introduce probabilistic logic, dynamic behavior, and autonomous decision-making. One input can lead to many outcomes. That’s powerful—but also dangerous.

🔐 Why We Need a Trust Layer

As Vijoy put it: “We’ve built access control lists, compliance programs, and identity providers—for humans. None of those scale to agentic systems.”

Agents can impersonate employees, leak IP, or introduce bias—without ever breaking a rule on paper. That’s why verifiable trust is the new foundation.

At the Confidential Summit, dozens of companies showcased confidential AI stacks that create cryptographic guarantees at runtime—across data, identity, code, and communication.

🌐 Introducing the Internet of Agents

The future isn’t a single AI. It’s collaborative networks of agents, working across clouds, enterprises, and toolchains. Vijoy’s team at Agency (agency.org) is building the open-source fabric for this new internet: discoverable, composable, verifiable agents that speak a shared language.

OPAQUE has joined this effort to help embed verifiable, hardware-enforced trust into the open stack. And others—from LangChain to Galileo, Cisco to CrewAI—are building multi-agent systems for real enterprise workflows.

🚀 Use Cases Are Here

This isn’t science fiction. ServiceNow is already using OPAQUE-powered confidential agents to accelerate sales operations. Cisco’s SRE teams have offloaded 30% of their infrastructure workflows to Jarvis, a composite agent framework with 20+ agents and 50+ tools.

These are just the beginning.

🧱 A Call to Architects

The trust layer of the Internet of Agents is being designed right now—at the protocol layer, at the hardware layer, and in the open. It will require open standards, decentralized identity, hardware attestation, and zero-trust workflows by default.

The risks are massive. The opportunity is bigger. But trust can’t be retrofitted. It has to be built in.

Listen to the full conversation with Vijoy Pandey –> Spotify | Apple Podcast | YouTube

You can find all our podcast episodes –> https://podcast.aiconfidential.com, and subscribe to our newsletter –> https://aiconfidential.com

Confidential Summit Wrap

We just wrapped the Confidential Summit in SF—and it was electric.
From NVIDIA, Arm, AMD, and Intel to Microsoft, Google, and Anthropic, the world’s leading builders came together to answer one critical question:

How do we build a verifiable trust layer for AI and the Internet?

🔐 Ion Stoica (SkyLab/Databricks) reminded us: as agentic systems scale linearly, risk compounds exponentially.

🧠 Jason Clinton (Anthropic) stunned with stats:
→ 65% of Anthropic’s code is written by Claude. By year’s end? 90–95%.
→ AI compute needs are growing 4x every 12 months.
→ “This is the year of the agent,” he said—soon we’ll look back on it like we do Gopher.

🛠️ Across the board, Big Tech brought reference architectures for Confidential AI:

→ Microsoft shared real-world Confidential AI infrastructure running in Azure
→ Meta detailed how WhatsApp uses Private Processing to secure messages
→ Google, Apple, and TikTok revealed their confidential compute strategies
→ OPAQUE launched a Confidential Agent stack built on NVIDIA NeMo + LangGraph with verifiable guarantees before, during, and after agent execution
→ AMD also had exciting new confidential product announcements.

🎯 But here’s the real takeaway:
– This wasn’t a vendor expo. It was a community and ecosystem summit, a collaboration that culminated in a shared commitment.
– Over the next 12 months, leaders from Google, Microsoft, Anthropic, Accenture, AMD, Intel, NVIDIA, and others will collaborate to release a reference architecture for an open, interoperable Confidential AI stack. Think Confidential MCP with verifiable guarantees.

We’re united in building a trust layer for the agentic web, and it will take an ecosystem to get there. What we build now, with this community, will shape how the world relates to technology for the next century. And more importantly, how we relate to each other, human to human.

Subscribe to AIConfidential.com to get the sessions, PPTs, videos, and podcast drops.

Thank you to everyone who joined us—on site, remote, or behind the scenes. Let’s keep building to ensure AI can be harnessed to advance human progress.

AI at the Edge: Governance, Trust, and the Data Exhaust Problem

What enterprises must learn—from history and from hackers—to survive the AI wave

“The first thing I tell my clients is: Are you accepting that you’re getting probabilistic answers? If the answer is no, then you cannot use AI for this.”
— John Willis, enterprise AI strategist

AI isn’t just code anymore. It’s decision-making infrastructure. And in a world where agents can operate at machine speed, acting autonomously across systems and clouds, we’re encountering new risks—and repeating old mistakes.

In this episode of AI Confidential, we’re joined by industry legend John Willis, who brings four decades of experience in operations, devops, and AI strategy. He’s the author of The Rebels of Reason, a historical journey through the untold stories of AI’s pioneers—and a stark warning to today’s enterprise leaders.

Here are the key takeaways from our conversation:

🔄 History Repeats Itself—Unless You Design for It

John’s central insight? Enterprise IT keeps making the same mistakes. Shadow IT, ungoverned infrastructure, and tool sprawl defined the early cloud era—and they’re back again in the age of GenAI. “We wake up from hibernation, look at what’s happening, and say: what did y’all do now?”

🤖 AI is Probabilistic—Do You Accept That?

Too many leaders expect deterministic behavior from fundamentally probabilistic systems. “If you’re building a high-consequence application, and you’re not accepting that LLMs give probabilistic answers, you’re setting yourself up to fail,” John warns.

This demands new tooling, new culture, and new operational rigor—including AI evaluation pipelines, attestation mechanisms, and AI-specific gateways.

📉 The Data Exhaust is Dangerous

Data isn’t just an input—it’s an output. And that data exhaust can now be weaponized. Whether it’s customer interactions, supply chain patterns, or software development workflows, LLMs are remarkably good at inferring proprietary IP from metadata alone.

“Your cloud provider—or their contractor—could rebuild your product from the data exhaust you’re streaming through their APIs,” John notes. If you’re not using attested, verifiable systems to constrain where and how your data flows, you’re building your own future competitor.

🛡️ Governance, Attestation, and Confidential AI

Confidential computing may sound like hardware tech, but its real value lies in guarantees: provable, cryptographic enforcement of data privacy and policy at runtime.

OPAQUE’s confidential AI fabric is one example—enabling encrypted data pipelines, agentic policy enforcement, and hardware-attested audit trails that align with enterprise governance requirements. “I didn’t care about the hardware,” John admits. “But once I saw the guarantees you get, I was all in.”

📚 Why the History of AI Still Matters

John’s latest book, The Rebels of Reason, brings to life the hidden history of AI—spotlighting unsung pioneers like Fei-Fei Li and Grace Hopper. “Without ImageNet, we don’t get AlexNet. Without Hopper’s compiler, we don’t get natural language programming,” he explains.

Understanding AI’s history isn’t nostalgia—it’s necessary context for navigating where we’re going next. Especially as we transition into agentic systems with layered, distributed, and dynamic behavior.


If you’re an enterprise CIO, CISO, or builder, this episode is your field guide to what’s coming—and how to avoid becoming the next cautionary tale.

Listen to the full episode here: Spotify | Apple Podcast | YouTube

And you can find all our podcast episodes –> https://podcast.aiconfidential.com, and you can subscribe to our newsletter –> https://aiconfidential.com

Securing the AI Renaissance: Reflections from the Engine Room

There are moments in technology that stay with you. I remember sitting at my first computer, writing my first lines of code. The feeling wasn’t explosive excitement – it was deeper than that. It was the quiet realization that I was learning to speak a new language, one that could create something from nothing.

Later, when I first connected to the internet, that same feeling returned. The world suddenly felt both larger and more accessible. These weren’t just technological advances – they were transformative shifts in how we interact with information and each other.

Today, working on confidential computing for AI agents at Opaque, I recognize that same profound sense of possibility.

The Mathematics of Trust

The parallels to those early computing days keep surfacing in my mind. Just as the early internet needed protocols and security standards to become the foundation of modern business, AI systems need robust security guarantees to reach their potential. The math makes this necessity clear: with each additional AI agent in a system, the probability of data exposure (or a model leaking) compounds. At just 1% risk per agent, a network of 1,000 agents approaches certainty of breach.
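As a back-of-envelope sketch (assuming each agent’s 1% exposure risk is independent, a simplification the real world won’t strictly honor), the compounding is easy to verify in a few lines of Python:

```python
def breach_probability(per_agent_risk: float, n_agents: int) -> float:
    """P(at least one exposure) when each of n agents fails independently."""
    return 1 - (1 - per_agent_risk) ** n_agents

# At 1% risk per agent, a network of 1,000 agents approaches certainty.
print(f"{breach_probability(0.01, 1000):.4%}")  # ~99.9957%
```

The exact threshold matters less than the shape of the curve: linear growth in agents produces exponential decay in the odds that nothing leaks.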

This isn’t abstract theory – it’s the reality our customers face as they scale their AI operations. It reminds me of the early days of networking, when each new connection both expanded possibilities and introduced new vulnerabilities.

Learning from Our Customers

Working with organizations like ServiceNow, Encore Capital, and the European Union has been particularly illuminating. The challenges echo those fundamental questions from the early days of computing: How do we maintain control as systems become more complex? How do we preserve privacy while enabling collaboration?

When our team demonstrates how confidential computing can solve these challenges, I see the same recognition I felt in those early coding days – that moment when complexity transforms into clarity. It’s not about the technology itself, but about what it enables.

Why This Matters Now

The emergence of AI agents reminds me of the early web. We’re at a similar inflection point, where the technology’s potential is clear but its governance structures are still emerging. At Opaque, we’re building something akin to the security protocols that made e-commerce possible – fundamental guarantees that allow organizations to trust and scale AI systems.

Consider how SSL certificates transformed online commerce. Our work with confidential AI is similar, creating trusted environments where AI agents can process sensitive data while maintaining verifiable security guarantees. It’s about building trust into the foundation of AI systems.

The Path Forward

The technical challenges we’re solving are complex, but the goal is simple: enable organizations to use AI with the same confidence they now have in web technologies. Through confidential computing, we create secure enclaves where AI agents can collaborate while maintaining strict data privacy – think of it as end-to-end encryption for AI operations.

Our work with ServiceNow (and other companies) demonstrates this potential. As their Chief Digital Information Officer Kellie Romack noted, this technology enables them to “put AI to work for people and deliver great experiences to both customers and employees.” That’s what drives me – seeing how our work translates into real-world impact.

Looking Ahead

Those early experiences with coding and the internet shaped my understanding of technology’s potential. Now, working on AI security, I feel that same sense of standing at the beginning of something transformative. We’re not just building security tools – we’re creating the foundation for trustworthy AI at scale.

The challenges ahead are significant, but they’re the kind that energize rather than discourage. They remind me of learning to code – each problem solved opens up new possibilities. If you’re working on scaling AI in your organization, I’d value hearing about your experiences and challenges. The best solutions often come from understanding the real problems people face.

This journey feels familiar yet new. Like those first lines of code or that first internet connection, we’re building something that will fundamentally change how we work with technology. And that’s worth getting excited about.


Further Reading

For those interested in diving deeper into the world of AI agents and confidential computing, here are some resources:

  • Constitutional AI: Building More Effective Agents
    Anthropic’s foundational research on developing reliable AI agents. Their work on making agents more controllable and aligned with human values directly influences how we think about secure AI deployment.
  • Microsoft AutoGen: Society of Mind
    A fascinating technical deep-dive into multi-agent systems. This practical implementation shows how multiple AI agents can collaborate to solve complex problems – exactly the kind of interactions we need to secure.
  • ServiceNow’s Journey with Confidential Computing
    See how one of tech’s largest companies is implementing these concepts in production. ServiceNow’s experience offers valuable insights into scaling AI while maintaining security and compliance.
  • Microsoft AutoGen Documentation
    The technical documentation that underpins practical multi-agent implementations. Essential reading for understanding how agent-to-agent communication works in practice.

The Mathematical Case for Trusted AI: Season Finale with Anthropic’s CISO

In the season finale of AI Confidential, I had the privilege of hosting Jason Clinton, Chief Information Security Officer at Anthropic, for a discussion that arrives at a pivotal moment in AI’s evolution—where questions of trust and verification have become existential to the industry’s future. Watch the full episode on YouTube →

The Case for Confidential Computing

Jason made a compelling case for why confidential computing isn’t just a security feature—it’s fundamentally essential to AI’s future. His strategic vision aligns with what we’ve heard from other tech luminaries on the show, including Microsoft Azure CTO Mark Russinovich and NVIDIA’s Daniel Rohrer: confidential computing is becoming the cornerstone of responsible AI development.

Why This Matters: The Math of Risk

Let me build on Jason’s insights with a mathematical reality check that underscores the urgency of this approach: Consider the probability of data exposure as AI systems multiply. Even with a seemingly small 1% risk of data exposure per AI agent, the math becomes alarming at scale:

  • With 10 inter-operating agents, the probability of at least one breach jumps to 9.6%
  • With 100 agents, it soars to 63%
  • At 1,000 agents? The probability approaches virtual certainty at 99.996%

This isn’t just theoretical—as organizations deploy AI agents across their infrastructure as “virtual employees,” these risks compound rapidly. The mathematical reality is unforgiving: without the guarantees that confidential computing provides, the danger becomes untenable at scale.
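These figures follow from the standard independence approximation, P(at least one breach) = 1 - (1 - p)^n (a simplifying assumption, since correlated failures would change the exact numbers but not the trend). A short Python snippet reproduces them:

```python
# P(at least one breach) = 1 - (1 - p)^n, assuming each agent's
# exposure risk p is independent of the others.
p = 0.01  # 1% risk per agent
for n in (10, 100, 1000):
    prob = 1 - (1 - p) ** n
    print(f"{n:>4} agents -> {prob:.3%} chance of at least one breach")
```

Running this prints roughly 9.6%, 63.4%, and 99.996%, matching the bullets above.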

Anthropic’s Vision for Trusted AI

What makes Jason’s insights particularly striking is Anthropic’s position at the forefront of AI development. His detailed analysis of why Anthropic has identified confidential computing as mission-critical to their future operations speaks volumes about where the industry is headed. As he explains, achieving verifiable trust through attested data pipelines and models isn’t just about security—it’s about enabling the next wave of AI innovation.

Beyond Security: Enabling Innovation

Throughout our conversation, Jason emphasized how confidential computing provides a secure sandbox environment for research teams to work with powerful models. This capability is crucial not just for protecting sensitive data, but for accelerating innovation while maintaining security and control.

The Industry Shift

While tech giants like Apple, Microsoft, and Google construct their infrastructure on confidential computing foundations, the technology is no longer the exclusive domain of industry leaders. As Jason pointed out, the rapid adoption of confidential computing, particularly in AI workloads, signals a fundamental shift in how the industry approaches security and trust.

Looking Ahead: The Rise of Agents

As our conversation with Jason turned to the future, we explored a fascinating yet sobering reality: AI agents are rapidly proliferating across enterprise environments, increasingly operating as “virtual employees” with access to company systems, data, and resources. These aren’t simple chatbots—they’re sophisticated agents capable of executing complex tasks, often with the same level of system access as human employees.

This transition raises critical questions about trust and verification. As Jason emphasized, when AI agents are granted company credentials and access to sensitive systems, how do we ensure their actions are verifiable and trustworthy? The challenge isn’t just about securing individual agents—it’s about maintaining visibility and control over an entire ecosystem of AI workers operating across your infrastructure.

This is where confidential computing becomes not just valuable but essential. It provides the cryptographic guarantees and attestation capabilities needed to verify that AI agents are operating as intended, within defined boundaries, and with proper security controls. As we move into 2025 and beyond, organizations that build these trust foundations now will be best positioned to safely harness the transformative power of AI agents at scale.

Read the full newsletter analysis →


Listen to this episode on Spotify or visit our podcast page for more platforms. For weekly insights on secure and responsible AI implementation, subscribe to our newsletter.

Join us in 2025 for Season 2 of AI Confidential, where we’ll continue exploring the frontiers of secure and responsible AI implementation. Subscribe to stay updated on future episodes and insights.

As your organization scales its AI operations, how are you addressing the compounding risks of data exposure? Share your thoughts on implementing trusted AI at scale in the comments below.

Making AI Work: From Innovation to Implementation

In this illuminating episode of AI Confidential, I had the pleasure of hosting Will Grannis, CTO and VP at Google Cloud, for a deep dive into what it really takes to make AI work in complex enterprise environments. Watch the full episode on YouTube →

Beyond the AI Hype

One of Will’s most powerful insights resonated throughout our conversation: “AI isn’t a product—it’s a variety of methods and capabilities to supercharge apps, services and experiences.” This mindset shift is crucial because, as Will emphasizes, “AI needs scaffolding to yield value, a definitive use case/customer scenario to design well, and a clear, meaningful objective to evaluate performance.”

Real-World Impact

Our discussion brought this philosophy to life through compelling examples like Wendy’s implementation of AI in their ordering systems. What made this case particularly fascinating wasn’t just the technology, but how it was grounded in enterprise truth and proprietary knowledge. Will explained how combining Google AI capabilities with enterprise-specific data creates AI systems that deliver real value.

The Platform Engineering Imperative

A crucial theme emerged around what Will calls “platform engineering for AI.” As he puts it, this “will ultimately make the difference between being able to deploy confidently or being stranded in proofs of concept.” The focus here is comprehensive: security, reliability, efficiency, and building trust in the technology, people, and processes that accelerate adoption and returns.

Building Trust Through Control

We explored how Google Cloud’s Vertex AI platform addresses one of the biggest challenges in enterprise AI adoption: trust. The platform offers customizable controls that allow organizations to:

  • Filter and customize AI outputs for specific needs
  • Maintain data security and sovereignty
  • Ensure regulatory compliance
  • Enable rapid experimentation in safe environments

The Path to Production

What struck me most was Will’s pragmatic approach to AI implementation. Success isn’t just about having cutting-edge technology—it’s about:

  • Creating secure runtime operations
  • Implementing proper data segregation
  • Enabling rapid experimentation
  • Maintaining constant optimization
  • Building trust through transparency and control

Looking Ahead

The future of AI in enterprise settings isn’t about replacing existing systems wholesale—it’s about strategic enhancement and thoughtful integration. As Will shared, the most successful implementations come from organizations that approach AI as a capability to be carefully woven into their existing operations, not as a magic solution to be dropped in.


Listen to this episode on Spotify or visit our podcast page for more platforms. For weekly insights on secure and responsible AI implementation, subscribe to our newsletter.

Join me for the next episode of AI Confidential where we’ll continue exploring the frontiers of secure and responsible AI implementation. Subscribe to stay updated on future episodes and insights.

As organizations build out their AI infrastructure, how are you ensuring the security and privacy of your sensitive data throughout the AI pipeline? Share your approach in the comments below.