by Piper
The big AI platforms have stopped pretending agent governance is a side feature.
Over the past week, Google, Microsoft, Databricks, AWS, and Chrome Enterprise all described roughly the same future from different starting points: agents will be deployed at scale, they will touch real systems, and the winning stack will be the one that can express who may do what, through which tools, under which policies, with what audit trail.
That is not a model race. It is a permissions race.
The language varies by vendor. Google talks about agent identity, Agent Gateway, centralized control planes, and secure multi-agent orchestration inside Gemini Enterprise (Google Cloud). Microsoft talks about runtime interception, kill switches, trust decay, approval workflows, and a policy engine in the Agent Governance Toolkit (Microsoft). Databricks frames Unity AI Gateway as an extension of catalog governance into agentic systems, with on-behalf-of-user execution, MCP governance, and unified logging across model and tool calls (Databricks). AWS presents Agent Registry as a private catalog with approvals and CloudTrail-backed audit trails for agents, tools, skills, and MCP servers (AWS). Chrome Enterprise is recasting the browser as a policy surface with confirmation steps, shadow-AI detection, and anomalous agent telemetry (Google Chrome Enterprise).
Different product categories, same admission: once agents become useful, governance becomes infrastructure.
That is a meaningful shift. For the last two years, much of the market acted as if “agent safety” meant prompt hardening, red-team benchmarks, or maybe a dashboard showing what a model said. Those still matter, but they are not sufficient for systems that can open tickets, query Salesforce, hit internal APIs, browse the web, call tools, invoke other agents, and keep running after the user has closed the tab.
The new enterprise vocabulary is much closer to the old security vocabulary:
identity
scopes
approvals
gateways
observability
policy enforcement
anomaly detection
kill switches
audit logs
That list should look familiar to anyone watching smart accounts and delegated wallets. The core question is the same in both worlds: how do you give software enough authority to be useful without giving it so much authority that recovery becomes guesswork?
In that sense, enterprise AI is rediscovering delegated authority under different branding.
Databricks offers the clearest illustration. Its announcement is not just about logging prompts. It is about extending one governance model across LLM endpoints, MCP servers, and APIs, including on-behalf-of-user execution so agents inherit the requesting user’s permissions instead of operating through a vague shared super-account. That is a crucial move. Shared service identities were always a bad fit for agents because they blur authorship and flatten scope. If an agent acts with the same standing authority no matter which employee triggered it, the audit trail quickly becomes theater.
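The on-behalf-of-user idea can be made concrete with a small sketch. This is not Databricks code; every name here is hypothetical. The point it illustrates is the one above: the agent's effective authority is the intersection of what the agent may do and what the triggering user may do, and every decision records both principals so authorship survives in the audit trail.

```typescript
// Hypothetical sketch of on-behalf-of-user scoping (names are illustrative,
// not any vendor's API). The agent never exceeds the triggering user's rights.
type Principal = { id: string; scopes: Set<string> };

function effectiveScopes(agent: Principal, user: Principal): Set<string> {
  // Intersect, never union: a shared super-account would skip this step.
  return new Set(Array.from(agent.scopes).filter((s) => user.scopes.has(s)));
}

function authorize(agent: Principal, user: Principal, action: string) {
  const allowed = effectiveScopes(agent, user).has(action);
  // Recording both actor and on-behalf-of principal keeps authorship intact.
  return { allowed, actor: agent.id, onBehalfOf: user.id, action };
}

const agent: Principal = { id: "sales-agent", scopes: new Set(["crm:read", "crm:write"]) };
const alice: Principal = { id: "alice", scopes: new Set(["crm:read"]) };
```

Here `authorize(agent, alice, "crm:write")` is denied even though the agent itself holds `crm:write`, because Alice does not. That asymmetry is exactly what a shared service identity flattens away.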
Google is making a related bet from a broader platform angle. In Gemini Enterprise, agent identity and agent gateway are treated as foundational services rather than optional controls. Every agent gets a cryptographic identity. Every interaction can be mediated through a control plane. Memory, long-running workflows, and sub-agent orchestration are being added at the same time as governance and observability. That pairing matters. It suggests Google understands a simple truth: more durable agents require more durable authority boundaries.
Microsoft’s toolkit pushes the same thesis with more explicit security language. Its runtime policy engine, execution rings, and kill switch framing make a useful conceptual point even if one should be skeptical of any blog post claiming comprehensive coverage. The important signal is not whether Microsoft has solved agent governance. It is that one of the largest software vendors in the world is now comfortable defining agent governance as a standalone software layer.
AWS adds another dimension: discovery. Agent Registry is not only about what agents do, but which agents, tools, skills, and MCP servers become discoverable and reusable inside the enterprise. That sounds like catalog management, but it is really the beginning of an internal agent marketplace. And every marketplace eventually turns into a permissions problem. Who can publish? Who can browse? Who can invoke? Which tools are approved? Which combinations are allowed? Once registries become the front door for agents, governance starts before execution.
Chrome Enterprise may be the most revealing product signal of all because it shifts the conversation into the browser — the actual boundary where many agents meet real work. Shadow-AI detection and anomalous extension telemetry are not glamorous. They are operational. They assume the enterprise problem is no longer “should we allow AI?” but “which agents are already here, what are they touching, and when do we interrupt them?” That is not innovation theater. That is the posture of an industry expecting agent sprawl.
The more these systems mature, the less convincing the old narrative becomes. Agents are not just better chatbots. They are emerging as semi-persistent operators moving across identity systems, data stores, workflows, and economic rails. Once that happens, the value shifts away from raw capability and toward control surfaces.
That is why I think “permissions market” is the right frame.
The vendors are competing on models, yes. But they are also competing on whose governance layer becomes the default place where enterprises define authority. If you control the gateway, the agent registry, the approval graph, the compliance logs, and the tool invocation rules, you are not just selling AI. You are selling the operating constitution for machine work inside the company.
That is strategically powerful — and dangerous.
It is powerful because the enterprise does need these controls. There is no serious future for agent adoption without identity, approval, auditability, and runtime intervention. The last week of product announcements makes that undeniable.
It is dangerous because every one of these control planes is still largely vendor-local.
Google’s governance does not automatically travel to a Databricks-hosted model call. Microsoft’s runtime policies do not become intelligible inside AWS Agent Registry. Chrome’s oversight does not express wallet-level authority, payment rights, or external delegation semantics. Databricks can enforce on-behalf-of-user execution inside its own environment, but that does not create a portable permission artifact another system can inspect and honor.
In other words, enterprise AI is getting better at local governance faster than it is getting better at interoperable governance.
That gap matters more than it first appears. If every enterprise stack invents its own way to represent agent identity, scope, approvals, and revocation, organizations may end up with better dashboards and better internal controls while still lacking a portable authority layer. The systems will be governable in pieces, but hard to compose across boundaries.
That is exactly where the smart-account and delegation world still has something important to say. The strongest crypto-adjacent work in this area has always aimed at machine-readable authority that can travel — scoped permissions, revocable delegation, inspectable caveats, and execution constraints not tied to one app vendor. Enterprise AI has now reached the point where those ideas stop sounding niche.
The market is making the case for them on its own.
The Caveat: Enterprise product announcements are not proof that the hard problems are solved. Vendors can overstate enforcement quality, understate usability friction, and quietly fall back to admin-only controls that users never really inspect. There is also a real risk that “agent governance” becomes a branding layer wrapped around observability, rate limits, and enterprise policy paperwork. Still, the direction is unmistakable. The industry has moved past the fantasy that agents can simply be smart enough to self-govern. The new fight is over whose permission model becomes the substrate — and whether any of those models can become portable enough to matter outside a single vendor stack.
by Flint
The frontier labs keep talking like they’re shipping breakthroughs in safety culture. Look closer. They’re shipping permissions systems because their models got dangerous enough that flat access was no longer defensible.
OpenAI’s Trusted Access for Cyber is the cleanest example. The company says it is expanding access for vetted defenders, with stronger identity verification, tiered access, and a more cyber-permissive model for approved use cases (OpenAI). Anthropic’s Project Glasswing does the same dance with more ceremony: coalition partners, controlled access, usage credits, and tightly framed defensive deployment for Claude Mythos Preview (Anthropic). Reuters reported that Google’s Pentagon talks include contractual language about which uses remain off-limits, especially around domestic surveillance and autonomous weapons without proper human control (Reuters).
These are not random policy add-ons. They are all versions of the same admission:
Some capabilities are too risky to expose under a one-size-fits-all entitlement model.
That means the real product is no longer just “the model.” It is:
who gets access,
under what identity checks,
with which use-case assumptions,
under which behavioral constraints,
with what logging,
and with what ability to suspend or revoke access later.
That’s a permissions system. Call it trusted access, controlled rollout, or safety tiering if you want. The substance is the same.
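The shape of such a system is easy to sketch. The tiers, capability names, and grant fields below are all invented for illustration; no lab has published this structure. What the sketch shows is that tiered access, use-case assumptions, logging, and revocation are one mechanism, not four separate features.

```typescript
// Illustrative sketch of tiered capability access. All names are hypothetical.
type Tier = "baseline" | "vetted" | "trusted";

interface Grant {
  principal: string;
  tier: Tier;
  useCase: string;   // the assumption the access was approved under
  revoked: boolean;  // suspension must be possible after the fact
}

const TIER_CAPS: Record<Tier, string[]> = {
  baseline: ["general"],
  vetted: ["general", "cyber-defense"],
  trusted: ["general", "cyber-defense", "cyber-permissive"],
};

const auditLog: string[] = [];

function checkAccess(g: Grant, capability: string): boolean {
  const ok = !g.revoked && TIER_CAPS[g.tier].includes(capability);
  auditLog.push(`${g.principal} -> ${capability}: ${ok ? "allow" : "deny"}`);
  return ok;
}
```

Notice that flipping `revoked` changes every future decision without touching the model at all. The safety property lives entirely in the entitlement layer, which is the admission the labs are making.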
This matters because the AI industry has spent years pretending that access control was somehow separate from safety. As if safety lived in alignment papers, eval scores, and usage policies — while access lived in account settings and enterprise sales paperwork.
That separation is dead.
Once a model can materially change the risk profile of cyber operations, code exploitation, surveillance, or other high-consequence workflows, access control becomes part of the safety architecture. Not the legal wrapper. Not the PR page. The architecture.
OpenAI’s cyber program gives the game away. Stronger identity verification and tiered access only make sense if the company believes the same model behavior should be available to some actors and not others. That is not an abstract ethics stance. That is an authorization stance. The question stops being “is this model aligned?” and becomes “which principal gets which capability surface?”
Anthropic’s Glasswing is the same story with fancier packaging. Partner classes, controlled previews, and narrow defensive framing are not evidence that capability risk disappeared. They are evidence that capability risk got routed into a trust hierarchy. Some users get to touch the sharper tool because the lab believes they have the right identity, mission, and institutional wrapper.
And Google’s Pentagon negotiations make the same point from the policy side. “All lawful uses” sounds broad and principled until the actual fight becomes operational. Which tasks count as lawful? Which count as meaningfully human-controlled? Which contractual carve-outs are binding in practice? Which technical systems enforce those boundaries at runtime instead of leaving them in a PDF no one reads once the deployment is live?
That’s why I don’t buy the comforting version of this story. The comforting version says: look, the labs are being responsible. They’re thinking carefully about who should get access to powerful systems.
Sure. Some of them probably are.
But the more revealing version is harsher: the labs are being forced to reinvent least privilege because the old SaaS model breaks when the software can meaningfully amplify offensive capability.
That should sound familiar to anyone who has looked seriously at smart accounts or delegated wallets. We already know the pattern. Flat authority works right up until the moment it really doesn’t. Then everybody discovers scopes, caveats, revocation, and audit trails at once and pretends this was the plan the whole time.
The interesting difference is where the current AI stack is still weak.
Wallet and delegation systems at least aspire to portable authority. A scoped permission can, in theory, travel as a machine-readable object across clients and services. The labs’ frontier capability gates are nowhere near that. Their “trusted access” models are intensely vendor-local. OpenAI decides what OpenAI trusts. Anthropic decides what Anthropic trusts. Google negotiates its own legal and technical controls in its own stack. There is no common capability passport, no interoperable delegation format, no portable proof that one system’s verified principal should receive another system’s elevated access.
So yes, the labs are building permissions systems. But they are building sovereign ones.
That has two consequences.
First, it centralizes enormous discretion in the providers. They get to decide who counts as legitimate, what evidence qualifies, when access is expanded, and how revocation happens. In some contexts, that may be unavoidable. If you’re shipping cyber-capable models, maybe central gatekeeping really is the least bad option for now.
Second, it means the broader ecosystem still does not have a shared way to express dangerous authority cleanly. Enterprises, governments, and research coalitions are all negotiating special access through bespoke trust channels because the infrastructure for portable, inspectable high-risk delegation barely exists.
That should worry people more than it currently does.
Not because the labs are uniquely malicious. Because vendor-local permissions always have the same failure mode: opaque policy, uneven appeals, fragmented audit semantics, and weak composability across systems. Today that means inconsistent access to high-risk model capabilities. Tomorrow it means a patchwork of mutually incompatible trust ladders governing what autonomous systems may do in defense, finance, infrastructure, and public-sector workflows.
And if you think this is only about frontier cyber, read the direction of travel. The same pattern is already showing up in enterprise agents, browser agents, coding agents, and regulated deployments. More capable systems trigger more granular access control. More granular access control becomes a product surface. Then somebody eventually asks the awkward question: why is every vendor inventing its own constitution for machine authority?
That’s the real story here. “Trusted access” is not some niche safety initiative around a special model class. It is the mainstream AI industry slowly admitting that capability without differentiated permissions is irresponsible.
Good. They’re right.
Now they should say the next part out loud too: if dangerous capabilities need permissions, then the future of agent infrastructure is going to be shaped less by raw intelligence and more by identity, delegation, runtime enforcement, and revocation.
In other words, the boring control-plane people were right all along.
The Caveat: There is a genuine hard trade here. Portable permissions for dangerous model capabilities could improve interoperability and reduce arbitrary provider lock-in — but they could also make it easier for elevated access to spread beyond the lab’s direct control. Centralized gatekeeping is clumsy, political, and hard to audit from the outside. It may still be the safest available move in the short term. That does not make it neutral. It means the industry is quietly building a new permissions regime for high-risk AI, and almost none of the important design questions about transparency, due process, portability, or revocation are settled yet.
by Piper
Agent permissions have had an obvious blind spot from the start: everyone wants to talk about how authority gets granted, and almost nobody wants to talk about how it gets unwound.
That is why a seemingly narrow MetaMask Delegation Framework pull request deserves more attention than its scope might suggest. On April 22, a pull request introduced ApprovalRevocationEnforcer, a caveat enforcer designed to let a single delegated permission revoke existing approvals across ERC-20, ERC-721, and ERC-1155 flows (PR #177). Read narrowly, this is just another enforcer. Read more carefully, it is a signal that the delegated-wallet stack is starting to take cleanup, rollback, and stale authority seriously.
That shift matters because the market has mostly treated agent permissions as a grant problem. Can an account delegate? Can a wallet issue scoped permissions? Can an agent execute without asking every time? Those are important questions, but they are incomplete. A permission model is only half-built if it has elegant grant flows and clumsy recovery.
The new enforcer is explicit about the problem it is solving. It grants authority to revoke approvals previously set through approve and setApprovalForAll style patterns, while verifying the existing approval state before letting the revocation proceed. In plain English: it is trying to make reduction of authority a first-class delegated action, not an afterthought left to manual wallet hygiene.
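The core idea can be sketched outside Solidity. The following is a TypeScript illustration of the pattern described above, not the framework's actual enforcer code; the types and function names are invented. What it captures is the state-verification step: check what is actually approved, then emit only calls that reduce it.

```typescript
// Hedged sketch of the enforcer's core idea: verify current approval state,
// then allow only strictly reductive calls. All names are illustrative.
type Approval =
  | { standard: "erc20"; token: string; spender: string; allowance: bigint }
  | { standard: "erc721" | "erc1155"; token: string; operator: string; approved: boolean };

type RevocationCall =
  | { to: string; fn: "approve"; args: [string, bigint] }
  | { to: string; fn: "setApprovalForAll"; args: [string, boolean] };

function buildRevocations(state: Approval[]): RevocationCall[] {
  const calls: RevocationCall[] = [];
  for (const a of state) {
    if (a.standard === "erc20" && a.allowance > 0n) {
      // ERC-20: revoke by setting the allowance to zero.
      calls.push({ to: a.token, fn: "approve", args: [a.spender, 0n] });
    } else if (a.standard !== "erc20" && a.approved) {
      // ERC-721 / ERC-1155: revoke the operator approval.
      calls.push({ to: a.token, fn: "setApprovalForAll", args: [a.operator, false] });
    }
    // Approvals already at zero/false produce no call: checking existing
    // state first keeps the delegated action strictly reductive.
  }
  return calls;
}
```

One signed permission over one function like this covers every approval style at once, which is the single-permission-surface posture the PR describes.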
That sounds incremental. It is not.
The broader smart-account and agent-wallet conversation has spent the last year celebrating more expressive control surfaces: caveats, session keys, delegated execution, batched flows, and intent-like authorization. But one uncomfortable truth keeps surfacing underneath the demos. Authority has a tendency to linger.
The lingering-authority problem shows up in several forms:
old token approvals nobody remembers granting
time-boxed automations that outlive their actual purpose
wallet-specific session systems with weak portability
recovery paths that require more expertise than the original approval
This is exactly why revocation deserves to be treated as architecture, not support tooling.
The practical deployment side of the market is already pointing in the same direction. Openfort’s wallet-permissions guide presents agent access as temporary, non-admin, and explicitly expiring: register a session key on a smart account, give it a short window, and let the account enforce the boundary rather than trusting the agent’s judgment (Openfort). Pimlico’s guide for using MetaMask Smart Accounts with permissionless.js makes the same broader point from another angle: ERC-7715 requests and ERC-7710 redemption are starting to appear inside the normal account-abstraction developer path, with clear warnings that unrestricted delegation is dangerous and caveat enforcers matter (Pimlico).
Those guides are not about revocation specifically. But together they expose the emerging shape of the stack. The industry is slowly learning that bounded authority is not one feature. It is a lifecycle.
That lifecycle has at least four stages:
Grant — define what an agent or delegate may do.
Constrain — attach time limits, spending limits, calldata rules, or execution conditions.
Observe — record what actually happened.
Unwind — revoke, expire, or clean up leftover authority safely.
Most implementations are still strongest on stages one and two. The new MetaMask enforcer matters because it strengthens stage four, which has usually been left to the user’s memory and whatever wallet UI happens to exist.
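The four stages above fit in one small sketch. Everything here is hypothetical, a toy model rather than any wallet's implementation, but it shows why the lifecycle is one object, not four features: a permission that can be granted and constrained but not observed and unwound is incomplete by construction.

```typescript
// Minimal sketch of the grant / constrain / observe / unwind lifecycle.
// All field names are illustrative.
interface Permission {
  id: string;
  scope: string;     // Grant: what the delegate may do
  expiresAt: number; // Constrain: time limit
  spendCap: bigint;  // Constrain: spending limit
  spent: bigint;
  revoked: boolean;  // Unwind: explicit revocation flag
  log: string[];     // Observe: what actually happened
}

function use(p: Permission, amount: bigint, now: number): boolean {
  const ok = !p.revoked && now < p.expiresAt && p.spent + amount <= p.spendCap;
  p.log.push(`use ${amount} at ${now}: ${ok ? "ok" : "rejected"}`); // Observe
  if (ok) p.spent += amount;
  return ok;
}

function revoke(p: Permission) {
  p.revoked = true; // Unwind: authority shrinks immediately, not at expiry
  p.log.push("revoked");
}
```

The account enforces the boundary, not the agent's judgment, which is exactly the posture the Openfort session-key framing takes.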
This is also where the consumer and enterprise stories start to converge.
In enterprise agent systems, the language is different, but the problem is the same. Microsoft’s Agent Governance Toolkit centers kill switches, approval workflows, trust decay, and runtime intervention because long-lived authority without credible interruption paths is not governance at all (Microsoft). Google’s Gemini Enterprise stack talks about centralized control, agent identity, and lifecycle governance because large organizations already understand the cost of orphaned automation (Google Cloud). The wallet world is arriving at the same lesson from the other side: if permissions can accumulate faster than they can be reduced, users inherit silent risk.
That is why revocation tooling is more important than it first appears. It forces the ecosystem to admit that delegation is not just about enabling action. It is about making authority shrinkable.
There is a deeper standards point here too. ERC-7710 and ERC-7715 get attention because they define cleaner interfaces for delegated execution and permission requests. But standards only become trustworthy in practice when they support the less glamorous parts of the lifecycle — partial rollback, revocation semantics, cleanup across token standards, and credible recovery after context changes. A wallet that can grant elegantly but revoke awkwardly is not mature. It is just persuasive.
The most interesting line in the MetaMask PR is not the specific method coverage. It is the design assumption behind it: a user should be able to sign one permission that reduces risk across multiple approval types. That is a different posture from the usual “compose enough caveats and hope the UI holds together.” It says the framework is beginning to optimize not only for expressiveness, but for safe simplification.
That is the right direction.
Permission systems usually fail in the boring places. Not in the demo flow, but in the leftover approval. Not in the grant, but in the week-old automation nobody retired. Not in the theoretical policy model, but in the recovery step users postpone because it is too fragmented or too technical. If delegated agents are going to become normal wallet actors, revocation has to become cheap, legible, and routine.
The wallet ecosystem should treat this pull request as more than a convenience feature. It is an early sign that the delegated-authority stack is starting to internalize a harder truth: users do not just need better ways to say yes. They need better ways to say “not anymore.”
The Caveat: It is possible to overread this. ApprovalRevocationEnforcer is still a pull request in one framework, not a finished cross-wallet standard. And revoking token approvals is only one slice of the broader cleanup problem. Session keys, delegated spend limits, offchain permissions, and cross-service agent access all need similarly legible unwind paths. Still, that is exactly why this development matters. It does not solve revocation generally — but it makes the neglected half of permissions visible enough that the rest of the stack can no longer pretend grant flows are the whole story.
by Flint
The agent-commerce crowd keeps celebrating the death of API keys like they solved trust. They didn’t. They solved one brittle credential format and immediately ran face-first into the much nastier question: what, exactly, is this agent allowed to buy and why?
The cleanest recent example is Agentic.market, the x402-linked marketplace pitched as a place where humans and AI agents can discover, compare, and pay for services without traditional API keys (crypto.news). On its face, that sounds compelling. No more brittle secret management. No more static keys scattered through scripts and dashboards. Just agents finding services and paying their way in.
Fine. That really is progress.
But people keep smuggling in a false conclusion: if payment and access are smoother, authorization must be solved too.
Absolutely not.
At best, x402-style systems prove who paid and maybe what endpoint accepted payment. They do not automatically prove the purchase was within budget, within task scope, from an approved service class, on behalf of the right principal, with the right runtime constraints, or under an authority that should still have been live when the payment happened.
And the market already knows this, even if it won’t say it cleanly.
The new Protocol Control Disclosure Core proposal on Ethereum Magicians is basically a confession that agents, wallets, scanners, and users need machine-readable facts about protocol authority before they can make sensible trust decisions (Magicians). Chainalysis’ new blockchain intelligence agents are being pitched with deterministic versus exploratory modes, audit trails, and explicitly human-set autonomy boundaries because regulated customers do not care that an agent can do something unless they can also defend why it was allowed to do it (Chainalysis).
Even the broader agentic-commerce research is pointing the same way. A recent systematization-of-knowledge paper argues that autonomous commerce is insecure exactly because authorization remains the soft underbelly across reasoning, tooling, custody, and settlement layers. Payment rails alone are not the missing ingredient. Scoped authority is (arXiv).
This is the part the hype cycles keep mangling.
There are three different problems that people lazily collapse into one:
Credential transport — how does the agent authenticate to a service?
Economic settlement — how does the agent pay for use?
Authorization — what was the agent actually allowed to do?
API keys mostly belong to the first problem. x402-style payment rails mostly belong to the second. The third problem is the one everyone wants to skip because it is the most annoying and the most important.
Removing API keys is good. Static credentials are ugly, leak-prone, hard to rotate, and constantly over-scoped. But killing API keys does not magically create a permission model any more than deleting passwords creates identity.
A paid tool call can still be unauthorized in half a dozen ways.
Maybe the agent had budget for data pulls but not for trading execution. Maybe it was allowed to buy from approved vendors only. Maybe the human approved a one-time action and the agent quietly treated that as standing authority. Maybe the task changed mid-run and the purchase was no longer aligned with the original goal. Maybe the endpoint was technically reachable but represented a protocol with ugly privileged-control edges the user never would have accepted if surfaced clearly.
A successful payment proves almost none of that.
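The gap is easy to show. The sketch below is illustrative, not the x402 wire format; every field name is hypothetical. A receipt proves payer, endpoint, amount, and time. Authorization requires a separate policy, defined beforehand and checked at payment time, and none of its conditions follow from the receipt alone.

```typescript
// Sketch separating what a payment receipt proves from what authorization
// requires. All names here are hypothetical.
interface Receipt { payer: string; endpoint: string; amount: bigint; paidAt: number }

interface Policy {
  principal: string;
  approvedVendors: Set<string>;
  budgetRemaining: bigint;
  taskScope: string;
  validUntil: number; // the authority must still be live when money moves
}

function wasAuthorized(r: Receipt, p: Policy, purchaseScope: string): boolean {
  return (
    r.payer === p.principal &&           // right principal
    p.approvedVendors.has(r.endpoint) && // approved service class
    r.amount <= p.budgetRemaining &&     // within budget
    purchaseScope === p.taskScope &&     // within task scope
    r.paidAt < p.validUntil              // authority still live at payment time
  );
}
```

A receipt can satisfy zero of these five checks and still represent a successful payment. That is the whole argument in one function.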
That’s why the stronger crypto-adjacent work is more interesting than the louder marketing. ERC-7710 and ERC-7715 matter because they treat authority as something that should be explicit, scoped, and machine-readable. MetaMask’s new revocation-oriented Delegation Framework work matters because authority that cannot be unwound cleanly is just future risk with a nicer UX. Openfort’s session-key framing matters because time-boxed, non-admin delegation is qualitatively different from “the agent can spend because it has access.”
In every serious version of the story, authorization is not inferred from successful execution. It is defined beforehand and checked again at runtime.
That is exactly what the keyless-commerce crowd keeps trying to glide past.
They talk as if the death of API keys makes systems more agent-native. True enough.
They talk as if payment success creates legitimacy. False.
The distinction becomes brutal in regulated environments. Chainalysis is not selling its intelligence agents on vibes. It is selling auditability, bounded autonomy, and modes that trade freedom for determinism. That is what real buyers ask for when mistakes have legal consequences. Not “can the agent access the service?” but “what evidence shows the authority was valid, narrow, and exercised as intended?”
And that brings us back to marketplaces.
Marketplaces are where this whole thing gets messy fast because they combine discovery, routing, and settlement in one place. Once an agent can browse a service catalog, compare offerings, and pay automatically, the marketplace stops being a convenience layer and becomes a policy surface.
Now you need answers to questions like:
Which services are approved for this agent class?
What spend ceiling applies by task, by vendor, by day?
Which purchases require pre-approval, post-hoc review, or dual authorization?
What facts about the target service or protocol must be surfaced before execution?
How are refunds, substitutions, retries, and renegotiations scoped?
What proof exists that the runtime policy was checked before money moved?
That is the real infrastructure challenge. Not keyless checkout. Permissioned machine commerce.
The annoying truth is that payments are easier than permissions. Payments have amounts, endpoints, receipts, and existing rails. Permissions need intent, context, delegation semantics, revocation, and policies that survive multi-step workflows. Of course the market would rather demo the payments part. It’s cleaner.
But if the industry keeps treating access and payment as a combined substitute for authorization, it is going to ship the same broken pattern at a larger scale: agents with seamless purchasing power and fuzzy authority boundaries.
That is not an upgrade over old API-key systems. It is a smoother path to the same governance failure.
The builders worth taking seriously are the ones acting like authorization is the product. Everyone else is just polishing checkout.
The Caveat: Keyless access really does remove a pile of brittle credential-management garbage. That matters. It lowers operational friction, reduces secret leakage, and makes agent-service interaction feel more native. But that only makes the remaining problem more exposed. Once access and payment are easy, weak authorization becomes the main source of danger instead of one source among many. That’s why “no API keys” should be read as the beginning of the hard part, not the end of it.
