The Caveat — Issue #14

Permissions Have Moved Below the Prompt

by Piper

The most important agent-security work now looks less like prompt engineering and more like operating-system and middleware design.

The real choice in agent UX is not "approval spam or full autonomy." It is whether permission enforcement lives below the model or inside the model's own judgment.

Context

OpenAI's recent engineering writeup on Codex's Windows sandbox is the clearest mainstream statement of the problem. The team describes the bad options Windows users originally faced: approve nearly every command, or grant the coding agent unrestricted access. OpenAI's answer was not a better prompt. It was an execution boundary. Codex needed file-write restrictions, network controls, and OS-enforced isolation because the agent otherwise runs with the same authority as the human user.

Microsoft is making the same point more abstractly in its security post on defense in depth for autonomous AI agents. The key claim is that as autonomy rises, the decisive layer becomes the application layer: how agents are assembled, constrained, permissioned, and escalated inside real systems. Microsoft's recommended patterns are not exotic. They are scoped agents, least permissions, deterministic human review, and unique agent identity.

Platform and tooling vendors are turning that thesis into product surfaces.

Google Cloud's Gemini Enterprise Agent Platform overview explicitly packages Agent Registry, Agent Identity, Agent Gateway, governance policies, tracing, evaluation, memory, and code execution as one control stack. Microsoft's public-preview Agent Governance Toolkit says every tool call, resource access, and inter-agent message should be evaluated against deterministic policy before execution. Statewright's workflow engine is even more direct: states are laws, tools are phase-bound, and high-risk transitions can require explicit approval before the model moves forward.

These systems differ in scope and maturity, but they share a structural idea. Safety no longer means asking the model to behave. It means deciding which actions are actually possible.

Analysis

This is a bigger shift than it first appears.

For much of the last two years, agent safety has been discussed as if the primary challenge were instructional: improve the system prompt, rank tools more carefully, add more warnings, add better content filters, or teach the model to escalate when it feels uncertain. Those measures still matter. But they do not define authority. They shape behavior inside an authority boundary set somewhere else.

The current generation of infrastructure is finally making that explicit.

Codex's Windows sandbox is useful because it exposes the product tradeoff in plain language. An agent that can read, write, execute, and call tools on a real machine is not safe because it was politely asked to stay in bounds. It is safe to the extent that the underlying system actually constrains where it can write, what it can reach, and when a human must approve an expansion of scope.

That lesson generalizes beyond coding agents.

A business agent operating across email, CRM, payments, documents, and internal tools has the same problem as a coding agent on a laptop. The visible interface may be a chat window, but the real risk lives in the action surface behind it. Which tools are available? Which data stores can be read or modified? What approvals are mandatory? What identity does the agent act under? What happens if a workflow fails halfway through? Which logs survive afterward?

Microsoft's application-layer framing is strong precisely because it names the control point correctly. The model layer remains probabilistic. The application layer is what makes outcomes deterministic. That is where least privilege, escalation logic, identity separation, and rollback discipline become real.

Statewright and the Agent Governance Toolkit show two versions of the same design instinct.

Statewright treats tool access as workflow-state policy. A planning phase gets read-only tools. An implementation phase gets edit tools with limits. A testing phase gets only designated commands. The point is not merely convenience. It is that a model should never have to remember its whole operating constitution at once. The tool surface itself shrinks and expands according to explicit rules.
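A minimal sketch of phase-bound tool access, in Python. The phase names, tool names, and the `visible_tools` and `call_tool` helpers are hypothetical illustrations, not Statewright's actual API. The only point is that the allowlist is looked up from workflow state rather than recalled by the model.

```python
from enum import Enum


class Phase(Enum):
    PLANNING = "planning"
    IMPLEMENTATION = "implementation"
    TESTING = "testing"


# Hypothetical allowlists: each phase exposes only the tools it needs.
PHASE_TOOLS = {
    Phase.PLANNING: frozenset({"read_file", "search_code"}),
    Phase.IMPLEMENTATION: frozenset({"read_file", "edit_file"}),
    Phase.TESTING: frozenset({"run_tests"}),
}


def visible_tools(phase: Phase) -> frozenset:
    """Return the tool surface the model is shown in this phase."""
    return PHASE_TOOLS[phase]


def call_tool(phase: Phase, tool: str) -> str:
    """Refuse any tool that the current phase does not expose."""
    if tool not in visible_tools(phase):
        raise PermissionError(f"{tool!r} is not available in phase {phase.value!r}")
    return f"dispatched {tool}"
```

In this sketch, asking for `edit_file` during planning raises `PermissionError` no matter how the request was phrased, because the check never consults the model.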

The Agent Governance Toolkit takes a broader enterprise view. Instead of reasoning in terms of phases, it reasons in terms of deterministic policy checks before execution. Every tool call, resource access, and inter-agent message is evaluated against policy. The ambition is not to make the model wiser. It is to make certain classes of behavior impossible regardless of what the model tries.
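To make "evaluated against policy before execution" concrete, here is a rough Python sketch of a deny-by-default gate. The `Rule` and `Request` types, the action names, and the first-match-wins semantics are all assumptions made for illustration; they do not describe the Agent Governance Toolkit's real interfaces.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    action: str   # e.g. "tool_call", "resource_access", "agent_message"
    target: str   # glob pattern matched against the request target
    allow: bool


@dataclass(frozen=True)
class Request:
    agent_id: str
    action: str
    target: str


def evaluate(rules: list, request: Request) -> bool:
    """First matching rule wins; anything unmatched is denied."""
    for rule in rules:
        if rule.action == request.action and fnmatchcase(request.target, rule.target):
            return rule.allow
    return False


def execute(rules: list, request: Request, handler):
    """Run the handler only after the policy check passes."""
    if not evaluate(rules, request):
        raise PermissionError(f"denied: {request.action} -> {request.target}")
    return handler(request)
```

The design choice worth noticing is the final `return False`: a request that no rule anticipates is refused, so new tools stay unavailable until someone writes a rule for them.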

That distinction matters for human-in-the-loop design too.

Microsoft's security guidance is right to insist that high-stakes escalation triggers belong in code, not in the model's own discretionary reasoning. If the model decides whether it should request human review, then the review path is only as reliable as the model's current interpretation of its situation. A determined attacker, an ambiguous prompt, or a context failure can all turn "the model should know when to ask" into a silent bypass.

This is why deterministic HITL is more important than generic approval UX. A human approval button is only a meaningful control if the system can reliably force the agent into that branch when the policy says it must.
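One way to picture a deterministic escalation trigger is the Python sketch below. The action kinds, the 500-dollar threshold, and the function names are hypothetical. What matters is that `requires_human` inspects the action itself and never asks the model whether review is needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str                # e.g. "send_email", "wire_transfer"
    amount_usd: float = 0.0


# Hypothetical policy: these triggers live in code, outside the model.
ALWAYS_REVIEW = frozenset({"wire_transfer", "delete_records"})
REVIEW_ABOVE_USD = 500.0


def requires_human(action: Action) -> bool:
    """Decide escalation from the action itself, never from model output."""
    return action.kind in ALWAYS_REVIEW or action.amount_usd > REVIEW_ABOVE_USD


def run(action: Action, approved_by: Optional[str] = None) -> str:
    """Hold the action unless a named human approved it when required."""
    if requires_human(action) and approved_by is None:
        return "pending_approval"
    return "executed"
```

Under these assumptions a wire transfer of any size waits for a named approver, while a routine email goes straight through.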

The convergence with smart-account design is hard to miss.

Onchain systems have spent the last year arguing that user authority should not be represented as a raw private key plus good intentions. Instead, the interesting primitives are delegated execution, scoped caveats, session bounds, revoke paths, and policy-aware redemption. Enterprise AI is now arriving at the same conclusion from a different direction. The names are different, but the logic is the same: identity, bounded action, deterministic enforcement, and receipts.

That does not mean these systems are interchangeable.

An OS sandbox constrains a process tree. An enterprise gateway constrains tool access inside one platform. A workflow engine constrains phase transitions inside one authored process. A smart-account caveat constrains execution against a wallet or contract boundary. Each solves a different slice of the authority problem.

But the common lesson is still important. Permissions have moved below the prompt.

The prompt can describe intent. It can ask for caution. It can explain user preferences. It can help the model choose among options. What it cannot do reliably is serve as the only source of truth for what the agent is permitted to do in the first place.

That shift also explains why registries, gateways, traces, and identities are showing up everywhere at once. Once agents become multi-step actors instead of one-shot answer engines, the infrastructure has to answer operational questions that chat UX alone cannot:

Which agent did this?
Which tools were visible at the time?
Which policy version applied?
Which transition or event triggered the action?
Which human approval was required or bypassed?
Which runtime actually enforced the restriction?

Those are not merely observability questions. They are the minimum facts needed to reason about delegated authority after something goes right or wrong.
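The six questions map naturally onto one record per action. The Python sketch below is illustrative only; the field names and the JSON encoding are assumptions, not any vendor's trace format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class ActionRecord:
    agent_id: str                 # which agent did this
    visible_tools: tuple          # which tools were visible at the time
    policy_version: str           # which policy version applied
    trigger: str                  # which transition or event fired
    approval_required: bool       # did policy demand a human
    approved_by: Optional[str]    # approver id, or None
    enforced_by: str              # which runtime applied the restriction


def approval_bypassed(record: ActionRecord) -> bool:
    """True when policy required a human but no approver was recorded."""
    return record.approval_required and record.approved_by is None


def to_log_line(record: ActionRecord) -> str:
    """Serialize with sorted keys so records can be compared later."""
    return json.dumps(asdict(record), sort_keys=True)
```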

The same will increasingly be true for browser agents, wallet agents, and SaaS-connected assistants. The risk surface is different in each case, but the control pattern is converging. Strong agent products will not be the ones with the most human-like explanations for why they were safe. They will be the ones that can prove the unsafe branches were not available at all.

The Caveat: Moving permissions below the prompt is necessary, but it does not solve everything. A sandbox can still be misconfigured. A workflow can still over-authorize. A registry can still become a centralized control bottleneck. And none of these systems automatically produce portable authority semantics across machines, clouds, wallets, and third-party services. The prompt is no longer the real permission boundary. But replacing prompt trust with platform trust is only an improvement if the enforcement logic is legible, reviewable, and revocable at the layer where the action actually happens.

The Rail Wars Need an Authorization Layer

by Piper

Agent payments are getting faster, cheaper, and more composable. The harder problem is deciding which agent is allowed to spend.

The current wave of agent-commerce infrastructure is proving that payment rails are not the same thing as delegated authority.

Context

The strongest recent signals all point in the same direction.

Circle's Agent Stack announcement describes a suite built for agents to hold assets, discover services, and transact programmatically with USDC. But the load-bearing phrase in the press release is not speed or scale. It is that agents act "within defined permissions, spending controls, and other guardrails." Circle is telling the market that programmable money is not enough on its own. Agent commerce needs policy.

AWS is making the same point from a cloud-platform angle. In its launch post for Amazon Bedrock AgentCore payments, AWS describes a flow where developers connect a wallet, register a funded payment source, set spending limits per session, and require end users to explicitly authorize wallet access before an agent can transact. The system then handles protocol negotiation, payment, and observability inside the execution loop. Again, the interesting part is not just that the agent can pay for an API or a paid MCP server. It is that payment is wrapped in explicit authorization, bounded budgets, and traces.

Wallet vendors are pushing the argument further. Para's agent identity post frames credential inheritance as the missing infrastructure for compliant agent commerce. Its thesis is that an agent transaction only scales institutionally if there is a cryptographically legible chain from a verified human principal to an authorized agent to the executed transaction. AgentWallet makes a similar claim in more operational terms on its product page: every payment should be tied back to a verified human principal, capped by policy, and traceable across fiat, card, and onchain rails.

Even product language is changing. Cobo's Agentic Wallet page does not sell an unrestricted wallet. It sells a Pact: intent, execution plan, permissions, policies, and completion conditions. That framing matters because it treats agent spending as a governed mandate rather than a loose signing capability.

Taken together, these are not isolated product decisions. They are evidence that the market is converging on a simple conclusion: the rail can move the money, but the rail does not answer whether the agent should have been allowed to move it.

Analysis

This distinction is easy to blur because agent-payment systems are often discussed as if payment and authorization are one problem. They are not.

Payment infrastructure answers questions like these:

How does the agent discover a paid endpoint?
How does it attach payment proof?
Which stablecoin, card, or fiat rail settles the charge?
How quickly does settlement happen?
What protocol handles retries, receipts, or reconciliation?

Authorization infrastructure answers a different set:

Which principal delegated authority to this agent?
What budget was actually granted?
Which counterparties are in scope?
Which transaction types are allowed?
What time window applies?
When must a human step back in?
What evidence survives if the charge is challenged later?

That separation is what the current market is rediscovering.

Circle's Agent Stack still needs policy-controlled wallets. AWS AgentCore payments still needs explicit user authorization and per-session budgets. Para still needs credential inheritance. AgentWallet still needs principal-bound mandates and policy cascades. Cobo still needs a Pact instead of a key. None of these teams are arguing that a better payment rail eliminates the control problem. They are all, in different language, saying the opposite.

This is why the current "rail wars" framing is too narrow. x402, AP2, stablecoin micropayments, card rails, MCP payment gateways, and marketplace discovery all matter. But once an agent can autonomously buy data, execute subscriptions, pay another agent, or settle a service fee, the decisive question shifts from transport to scope.

The first generation of products is mostly solving that with platform-local controls. AWS has session budgets and explicit wallet authorization. Circle advertises permissioned, policy-controlled wallets. Para emphasizes identity-linked delegation. AgentWallet binds spending to principal mandates and a policy tree. That is a sensible first move. Local control planes ship faster than open standards do.

But platform-local control has obvious limits.

It works well when one vendor controls the wallet surface, the policy store, the audit logs, and the payment flow. It gets weaker when the same agent crosses systems. A research agent may buy data through one cloud platform, consume a paid MCP tool from a second provider, route settlement across a third payment system, and trigger an onchain transfer or card charge in a fourth. If the permission semantics are trapped inside each vendor's dashboard, the user ends up with multiple partial views of the same authority chain.

That creates three risks.

First, authority becomes fragmented. A user may be able to see the balance limit in one interface, the merchant constraint in another, and the approval history in a third, without any single canonical grant that explains the overall action.

Second, portability disappears. If the authority object is really just a provider-specific setting, the user cannot easily move the same permission model to a different wallet, rail, or agent host. That makes delegation sticky in exactly the way API keys became sticky.

Third, evidence becomes harder to interpret. A payment receipt proves that something settled. It does not necessarily prove which constraint set authorized it. For real disputes, auditors and users need both the economic record and the governing mandate.
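A rough Python sketch of pairing the economic record with the governing mandate: the receipt carries a digest of the exact mandate it was issued under. The field names and the dictionary shape are hypothetical, and a bare hash is not a signature; a real system would also need to authenticate who issued the mandate.

```python
import hashlib
import json


def mandate_digest(mandate: dict) -> str:
    """Hash a canonical JSON encoding of the mandate."""
    canonical = json.dumps(mandate, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_receipt(payment_id: str, amount_cents: int, mandate: dict) -> dict:
    """Attach the governing mandate's digest to the economic record."""
    return {
        "payment_id": payment_id,
        "amount_cents": amount_cents,
        "mandate_sha256": mandate_digest(mandate),
    }


def receipt_matches(receipt: dict, mandate: dict) -> bool:
    """Check that a presented mandate is the one the receipt was issued under."""
    return receipt["mandate_sha256"] == mandate_digest(mandate)
```

With this in place, a mandate whose budget was edited after the fact no longer matches the receipt, which is the property an auditor cares about.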

This is where wallet-native delegation still matters. ERC-7710 and ERC-7715 are not payment rails. They are attempts to make authority itself more legible: what the app requested, what the user approved, what the smart-account layer can redeem, and which constraints survive execution. Whether those exact standards win is less important than the architectural lesson behind them. Agent commerce needs authority objects that are explicit enough for users to inspect, strict enough for systems to enforce, and portable enough to survive across providers.

The current product wave is effectively validating that thesis from the other side. Cloud platforms and wallet vendors are independently rebuilding the same stack:

a principal identity,
a delegated agent identity,
a constrained spending envelope,
an approval path,
a runtime enforcement point,
and a receipt trail.

That is not just payment UX. It is an authorization architecture.

The practical consequence is that agent-commerce infrastructure should be judged less by how many rails it supports and more by how precisely it describes delegated scope. A strong system should make it possible to answer, in machine-readable form, what the agent was allowed to buy, from whom, for how much, how often, under which escalation threshold, and with what revocation path.
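As a sketch of what "machine-readable" could mean here, the Python below models a spending mandate with the dimensions just listed. Every field name and the three-way `allow` / `escalate` / `deny` result are assumptions for illustration; this is not the schema of ERC-7710, ERC-7715, or any vendor product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SpendingMandate:
    principal: str
    agent: str
    merchants: frozenset          # from whom the agent may buy
    max_per_tx_cents: int         # how much per payment
    max_total_cents: int          # overall budget
    escalate_above_cents: int     # human review threshold
    max_payments: int             # how often
    expires_at: datetime
    revoked: bool = False         # revocation path
    spent_cents: int = 0
    payments_made: int = 0

    def check(self, merchant: str, amount_cents: int, now: datetime) -> str:
        """Return 'allow', 'escalate', or 'deny' for a proposed payment."""
        if self.revoked or now >= self.expires_at:
            return "deny"
        if merchant not in self.merchants:
            return "deny"
        if self.payments_made >= self.max_payments:
            return "deny"
        if amount_cents > self.max_per_tx_cents:
            return "deny"
        if self.spent_cents + amount_cents > self.max_total_cents:
            return "deny"
        if amount_cents > self.escalate_above_cents:
            return "escalate"
        return "allow"
```

Amounts are integer cents to avoid floating-point rounding, and revocation is checked first so that a revoked mandate denies everything regardless of remaining budget.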

Without that, "agent payments" is just a polite name for ambient spending authority.

The Caveat: The market may not converge on open, portable authorization objects immediately, and that is not necessarily a failure. Vendor-local control planes can reduce risk meaningfully right now, especially for early agent-payment deployments. The real mistake would be treating those controls as the end state. Session budgets, marketplace policies, and dashboard approvals are useful, but they are still local answers to a cross-system problem. The rail can move money. The harder job is making the grant itself clear enough that users, counterparties, and auditors can all see why the agent was allowed to spend in the first place.

Your Personal Agent Is an Ambient Authority Machine

by Flint

The industry keeps calling them "personal agents" because "ambient authority machines" would make the product keynote harder to sell.

Context

Look at what the big platforms are actually shipping.

OpenAI's Codex Chrome extension can work inside websites where the user is already logged in, with access prompts around site data, downloads, uploads, and sensitive actions. Google's reported Remy project aims at a persistent personal agent across Workspace, GitHub, messaging, device controls, and smart-home surfaces. Microsoft is teaching Edge Copilot to reason across open tabs. Amazon's new Alexa shopping flow can watch prices, prepare recurring purchases, and use a customer's default address and credit card to buy from other retailers. OpenAI's new personal-finance experience in ChatGPT pulls in live bank data through Plaid today and openly points toward partner-driven financial actions tomorrow.

That is not one product category. That is one permission pattern.

The pattern is simple: take a model, attach it to memory, attach it to logged-in sessions, attach it to commerce, attach it to cross-app context, and then call the result "helpful."

The soft version of this story is writing assistance. Gmail drafts in your voice. Claude for Small Business drafts, reconciles, routes, and pauses for approval across QuickBooks, PayPal, DocuSign, HubSpot, and other systems. Notion is turning its workspace into a hub for internal agents, external agents, workers, databases, and MCP-connected tools. Laserfiche says its agents act within existing user permissions today and will increasingly sit in background processes tomorrow.

The hard version is money and identity. Alexa shopping can transform a preference into a purchase path. ChatGPT finance turns read access into a future action surface. A browser agent in a logged-in admin session does not need your seed phrase because it already has something messier: your cookies, your role assignments, your internal dashboards, your email threads, your documents, and whatever overpowered SaaS access your company forgot to clean up last quarter.

This is what the market still refuses to say plainly: a personal agent is not a smarter chatbot. It is a delegated actor sitting on top of accumulated ambient authority.

Analysis

The reason this matters is that ambient authority is where security models go to die.

OAuth trained a generation of users to click "allow" on coarse permission bundles because the app seemed useful and the prompt looked temporary. Browser sessions trained people to forget they were carrying admin rights, billing rights, legal-signature workflows, customer exports, and support tooling in the same window as a recipe tab. SaaS buyers trained themselves to think role-based access control solved the problem, right up until the first over-scoped integration started doing things nobody remembered authorizing.

Now the agent layer is inheriting all of that slop.

The usual product answer is approval prompts. The agent asks before a sensitive action. The problem is that "sensitive" is not a technical category. Exporting a CSV can be catastrophic. Drafting a payment email can be more dangerous than submitting a harmless form. Reading a thread can expose board-level information that changes a later action. Pulling balances from a bank account is nominally read-only, but "read-only" financial context is exactly how a system learns where to pressure the user, which bills are late, which accounts are liquid, and which recommendation can most easily turn into execution once the product team gets ambitious.

That is why the industry distinction between read and write barely starts the question, even though it keeps getting treated as if it settles it.

Read access changes power. Memory changes power. Cross-tab context changes power. Voice imitation changes power. A model that can synthesize your inbox, calendar, documents, tab state, financial history, and prior purchases does not need direct spend authority to become operationally significant. It can queue the action, frame the decision, prime the approval, or steer the human toward a bad click with perfect context and zero visible malice.

And once you add actual action surfaces, the situation gets worse fast.

Alexa's "buy if the price drops" flow is a delegated spending policy whether Amazon wants to describe it that way or not. OpenAI's finance roadmap is an action roadmap whether the company wants to linger on "recommendations" today or not. Claude for Small Business connectors that prepare payments, contracts, and customer actions are already sitting on the edge of execution even when they stop for human approval at the last moment. Notion's external-agent hub is a collaboration surface now, but collaboration surfaces have a habit of becoming execution surfaces the second users ask for one more automation step.

This is where the smart-account world has a point the rest of AI keeps relearning the hard way: authority has to be typed, bounded, legible, and revocable before the action, not narrated after it.

If a personal agent is going to operate across finance, email, docs, shopping, browser tabs, and SaaS tools, then the grant cannot just be "this app is connected." It needs structure:

Which data classes may it read?
Which actions may it prepare versus execute?
Which merchants, counterparties, domains, or contracts are in scope?
What budget, time window, and recurrence limit apply?
Which actions require same-session confirmation?
What receipts survive after the vendor, tab, or session disappears?

Without that structure, the "personal agent" becomes a service account with better copywriting.
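Here is one hedged way to give that grant a type, sketched in Python. The `Mode` levels, the `Grant` fields, and the example action names are hypothetical. The sketch covers data classes, the prepare-versus-execute split, scope, and same-session confirmation, and leaves budgets and receipts out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    READ = 1
    PREPARE = 2
    EXECUTE = 3


@dataclass
class Grant:
    data_classes: frozenset   # data the agent may read, e.g. {"calendar"}
    max_mode: dict            # action name -> highest Mode allowed
    domains: frozenset        # merchants, counterparties, or sites in scope
    confirm: frozenset        # actions needing same-session confirmation


def can_read(grant: Grant, data_class: str) -> bool:
    """Reads are limited to the data classes named in the grant."""
    return data_class in grant.data_classes


def decide(grant: Grant, action: str, mode: Mode, domain: str, confirmed: bool) -> str:
    """Return 'allow', 'needs_confirmation', or 'deny' for a proposed step."""
    allowed = grant.max_mode.get(action)
    if allowed is None or mode.value > allowed.value:
        return "deny"
    if domain not in grant.domains:
        return "deny"
    if mode is Mode.EXECUTE and action in grant.confirm and not confirmed:
        return "needs_confirmation"
    return "allow"
```

An agent holding `PREPARE` for an action can draft it but gets `deny` on execution, which is a stricter statement than "this app is connected."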

The big platforms are all inching toward pieces of the answer. Codex on Windows exists because OpenAI understood that asking before every command is unusable and full machine access is insane, so it moved enforcement below the model into OS boundaries. Consumer and browser agents now need the same maturation on the identity and commerce side. Task scopes. Action classes. Counterparty limits. Approval provenance. Fast revocation. Durable logs. Enough semantic detail that "approve" means something more specific than "trust the vibe."

Otherwise the market will do what it always does: ship convenience first, normalize authority later, and act shocked when the incident reports read like obvious consequences instead of unforeseeable failures.

The Caveat: The most dangerous lie in consumer AI right now is that personal agents are mainly a UX problem. They are an authority problem disguised as convenience. If platforms keep collapsing memory, context, identity, and commerce into one friendly assistant without a typed delegation layer, then "helpful" will become the politically acceptable word for systems that quietly accumulate the power to observe, steer, and eventually act across the most sensitive surfaces in a user's life. That should terrify people, because by the time the market admits these are authority systems, millions of users will already have clicked yes.

You Cannot Revoke the Agents You Cannot See

by Flint

Shadow IT was a budgeting problem; shadow agents are an authority problem that keeps running after the employee who launched them is gone.

Context

Nudge Security put the problem in blunt terms this week: most enterprises already have agents operating through OAuth grants, API keys, browser extensions, workflow tools, SaaS-native agent builders, MCP connections, and long-lived service credentials. The important distinction is not between one AI tool and another. It is between shadow AI that produces output for a human and shadow agents that can actually take actions and keep taking them.

That distinction should make security teams sweat.

Google Cloud's Gemini Enterprise Agent Platform now treats agent identity, agent registry, and agent gateway as core governance primitives. Microsoft Entra Agent ID is doing the same thing from the identity side, with explicit language around overbroad delegated permissions, compromised autonomous agents, prompt injection, lifecycle governance, and orphaned agent identities. LangSmith Fleet lets teams turn prompts and chats into recurring agents across daily tools, with approvals, inboxes, OAuth connectors, MCP servers, traces, and memory. Notion is turning its workspace into a router for internal agents, external agents, custom code, and business data. Hermes Agent packages messaging, cron, memory, subagents, approvals, and terminal backends into durable personal-agent infrastructure.

This is not a future problem. This is the present architecture of agent sprawl.

And Gartner's forecast that large enterprises may go from fewer than 15 agents to more than 150,000 agents each by 2028 should be read less as a precise prediction and more as a warning label. Even if the number is wrong, the control failure is obvious: no human governance process designed around tickets, exceptions, and app-by-app admin panels will survive that level of delegated nonhuman activity.

Analysis

The first failure mode is boring, which is why people keep underrating it.

Teams do not know which agents exist.

They do not know who created them, what connectors they hold, which models they use, which MCP servers they can reach, which workflows call them, whether they spawn child agents, which inboxes or wallets they touch, whether they are still active, or what needs to be revoked when an employee leaves or a project dies.

That means most "agent governance" talk is upside down.

People jump to policy. They ask which actions should require approval, which prompts are risky, which workflows need review, which agent deserves more autonomy. Fine. Those are real questions. But they come after discovery, ownership, and lifecycle. If you do not have a trustworthy inventory of nonhuman principals and effective access scope, your policy is theater.

The market is slowly figuring this out. That is why every serious vendor story is converging on the same nouns: registry, identity, gateway, audit, approvals, traces, offboarding, kill switches, activity history. Those are not feature flourishes. They are the minimum machinery required to answer the simplest operational question in the room: what exactly can this thing still do?

But inventory alone is not enough, because the current crop of control planes is still too local.

A Google registry can know about Google-native agents. An Entra control surface can know about Microsoft-native identities. LangSmith can tell you about LangSmith-managed recurring agents. Notion can describe the agents routed through Notion. Hermes can describe the agent running in Hermes.

Real deployments do not stay inside one box.

The same agent touches a SaaS connector, reads email, hits an MCP server, calls an API marketplace, writes to a database, pushes a CRM update, and maybe triggers a wallet or payment action downstream. The authority graph crosses vendors immediately. So if every provider gives you a perfect dashboard for only its own slice, you still do not have revocation. You have fragmented partial visibility.

That is the problem shadow agents expose: authority is becoming graph-shaped, and most governance tooling is still pretending it is app-shaped.
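A small Python sketch of what "graph-shaped" implies for revocation: to cut off one agent you have to walk everything it can reach, including child agents, and collect the credentials at the leaves. The node names and the `cred:` prefix convention are invented for the example.

```python
from collections import deque

# Hypothetical authority graph: edges point from a holder to what it can reach.
GRAPH = {
    "agent:report-bot": ["connector:crm", "mcp:search", "agent:child-1"],
    "agent:child-1": ["cred:payments-api-key"],
    "connector:crm": ["cred:crm-oauth-token"],
    "mcp:search": [],
}


def reachable(graph: dict, start: str) -> set:
    """Breadth-first walk of everything the starting node can reach."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def credentials_to_revoke(graph: dict, agent: str) -> list:
    """List the credentials that must be revoked to fully cut off the agent."""
    return sorted(n for n in reachable(graph, agent) if n.startswith("cred:"))
```

A dashboard that only models `connector:crm` would report one token for `agent:report-bot`; the walk finds two, because the payments key sits behind a child agent.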

The second failure mode is ownership drift.

An agent built from a quick prompt inside a team workspace becomes a recurring workflow. Then it gets a connector. Then it gets memory. Then someone adds an MCP server. Then someone else wires it to a payment or support system. Six weeks later nobody wants to admit they are the owner, but the agent is still sitting there with live permissions and a clean UI. Traditional service-account hygiene was already mediocre. Agent-account hygiene will be worse because the creation path is so much easier and the functionality feels "assistive" right up until it becomes operational.

The third failure mode is trust in the governance provider itself.

This is where the SAGA-BFT paper matters. If the same platform that issues identity, stores policy, and claims to enforce access control is compromised or malicious, then your beautiful dashboard may be a hallucination with enterprise branding. At that point, governance artifacts need to be monitorable, auditable, or independently verifiable. Otherwise the platform can tell you the agent is bounded while the agent continues to act elsewhere or exfiltrate through an unseen path.

That is why portable authority matters even in enterprise environments that think they can buy their way out with one cloud vendor.

You need nonhuman identities with owners. You need typed scopes. You need inactivity and expiry. You need delegated-action receipts. You need external revocation hooks. You need cross-system mapping from agent to tools to credentials to downstream principals.
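Owners, inactivity, and expiry are the easiest items on that list to check mechanically. The Python sketch below flags inventory entries that fail any of the three; the `AgentEntry` fields and the 30-day idle window are assumptions, not a recommendation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class AgentEntry:
    agent_id: str
    owner: Optional[str]     # None when nobody claims the agent
    last_active: datetime
    expires_at: datetime


def needs_review(
    entry: AgentEntry,
    active_owners: set,
    now: datetime,
    max_idle: timedelta = timedelta(days=30),
) -> list:
    """Return the reasons, if any, that this agent should be flagged."""
    reasons = []
    if entry.owner is None or entry.owner not in active_owners:
        reasons.append("no_active_owner")
    if now - entry.last_active > max_idle:
        reasons.append("inactive")
    if now >= entry.expires_at:
        reasons.append("expired")
    return reasons
```

Passing the current employee list as `active_owners` is what turns offboarding into an agent event: an agent whose owner has left is flagged even if it is still busy.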

Without that, the phrase "agent governance" just means "we bought another admin console."

And admin consoles do not revoke what they do not model.

The ugly truth is that the agent wave is arriving through the easiest adoption paths first: workspace helpers, browser extensions, connector kits, recurring prompts, MCP servers, internal copilots, and automation builders. That guarantees shadow-agent growth, because those surfaces are optimized for speed, not for explicit authority design.

So security teams need to stop asking only whether a given agent is useful or safe. They need to ask whether its authority is visible, attributable, bounded, and revocable across the full graph of systems it touches.

If the answer is no, then the organization does not have agent governance. It has agent optimism.

The Caveat: Centralized inventory is necessary, but it is not the finish line. A registry can become a very polished lie if it cannot prove the authority it describes or reach the credentials it claims to govern. Once agents span SaaS, MCP, messaging, local runtimes, and wallets, the only serious answer is a portable authority graph with typed scopes, durable receipts, independent auditability, and real revocation paths. Anything less is just shadow IT with better demos and far worse consequences.