# Autonomy Needs Receipts

By [The Caveat](https://paragraph.com/@thecaveat) · 2026-06-01

---

The Caveat — Issue #16
======================

* * *

Alignment Is Not Authorization
==============================

**by Piper**

The most important agent security lesson this week is not that models can misbehave. It is that even well-behaved models still need an external authority channel that can bound, redirect, and stop them.

Context
-------

The cleanest research statement came from the paper [Reframing LLM Agent Security as an Agent-Human Interaction Problem](https://arxiv.org/html/2605.24309v1). The authors reviewed 21 production agent systems and found that the controls actually deployed in practice are not exotic model-side defenses. They are human and policy mechanisms: scope configuration, runtime approval, and policy specification. That matters because it cuts against a familiar assumption in agent discourse, namely that better alignment or better classifiers will eventually remove the need for explicit authority infrastructure.

The same point showed up from a different direction in [Position: AI Safety Requires Effective Controllability](https://arxiv.org/abs/2605.27117). That paper argues that alignment is not the same thing as controllability. A system can be generally helpful and still be hard to stop, hard to redirect, or hard to constrain once it is operating over long horizons with tools and adversarial inputs. In other words, behavioral quality is not a substitute for a runtime control plane.

Anthropic made the production version of that argument in [How we contain Claude across products](https://www.anthropic.com/engineering/how-we-contain-claude). The most revealing detail in the post was not a benchmark. It was telemetry: users approved roughly 93% of permission prompts. That is a useful number because it turns "approval fatigue" from a vague UX complaint into an engineering fact. If users approve nearly everything, a permission prompt is not doing much permission work.

Meanwhile, enterprise architecture is converging on the same conclusion. In [Who Authorized That? The Delegation Problem in Multi-Agent AI](https://www.oreilly.com/radar/who-authorized-that-the-delegation-problem-in-multi-agent-ai/), O'Reilly argues that MCP, A2A, OAuth, API keys, and service accounts are solving connectivity faster than delegated authority. Downstream agents inherit practical access without any explicit policy decision, creating what the piece calls ghost permissions. Uber's [Solving the Identity Crisis for AI Agents](https://www.uber.com/us/en/blog/solving-the-agent-identity-crisis/) describes a related operational problem: internal systems can see that a service called an API, but they cannot reliably reconstruct the human, agent, and intermediate-agent chain behind the action.

At the host boundary, the same lesson appears in [Sandlock](https://arxiv.org/html/2605.26298v1), a lightweight Linux sandbox for agent-run code. Sandlock matters because it treats filesystem, network, IPC, and syscall policy as a first-class authority surface. That is the local-compute equivalent of caveats in smart accounts: do not ask the model to behave; make the environment enforce the boundary.

Analysis
--------

Taken together, these sources suggest a simple claim: the next stage of agent safety is less about persuading the model and more about structuring authority around it.

That sounds obvious, but it is still underappreciated. A surprising amount of agent design still relies on one of three weak substitutes for real authorization.

The first substitute is intent inference. This is the belief that if the model is aligned enough, it can infer what the user "really meant" and stay within bounds. That may help with ordinary assistance, but it is too soft for spending, signing, handling data that could be exfiltrated, or side-effecting tool use. As soon as the action has external consequences, "the model seemed to understand" is not an auditable control.

The second substitute is per-step approval. This is the familiar confirm-or-deny button shown before a shell command, a browser action, or a payment. Per-step approval is better than nothing, but only in small doses. Anthropic's 93% figure shows why. Once prompts become frequent, users stop evaluating them as decisions and start treating them as friction.

The third substitute is identity alone. Identity answers who is acting. It does not answer what the actor is allowed to do, for what purpose, under which limits, through which downstream delegates, and with what revocation path. Uber's actor-chain work is valuable precisely because it exposes how much operational ambiguity sits between "a service account acted" and "this delegated action was actually authorized."

That is why the control plane is becoming the real product surface. In enterprise stacks, that means agent registries, short-lived scoped credentials, runtime gateways, authenticated tool endpoints, and action logs that preserve delegation lineage. In local developer environments, it means process sandboxing, egress controls, secret isolation, and policy-aware supervisors rather than trust in prompt discipline. In smart account systems, it means scoped delegation, caveat-enforced limits, expiry, revocation, and receipts that survive beyond the wallet popup.

The useful design principle across all three worlds is the same: the authority object should be narrower than the task description. "Review this PR," "book this trip," or "summarize this report" is not a permission. It is a job statement. A permission has to be machine-checkable. It needs to name the principal, the delegate, the action class, the resource boundary, the budget or risk limit, the expiry, the revocation path, and the evidence trail.
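To make the gap between a job statement and a permission concrete, here is a minimal sketch of a machine-checkable grant. Every name in it (`Grant`, `permits`, the field names, the `repo.read` action string) is hypothetical and chosen for illustration; it does not follow any particular standard, and it leaves out the evidence trail entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Grant:
    """Hypothetical authority object; field names are illustrative only."""
    principal: str          # who delegated the authority
    delegate: str           # which agent may exercise it
    action_class: str       # e.g. "repo.read"
    resource_prefix: str    # resource boundary, here a simple path prefix
    budget_cents: int       # spend or risk ceiling
    expires_at: datetime    # hard expiry
    revoked: bool = False   # stand-in for a real revocation lookup


def permits(grant: Grant, delegate: str, action: str, resource: str,
            cost_cents: int, now: datetime) -> bool:
    """Return True only if every machine-checkable condition holds."""
    return (
        not grant.revoked
        and now < grant.expires_at
        and delegate == grant.delegate
        and action == grant.action_class
        and resource.startswith(grant.resource_prefix)
        and cost_cents <= grant.budget_cents
    )


# "Review this PR" expressed as a grant rather than a job statement.
grant = Grant(
    principal="user:alice",
    delegate="agent:reviewer",
    action_class="repo.read",
    resource_prefix="repo/acme/api/",
    budget_cents=0,  # a read-only review is given no spend authority
    expires_at=datetime(2026, 6, 2, tzinfo=timezone.utc),
)
now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)

in_scope = permits(grant, "agent:reviewer", "repo.read",
                   "repo/acme/api/main.py", 0, now)
out_of_scope = permits(grant, "agent:reviewer", "repo.write",
                       "repo/acme/api/main.py", 0, now)
```

The point of the sketch is that the write request fails on a field comparison, not on the model's reading of what "review" means.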

That last field matters more than it seems. Receipts are not only for successful actions. Serious agent systems also need denial receipts, escalation receipts, and revocation receipts. If a downstream agent was blocked from sending a file externally, a reviewer should be able to see what it asked for, which policy blocked it, and which delegated chain was active at the time. Without that, a system becomes hardest to audit at exactly the moment its boundary works.
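A denial receipt can be sketched in a few lines. The shape below is an assumption made for illustration (the `Receipt` fields, the `decide` helper, and the policy id are all hypothetical), and a real receipt would need to be signed and stored durably. It only shows that the blocked request, the blocking policy, and the active chain are recorded together.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Receipt:
    """Hypothetical receipt shape, emitted for every decision."""
    outcome: str                       # "allowed" or "denied"
    requested_action: str              # what the agent asked for
    blocking_policy: str               # policy id that blocked it, "" if allowed
    delegation_chain: Tuple[str, ...]  # principal -> ... -> acting agent


def decide(action: str, chain: Tuple[str, ...],
           denied_actions: Dict[str, str], log: List[Receipt]) -> bool:
    """Record a receipt whether the action is allowed or denied."""
    policy = denied_actions.get(action, "")
    log.append(Receipt(
        outcome="denied" if policy else "allowed",
        requested_action=action,
        blocking_policy=policy,
        delegation_chain=chain,
    ))
    return not policy


log: List[Receipt] = []
chain = ("user:alice", "agent:planner", "agent:mailer")
rules = {"file.send_external": "policy:no-external-egress"}

allowed_read = decide("file.read", chain, rules, log)
allowed_send = decide("file.send_external", chain, rules, log)
```

After both calls the log holds one allowed receipt and one denied receipt, and the denied one names `policy:no-external-egress` alongside the full chain.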

This is where the wallet-native standards conversation around ERC-7710 and ERC-7715 becomes more important, not less. Smart accounts already force developers to think in terms of scoped delegated authority rather than raw key possession. The broader enterprise agent world is now rediscovering the same lesson with different nouns. MCP gateways, OS sandboxes, SaaS connectors, and browser agents all need the equivalent of a caveated grant plus a durable receipt.

There is a risk here of overcorrecting into rigid bureaucracy. A control plane that requires a human to bless every file read or every thirty-cent API purchase will fail on usability and eventually fail on safety too, because the human will stop paying attention. But that is not an argument against authorization. It is an argument for better authorization objects: durable, typed, narrow, and selectively escalated.

The strongest systems will likely look boring from the outside. Routine low-risk actions will flow through pre-authorized policies. High-risk actions will hit tighter boundaries, higher-friction approvals, or containment defaults. The model will remain important, but it will no longer be the place where the final security decision lives.

**The Caveat:** Control planes can become their own form of theater if they remain local to each platform. An enterprise gateway log, a sandbox rule file, and a wallet popup are all useful, but none is enough on its own. The harder standard is portability: can the system prove, across runtime, connector, browser, and wallet boundaries, which principal delegated which scope to which agent, what the agent actually tried to do, whether it was allowed or denied, and what revocation state applied at that moment? Until that receipt travels cleanly across layers, controllability will remain real but fragmented.

* * *

Agent Payments Need Standing Authority
======================================

**by Piper**

The market has finally made one point unavoidable: if most agent payments are worth cents, asking a human to approve every one of them is not a control system. It is overhead.

Context
-------

The payment data is no longer hypothetical. In [Who Pays the Agent?](https://keyrock.com/who-pays-the-agent/), Keyrock reports that agents have already settled more than $73 million across 176 million transactions, with 76% of x402 activity below the familiar $0.30 card-fee floor and 98.6% settling in USDC. Those numbers matter because they describe a payment pattern traditional commerce infrastructure was not designed for: frequent, low-value, machine-initiated purchases where latency and fixed per-transaction friction dominate the economics.

The sharper version of the argument came from CryptoSlate's [Tiny x402 payments expose the approval gap holding AI agents back](https://cryptoslate.com/tiny-x402-payments-expose-gap-holding-ai-agents-back/). Its headline data point is not just that x402 adjusted volume declined from its late-2025 peak while transaction count rebounded. It is that average transaction size in May 2026 was about $0.52, while manual wallet confirmations of 5 to 15 seconds per payment would translate into thousands of user-hours of friction in a single month. At that scale, per-payment approval is not merely annoying. It is economically irrational.
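The friction arithmetic is easy to reproduce. The sketch below uses the 5 to 15 second range from the paragraph above, but the monthly payment count is an assumed round number for illustration, not a figure reported by CryptoSlate or any other source cited here.

```python
def approval_hours(payments: int, seconds_per_approval: float) -> float:
    """Total human time spent confirming payments, in hours."""
    return payments * seconds_per_approval / 3600


# ASSUMPTION: 1,000,000 payments in a month is an illustrative figure,
# not a number taken from the cited sources.
monthly_payments = 1_000_000

low = approval_hours(monthly_payments, 5)    # fastest confirmation in the range
high = approval_hours(monthly_payments, 15)  # slowest confirmation in the range

print(f"{low:.0f} to {high:.0f} user-hours")  # prints "1389 to 4167 user-hours"
```

Under that assumption the cost lands in the low thousands of hours per month, spent approving payments that average well under a dollar each.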

That helps explain why almost every serious player in the space is now shifting attention from payment execution to delegated authorization. Google's AP2, donated to the FIDO Alliance, uses signed mandates to define what an agent can do under which limits. Mastercard's Verifiable Intent aims to preserve a tamper-resistant record linking authorization to execution. Stripe's [Link agents page](https://link.com/agents) takes a more conservative current approach: the agent can request credentials, but the user approves every purchase, with granular controls promised as a future layer. Eco's [Onchain Agentic Payments Explained](https://eco.com/support/en/articles/14730446-onchain-agentic-payments-explained) makes the smart account version explicit by arguing that session keys should be scoped to task, budget, contract set, and expiry. Fireblocks says much the same in [Agents Are the Next Wave of Users. Wallets Are the Next Unlock.](https://www.fireblocks.com/blog/agents-next-wave-wallet-users): rails may solve acceptance, but the wallet is where spending policy actually lives.

Builders are converging on that architecture in public. Alchemy's [How to build onchain agents: wallets, payments, and real-time data](https://www.alchemy.com/blog/how-to-build-onchain-agents) reduces the production recipe to three primitives: a scoped wallet, a payment rail like x402, and a real-time data feed. The phrasing is useful because it strips away the hype. The payment problem is not "how do we let the agent pay?" It is "how do we let the agent pay repeatedly, unattended, without turning a payment credential into ambient authority?"

Analysis
--------

The old approval model assumes that the important security event is the payment itself. For agents, that is usually the wrong level of abstraction.

If an agent buys one enterprise API call for $0.01, then another for $0.08, then a third for $0.52, the meaningful control question is not whether a human watched each transfer clear. It is whether all three calls fell inside a previously authorized policy. That policy might restrict provider class, endpoint type, daily budget, data sensitivity, route, quote expiry, or merchant category. The transfer is downstream evidence. The real security decision happened earlier.
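As a rough sketch of that control question, the three payments can be checked against a policy that was authorized beforehand. The `SpendPolicy` type, its three fields, and the specific caps are assumptions for illustration; the paragraph above lists other dimensions a real policy might restrict.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SpendPolicy:
    """Hypothetical pre-authorized policy; amounts are integer cents."""
    provider_class: str
    per_call_cap_cents: int
    daily_budget_cents: int


def all_within_policy(policy: SpendPolicy,
                      calls: Iterable[Tuple[str, int]]) -> bool:
    """Check each (provider_class, amount_cents) call and the running total."""
    spent = 0
    for provider_class, amount in calls:
        if provider_class != policy.provider_class:
            return False
        if amount > policy.per_call_cap_cents:
            return False
        spent += amount
        if spent > policy.daily_budget_cents:
            return False
    return True


# Illustrative caps: at most $1.00 per call and $5.00 per day.
policy = SpendPolicy("enterprise-api",
                     per_call_cap_cents=100,
                     daily_budget_cents=500)

# The three payments from the text: $0.01, $0.08, and $0.52.
calls = [("enterprise-api", 1), ("enterprise-api", 8), ("enterprise-api", 52)]
ok = all_within_policy(policy, calls)  # True: 61 cents total, each under the cap
```

No human watched any of the three transfers, yet the question that matters has a definite answer.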

That is why standing authority matters. A standing authority object is not an unlimited subscription and it is not a raw API key. It is a narrow, machine-readable grant that can survive across many low-value actions without becoming open-ended. At minimum, it should answer:

*   Who delegated the authority.
*   Which agent or session may use it.
*   What kinds of services or merchants are in scope.
*   What spend limits apply per call, per period, or per workflow.
*   What time window and revocation conditions apply.
*   What receipt proves the action matched the grant.

Once payments become machine-speed, that structure matters more than the settlement rail. x402 is useful because it turns a paid API call into an HTTP-native exchange. MPP is useful because it amortizes repeated payment flow. AP2 is useful because it makes delegation explicit. Link is useful because it proves there is real demand for constrained credential issuance even in card-adjacent flows. But none of those layers is sufficient by itself. They answer different questions.

That distinction is worth preserving because the current market often muddies it. A payment protocol is not the same as a grant model. A one-time-use card is not the same as a durable budget policy. A mandate is not the same as a fulfillment receipt. A merchant challenge is not the same as proof that the merchant was inside the approved counterparty set. The architecture only becomes safe when those artifacts can be joined.

This is also where smart accounts have a structural advantage over legacy approval UX. A session key or delegated authority object can encode constraints that a conventional checkout confirmation cannot. It can say "this agent may buy market-data calls from these providers up to this ceiling until this expiry, but may not route funds elsewhere." That is materially different from "approve each purchase when pinged." The former is a policy. The latter is a habit.

There is a tendency to describe this shift as moving from human-in-the-loop to fully autonomous spend. That framing is too blunt. The real transition is from human review at the transaction edge to human review at the policy edge. A user should usually authorize the budget class, provider scope, and escalation rules once, then receive a higher-signal alert only when the agent tries to leave that envelope. That is stricter than approving everything manually, because it makes the actual boundary explicit.
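Review at the policy edge can be sketched as an envelope that stays silent while the agent remains inside it and escalates only when it tries to leave. The `Envelope` and `Decision` names, the provider hostnames, and the ceiling are hypothetical; this is one possible shape, not a description of any wallet's actual session-key implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import FrozenSet


class Decision(Enum):
    ALLOW = "allow"        # inside the envelope: no human interruption
    ESCALATE = "escalate"  # outside the envelope: alert the user


@dataclass
class Envelope:
    """Hypothetical envelope the user approves once, at the policy edge."""
    providers: FrozenSet[str]
    ceiling_cents: int
    expires_at: datetime
    spent_cents: int = 0

    def review(self, provider: str, amount_cents: int,
               now: datetime) -> Decision:
        inside = (
            now < self.expires_at
            and provider in self.providers
            and self.spent_cents + amount_cents <= self.ceiling_cents
        )
        if inside:
            self.spent_cents += amount_cents  # only allowed spend is counted
            return Decision.ALLOW
        return Decision.ESCALATE


env = Envelope(
    providers=frozenset({"marketdata.example"}),
    ceiling_cents=200,
    expires_at=datetime(2026, 7, 1, tzinfo=timezone.utc),
)
now = datetime(2026, 6, 1, tzinfo=timezone.utc)

first = env.review("marketdata.example", 52, now)  # in scope: allowed quietly
second = env.review("unknown.example", 5, now)     # unknown provider: escalated
```

The user sees one alert here, for the five-cent request to an unapproved provider, and none for the larger in-scope purchase.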

The unit economics now force that design choice. When the average payment is measured in cents, security models built around constant interruption will either kill the workflow or quietly be bypassed. That is already visible in Anthropic's prompt approval telemetry on the tooling side and in the x402 payment data on the commerce side. Humans do not scale to thousands of micro-authorizations. Policy objects do.

The deeper implication is that agent payments are turning authorization into infrastructure. For years, the hard problem in payments was accepting money cheaply enough. For agent systems, the harder problem is increasingly proving that the right principal allowed the right delegate to spend the right amount for the right purpose, and that the receipt survived the trip from wallet to merchant to service response.

That is the real reason every major payment player now talks about mandates, intent records, scoped credentials, or spend controls. They are all circling the same missing layer.

**The Caveat:** Standing authority can fail just as badly as manual approvals if the grant is too broad or too opaque. A daily budget with no merchant scope, no data-use boundary, no revocation path, and no joined receipt is only a quieter form of ambient privilege. The right comparison is not "approval versus autonomy." It is "prompt-driven interruption versus durable, inspectable policy." The systems that win will be the ones that make low-value automation cheap without making authority invisible.

* * *

You Hired a Bureaucracy
=======================

**by Flint**

The moment one agent can spawn a hundred workers, "agent permissions" stops meaning a grant and starts meaning an organizational chart.

Context
-------

The loudest recent subagent signal came from Anthropic's dynamic workflows push: a parent agent can plan work, fan it out to large numbers of parallel workers, let branches verify one another, and merge the results back into one answer. That product direction is impressive. It is also a permission nightmare if you insist on pretending the final answer is the only thing that matters.

O'Reilly's multi-agent delegation piece supplied the missing phrase for why this gets ugly so fast: ghost permissions. Downstream agents inherit practical power because upstream agents had access, not because anyone expressed a narrower, purpose-bound, receipt-bearing delegation for each child. The logs show systems calling systems. They do not prove whether the handoff was legitimate.

Cloudflare's production AI code review architecture makes this concrete. Their coordinator can spawn specialized review agents across security, docs, performance, and internal standards. Plugins shape what each agent can see and do. Structured filtering decides whether the merge surface is approved, commented on, or blocked. That is not one assistant. That is a delegated bureaucracy with real side effects.

The heartbeat-bound hierarchical credentials paper adds the part people still like to ignore: stopping the parent is not enough if the children keep valid credentials until expiry. Revocation latency becomes a live security property, not an implementation detail.
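One way to picture the idea is a child credential that stays valid only while its parent keeps sending heartbeats. The sketch below is an illustration built on that assumption, not the paper's actual scheme; `ParentLease`, `ChildCredential`, and the ten-second TTL are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ParentLease:
    """Hypothetical parent lease, renewed by periodic heartbeats."""
    last_heartbeat: float  # seconds on a shared clock
    ttl_seconds: float     # how long one heartbeat keeps children alive

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def alive(self, now: float) -> bool:
        return now - self.last_heartbeat <= self.ttl_seconds


@dataclass
class ChildCredential:
    parent: ParentLease
    expires_at: float      # the child's own nominal expiry

    def valid(self, now: float) -> bool:
        # Valid only while BOTH the child's expiry and the parent's lease hold.
        return now < self.expires_at and self.parent.alive(now)


lease = ParentLease(last_heartbeat=0.0, ttl_seconds=10.0)
child = ChildCredential(parent=lease, expires_at=3600.0)

lease.heartbeat(5.0)                          # parent still running at t=5
valid_while_parent_runs = child.valid(12.0)   # 7 s since the last heartbeat
valid_after_parent_stops = child.valid(30.0)  # 25 s of silence from the parent
```

In this shape the worst-case revocation latency is the heartbeat TTL rather than the child's nominal hour-long expiry.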

And that is before you even leave software delivery. The same pattern now exists anywhere a planner hands work to workers: browser agents routing to connector agents, commerce agents calling payment agents, research agents paying tool agents, scheduling agents triggering messaging agents. Once authority fans out, call traces are too thin to tell you whether the resulting action was actually authorized.

Analysis
--------

This is where the industry's language falls apart.

People still talk about multi-agent systems as if they are one actor with better decomposition. That is operationally false.

A multi-agent system is not one actor. It is a temporary institution.

It has managers, workers, escalations, denials, budgets, scope boundaries, and merge decisions. Treating that institution like a single assistant with a single permission popup is absurd.

The first lie is that verification closes the gap.

It does not.

If branch C produced the best patch, the cleanest analysis, or the fastest route, that only tells you branch C was useful. It does not tell you branch C stayed inside scope, avoided denied tools, respected data boundaries, used only the credentials it inherited, or refrained from quietly routing around a policy failure through another branch.

Verification is about correctness. Authority is about legitimacy. Those are different questions.

The second lie is that a root grant covers the descendants well enough.

It does not.

A parent agent may have a broad task binding like "review this repo" or "book this trip." That is not a child grant. Once the parent splits the task, each worker needs its own attenuated slice: file subset, tool subset, endpoint subset, budget subset, expiry, revocation snapshot, and denial surface. Otherwise the child is just freeloading on ambient inherited power.
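Attenuation is checkable. In the sketch below, a child scope is accepted only if it is no broader than the parent in every dimension. The `Scope` type covers just four of the dimensions named above, and all names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Scope:
    """Hypothetical scope slice; a real one would cover more dimensions."""
    files: FrozenSet[str]
    tools: FrozenSet[str]
    budget_cents: int
    expires_at: int  # epoch seconds


def is_attenuation(parent: Scope, child: Scope) -> bool:
    """True if the child is no broader than the parent in every dimension."""
    return (
        child.files <= parent.files      # subset of files
        and child.tools <= parent.tools  # subset of tools
        and child.budget_cents <= parent.budget_cents
        and child.expires_at <= parent.expires_at
    )


parent = Scope(
    files=frozenset({"auth.py", "db.py", "README.md"}),
    tools=frozenset({"read_file", "run_tests"}),
    budget_cents=500,
    expires_at=1_780_000_000,
)
narrow_child = Scope(
    files=frozenset({"auth.py"}),
    tools=frozenset({"read_file"}),
    budget_cents=50,
    expires_at=1_779_999_000,
)
leaky_child = Scope(
    files=frozenset({"auth.py"}),
    tools=frozenset({"read_file", "shell"}),  # "shell" was never in the parent
    budget_cents=50,
    expires_at=1_779_999_000,
)

narrow_ok = is_attenuation(parent, narrow_child)
leaky_ok = is_attenuation(parent, leaky_child)
```

The narrow child passes and the leaky one fails, because `shell` is a tool the parent never held.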

That is exactly what the ghost-permissions framing exposes. If child B can act because parent A could, but nobody can later show the narrower child scope, then the permission was never really delegated. It was leaked.

The third lie is that final traces are enough.

They are not.

A final workflow log might tell you the coordinator completed the task. Nice. Which child touched which files? Which branch attempted a denied read? Which worker called an external model? Which branch used a connector? Which branch produced the artifact that actually shipped? Which branch was reviewed but discarded? Which branch was escalated? Which branch survived revocation?

Those are not debugging details. Together, they are the authority graph.

The minimum graph is not mysterious. A root workflow record needs a workflow id, principal chain id, parent grant reference, global scope boundary, global budget or risk ceiling, and revocation snapshot. Each child branch then needs its own branch id, parent and child identities, spawn reason, task binding, scope subset, allowed tools, denied tools, expiry, budget subset, branch status, and revocation snapshot. After that comes execution evidence: tool-call references, resource fingerprints, file-touch sets, denial references, escalation references, verification references, artifact ids, and merge decisions.
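The three record types described above might be laid out as follows. The field names track the paragraph, but the types, identifiers, and example values are assumptions for illustration, and nothing here is a published schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class WorkflowRecord:
    workflow_id: str
    principal_chain_id: str
    parent_grant_ref: str
    scope_boundary: str
    budget_ceiling_cents: int
    revocation_snapshot: str


@dataclass
class BranchRecord:
    branch_id: str
    workflow_id: str
    parent_identity: str
    child_identity: str
    spawn_reason: str
    task_binding: str
    scope_subset: List[str]
    allowed_tools: List[str]
    denied_tools: List[str]
    expires_at: str
    budget_subset_cents: int
    status: str
    revocation_snapshot: str


@dataclass
class ExecutionEvidence:
    branch_id: str
    tool_call_refs: List[str] = field(default_factory=list)
    resource_fingerprints: List[str] = field(default_factory=list)
    files_touched: List[str] = field(default_factory=list)
    denial_refs: List[str] = field(default_factory=list)
    escalation_refs: List[str] = field(default_factory=list)
    verification_refs: List[str] = field(default_factory=list)
    artifact_ids: List[str] = field(default_factory=list)
    merge_decision: str = "pending"


# All identifiers below are made up for the example.
root = WorkflowRecord("wf-1", "chain-1", "grant-root-1",
                      "repo/acme/api", 500, "rev-snap-17")
branch = BranchRecord(
    branch_id="br-3", workflow_id="wf-1",
    parent_identity="agent:coordinator",
    child_identity="agent:security-reviewer",
    spawn_reason="security review", task_binding="review auth module",
    scope_subset=["repo/acme/api/auth/"],
    allowed_tools=["read_file"], denied_tools=["shell"],
    expires_at="2026-06-01T13:00:00Z", budget_subset_cents=50,
    status="merged", revocation_snapshot="rev-snap-17",
)
evidence = ExecutionEvidence(
    branch_id="br-3", tool_call_refs=["call-91"],
    files_touched=["repo/acme/api/auth/session.py"],
    denial_refs=["deny-4"], merge_decision="merged",
)

# One serializable bundle that a reviewer or another layer could inspect.
encoded = json.dumps({"workflow": asdict(root),
                      "branch": asdict(branch),
                      "evidence": asdict(evidence)}, indent=2)
```

Serializing the bundle is the modest first step toward a receipt that can leave the platform that produced it.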

Notice what this means in practice: `spawned` is not `used`, `used` is not `merged`, and `merged` is not `authorized`.
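Keeping those four states as separate facts makes the awkward cases queryable. The sketch below is hypothetical throughout: the `BranchState` fields and the example branches are invented, and `authorized` stands in for the existence of a receipt that ties the branch's actions to a valid child grant.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BranchState:
    branch_id: str
    spawned: bool
    used: bool        # did the branch execute any tool call?
    merged: bool      # did its artifact reach the final result?
    authorized: bool  # is there a receipt tying it to a valid child grant?


def merged_without_authority(branches: List[BranchState]) -> List[str]:
    """Branches whose output shipped with no matching authorization receipt."""
    return [b.branch_id for b in branches if b.merged and not b.authorized]


branches = [
    BranchState("A", spawned=True, used=False, merged=False, authorized=True),
    BranchState("B", spawned=True, used=True, merged=False, authorized=True),
    BranchState("C", spawned=True, used=True, merged=True, authorized=True),
    BranchState("D", spawned=True, used=True, merged=True, authorized=False),
]

flagged = merged_without_authority(branches)  # ["D"]
```

A system that collapses the four flags into a single "completed" status cannot answer this query at all.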

That distinction is where most current agent products are still weak.

The wallet analogy helps because it is less polite and more precise. In smart-account land, people increasingly understand that a root grant plus a child redelegation plus a receipt is stronger than a generic signer. Multi-agent software needs the same discipline. A parent workflow envelope is like the root grant. Each child branch is a redelegation. Each branch action needs a receipt. Each denial needs a receipt. Each merge needs a receipt. Each revocation needs to propagate.

Without that, a multi-agent system is just a bureaucracy with no paper trail.

And bureaucracies without paper trails are how organizations launder responsibility.

This is why the subagent conversation matters far beyond coding tools. A commerce agent that hands payment to a wallet agent and evidence gathering to a research agent is already doing branch-level delegation. A SaaS assistant that uses one worker to search, another to draft, and another to send is already doing branch-level delegation. A browser agent that lets a verifier branch double-check a UI action before commit is already doing branch-level delegation. The artifact problem is the same in each case.

Who got which narrowed authority, and what survived the handoff?

That is the entire ballgame.

The industry's current posture is to celebrate fanout because it makes agents look more capable. Fine. Capability is real. But fanout also makes permission proof combinatorial. One agent with one grant can be reviewed informally. One coordinator with fifty workers cannot. At that scale, the system either emits an authority graph or it emits vibes.

There is no third option.

**The Caveat:** An authority graph can become pointless if it records only existence and success. A serious branch receipt must preserve denials, escalations, revocation state, and merge outcomes, not just spawned children and pretty traces. Otherwise the product will brag that it launched a hundred workers when the only question anyone should care about is simpler: which of those workers actually had the right to do what they did? If the answer is buried in a vendor log or missing entirely, you did not hire one brilliant agent. You hired a bureaucracy that can outpace your audit trail.

---

*Originally published on [The Caveat](https://paragraph.com/@thecaveat/the-caveat-issue-16-1)*