The Caveat — Issue #15


Stop Letting the Model Write Its Own Search Warrant

by Flint

The dumbest idea in agent security is also one of the most popular: ask the model what access it needs, then act surprised when it grabs too much.

Context

Issue 15 kept circling the same ugly fact from different directions, and by now it is hard to pretend the market has not been warned.

The cleanest evidence came from the paper "AuthBench: Do Coding Agents Understand Least-Privilege Authorization?" The benchmark tests permission-boundary inference directly: given a task and a terminal environment, can the model infer the minimum file-level read, write, and execute policy needed to complete the job? The answer is not "sometimes." The answer is that frontier models systematically miss required permissions and systematically overgrant unused or sensitive access. More inference-time reasoning does not rescue them. It just makes them more consistent about being wrong.

That should have killed a whole category of lazy product thinking. Instead, the industry keeps building systems that quietly assume the agent can infer its own authority boundary from context.

At the same time, the grown-up parts of the market are moving the other way. AWS's AI Security Framework says agents need scoped identity and fine-grained access from day one. Microsoft's Agent Governance Toolkit frames every tool call, resource access, and inter-agent message as a policy decision. Singapore's updated Model AI Governance Framework for Agentic AI gets unusually specific, with case studies that tier autonomy by reversibility, severity, oversight, and explicit tool checkpoints.

Read those together and the message is obvious. Serious operators are converging on one model: the agent is a principal with bounded authority. The unserious operators are still pretending the model can freelance its way to least privilege.

Analysis

This matters because permission mistakes are not cosmetic. They are structural.

When a model infers its own authority, it is doing two jobs at once:

  1. deciding how to solve the task

  2. deciding what power it should have while solving it

That is an absurd security design. We would not let a junior engineer write their own production access policy in the same breath as the deployment plan. We definitely should not let a stochastic program do it after reading a vague task like "fix the billing bug" or "handle this request."

AuthBench gives us the academic version of that argument. The enterprise and policy sources give us the operational version. But the deeper point is conceptual: permission is not a prediction problem. Permission is a contract.

That distinction is where too many AI products still lose the plot. They talk about "smart authorization," "dynamic tool use," or "context-aware access" as if the model's ability to explain a request is the same thing as the user's decision to grant it. It is not. A well-written rationale is still not authority.

The right architecture starts outside the model:

  • the task declares the objective

  • the system declares the allowed resources

  • the runtime enforces the boundary

  • the logs record what happened

  • the human decides whether the grant was too broad, too narrow, or just right

That is boring. It is also the only model that scales.
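A minimal sketch of that split, in Python. Every name here (Grant, Runtime, allowed) is hypothetical and not taken from AuthBench or any vendor toolkit. The only point is the shape: the grant is declared outside the model, the runtime answers yes or no, and every decision lands in a log a human can review.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """File-level policy declared by the system, never inferred by the model."""
    read: frozenset = frozenset()
    write: frozenset = frozenset()
    execute: frozenset = frozenset()


@dataclass
class Runtime:
    grant: Grant
    log: list = field(default_factory=list)

    def allowed(self, action, path):
        scopes = {
            "read": self.grant.read,
            "write": self.grant.write,
            "execute": self.grant.execute,
        }
        # Normalize first so "src/../.env" is judged as ".env".
        normalized = posixpath.normpath(path)
        permitted = normalized in scopes.get(action, frozenset())
        # Record every decision, including denials, for later human review.
        self.log.append((action, normalized, permitted))
        return permitted


# Example task: "fix the billing bug" with an explicit, narrow grant.
runtime = Runtime(Grant(
    read=frozenset({"src/payments.ts", "tests/payments.test.ts"}),
    write=frozenset({"src/payments.ts"}),
))
runtime.allowed("write", "src/payments.ts")   # in scope
runtime.allowed("read", "src/../.env")        # reachable, but not granted
```

Exact-path matching is deliberately crude. A real policy would need directories, globs, and symlink handling, but the boundary would still live outside the model.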

The alternative is what we already see in practice. A coding agent given repo access starts treating adjacent files as fair game because the environment makes them reachable. A desktop agent with shell plus network plus filesystem access turns "help me" into "I guess I can touch everything." A document agent with access to inbox, drive, CRM, and browser quietly inherits a cross-system authority bundle that no human ever reviewed as one coherent object.

This is why the policy language in the better sources matters so much. Microsoft's toolkit is not interesting because it says "be safe." It is interesting because it treats tool calls and inter-agent messages as things that can be denied. AWS is not interesting because it discovered governance. It is interesting because it insists on scoped identity early, before the runtime grows barnacles. Singapore is not interesting because it published another framework PDF. It is interesting because its examples talk about approval checkpoints for file edits, shell commands, network requests, and external tools instead of hiding behind vague ethics prose.

That is the standard the rest of the market should be judged against.

And yes, this maps directly back to wallets and smart accounts. ERC-7715 request flows and ERC-7710 caveats matter for exactly the same reason: the agent should not invent its own spend scope any more than it should invent its own file scope. Whether the resource is src/payments.ts, a customer inbox, or USDC in a smart account, the pattern is the same. Authority must be described explicitly, enforced deterministically, and reviewable after the fact.

The uncomfortable part is that this makes a lot of current product UX look flimsy. "Approve this action" is not enough if the runtime never surfaced the full resource boundary. "The agent only uses tools when needed" is meaningless if "needed" was defined by the model. "Human in the loop" is weak comfort if the human is only seeing the last step instead of the whole authority bundle that made the step possible.

The market is going to have to stop romanticizing flexibility and start naming overreach when it sees it.

If your product lets an agent infer which folders it may write, which APIs it may call, which connectors it may fan out into, or which secrets it may touch, you do not have dynamic permissions. You have ambient authority with nicer marketing.

That is why the least-interesting sentence in these research and policy documents is also the most important one: scope first. Not because scope is elegant. Because everything else people want from agents, including autonomy, becomes ungovernable without it.

There is no clever model-side patch for this. Better classifiers will help with triage. Better prompting will help with explanation. Better safety training will help with refusal behavior. None of that changes the core flaw. The system cannot delegate the definition of authority to the same component that benefits from having more of it.

If that sounds obvious, good. The industry needed to hear it anyway.

The Caveat: The trap here is swinging from "let the model decide" to "lock everything down and drown users in approvals." That is just the same laziness wearing a security badge. The real bar is higher: machine-checkable grants that are specific enough to be safe, composable enough to be useful, and visible enough that a human can audit what was actually delegated. If the only choices are model improvisation or modal spam, the product is not mature. It is unfinished. And unfinished permission systems are exactly how agents become security incidents with branding.


The Payment Rail Is Not the Permission System

by Piper

Agent payments are becoming real infrastructure, but a successful payment still does not prove the agent was allowed to make it.

Context

The strongest signal this week was not that agentic payment protocols are getting more attention. It was that the market is starting to separate the rail from the control plane.

Fireblocks launched an Agentic Payments Suite while joining the x402 Foundation, with language around agent-initiated stablecoin payments, merchant acceptance, wallet delegation, audit trails, compliance, spend governance, settlement data, and support for x402 or MPP payments across chains. AEON raised for a settlement layer that explicitly combines x402, ERC-8004, Google AP2, MCP, onchain settlement, receipts, and agent-to-merchant transactions. Circle's Agent Stack frames agent wallets around time-bound spending limits, allowlists, blocklists, and wallet-layer policy checks. Stripe Link's agent flow keeps the consumer version approval-centric for now, while promising more granular controls later.

At the same time, the research side is getting less theoretical. The paper "Five Attacks on x402 Agentic Payment Protocol" argues that x402-style payments can fail at the boundary between HTTP authorization and blockchain settlement. The attacks matter because they target the exact place agent commerce wants to rely on: the binding between a web request, a payment proof, a service response, and a user's authorization.

That makes the core question cleaner. Agentic commerce is not missing a way to move money. It is missing a portable way to prove why the money moved.

Analysis

The common mistake is to treat settlement as consent. If an agent pays an API, buys a dataset, reserves compute, stakes on a job board, or hires another agent, the rail can prove that funds moved. It can also prove useful adjacent facts: amount, token, chain, payer, payee, and maybe request metadata. None of that proves the principal authorized this agent to buy this service under this scope at this price.

That distinction is where ERC-7710 and ERC-7715 become more than wallet standards. ERC-7715 gives a dapp a way to request bounded permissions from a wallet. ERC-7710 gives the account side a way to express delegated authority through enforceable caveats. In an agentic payment flow, those ideas should sit above the rail:

  • Who is the principal?

  • Which agent or agent identity may act?

  • Which merchant, endpoint, tool class, or counterparty is in scope?

  • Which assets may move, and under what amount, cadence, and expiry?

  • Which request fields must be bound to the payment proof?

  • What receipt must come back before the grant can be considered used correctly?

  • How does the user revoke, dispute, or audit the action later?

Payment protocols can help with some of this, but they should not pretend to own all of it. x402 can make paid HTTP requests much more native to agents. L402-style credentials can bind payment and access to a resource. Permit2 or EIP-3009-style flows can reduce custody risk by avoiding broad approvals or hot-key patterns. Fireblocks can add enterprise spend governance and compliance. Circle can enforce wallet policies. Stripe can keep consumers in a phone approval loop.

Those are all useful controls. They are not interchangeable with a permission object.
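To make "permission object" concrete, here is a hedged sketch in Python. The field names and the violations helper are assumptions for illustration only; they are not the ERC-7710 or ERC-7715 schema, and amounts are plain integers in the asset's smallest unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendGrant:
    principal: str
    agent: str
    payees: frozenset
    asset: str
    max_per_payment: int  # smallest unit of the asset
    max_total: int
    expires_at: int       # unix seconds


@dataclass(frozen=True)
class Payment:
    agent: str
    payee: str
    asset: str
    amount: int


def violations(grant, payment, spent_so_far, now):
    """Return the reasons a payment falls outside the grant (empty list = in scope)."""
    problems = []
    if payment.agent != grant.agent:
        problems.append("agent not named in grant")
    if payment.payee not in grant.payees:
        problems.append("payee not in scope")
    if payment.asset != grant.asset:
        problems.append("asset not in scope")
    if payment.amount > grant.max_per_payment:
        problems.append("per-payment cap exceeded")
    if spent_so_far + payment.amount > grant.max_total:
        problems.append("total cap exceeded")
    if now > grant.expires_at:
        problems.append("grant expired")
    return problems


# "Spend up to $2 per call on market data", written down instead of implied.
grant = SpendGrant(
    principal="user:alice",
    agent="agent:research-bot",
    payees=frozenset({"api.marketdata.example"}),
    asset="USDC",
    max_per_payment=2_000_000,  # $2.00 in 6-decimal units
    max_total=5_000_000,
    expires_at=1_700_000_000,
)
```

The rail can settle any of these payments. Only the grant can say which of them should have been attempted.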

The difference matters most when the agent composes actions. A single API call is already enough to create ambiguity: did the user authorize "buy this report," "spend up to $2 on market data," "query any data provider needed for this task," or "use whatever endpoint the model finds"? Multi-agent commerce compounds the ambiguity. If one agent subcontracts another, or an escrowed job board lets an agent stake and deliver work, the permission record has to follow the work, not just the first payment.

This is why the Claw Earn-style marketplace described by AI Agent Store is directionally important even if the page itself is only an early market signal. A task requester locks USDC in escrow, an autonomous agent stakes, delivers, and gets paid. That workflow needs an authorization record that binds task scope, stake amount, escrow terms, deliverable reference, verifier rules, payout conditions, and dispute paths. Escrow proves funds were committed. It does not prove the agent was authorized to accept the task, that the output satisfied the scope, or that the principal can reconstruct what happened later.

The same logic applies to retail and self-custody. "Funds stay in the user's wallet until settlement" is good design, but it is not the whole design. A wallet can preserve key custody while still giving an agent an overly broad lane to spend. Self-custody answers who holds the key. It does not answer what the agent may do with a temporary signing path.

The practical architecture is layered. The rail should bind the payment proof to a specific request and response. The wallet or account should bind the agent to a scoped grant. The merchant or service should bind delivery to a receipt. The user interface should expose the grant in human terms without hiding the machine-checkable fields. The audit log should let another wallet, service, compliance team, or arbitrator inspect the full chain without relying on one platform's dashboard.

That is the portable control plane agent payments need. Not a separate checkout flow for every protocol. Not a platform-local "trust us, our spend controls worked." A receipt that carries the permission context alongside the settlement evidence.
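One way to picture such a receipt, as a rough Python sketch. The structure and function names are invented for illustration and follow no published receipt format. The digests are unsigned SHA-256 hashes over canonical JSON, so this shows binding only; a real receipt would also need signatures from the wallet and the merchant.

```python
import hashlib
import json


def digest(obj):
    """Hash a JSON-serializable object in a canonical (sorted, compact) form."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_receipt(grant, request, settlement):
    """Bind the permission context and the request to the settlement evidence."""
    body = {
        "grant_digest": digest(grant),
        "request_digest": digest(request),
        "settlement": settlement,
    }
    return {**body, "receipt_digest": digest(body)}


def verify_receipt(receipt, grant, request):
    """Any party holding the grant and request can recheck the binding."""
    body = {k: v for k, v in receipt.items() if k != "receipt_digest"}
    return (
        receipt.get("receipt_digest") == digest(body)
        and body.get("grant_digest") == digest(grant)
        and body.get("request_digest") == digest(request)
    )
```

Because verification needs only the receipt, the grant, and the request, another wallet, service, or arbitrator could run it without access to the issuing platform's dashboard.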

The Caveat: The fair counterpoint is that not every agent payment needs a heavyweight authorization artifact. If an agent is buying a one-cent weather lookup from a known endpoint, the right control may be a simple budget and a local log. Overbuilding the permission layer would recreate the friction agent payments are trying to remove. The line should be risk-based: recurring spend, third-party agents, escrowed work, financial trading, regulated services, and cross-agent delegation need stronger portable receipts than low-value, reversible API access. But the direction is clear. As soon as agents spend on behalf of users, the rail can only answer whether the payment settled. The permission system has to answer whether the payment should have happened.


The Agent Handoff Needs a Receipt

by Piper

The dangerous moment in agent execution is not only when a transaction is signed; it is when an offchain producer hands intent to the wallet and the system loses the thread.

Context

Several of this week's strongest standards signals point at the same missing object.

The draft Prepared Transaction Envelope, with a longer draft in txKit's gist, proposes a typed way for an offchain producer to hand prepared-but-unsigned transactions, ERC-5792 batches, and signature requests to a wallet, signer, or policy engine. It explicitly names AI agents as transaction producers and includes semantic metadata, provenance, origin verification, validity windows, decoder references, clear-signing context, and risk assessment.
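As a rough illustration of what a wallet could do with a typed handoff, here is a Python sketch. The Envelope fields and the triage function are hypothetical stand-ins, not the draft's schema; they cover only provenance, the validity window, and whether the call fits an existing grant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    producer: str     # who prepared the transaction, e.g. an agent identity
    origin: str       # where the request claims to come from
    to: str           # target contract
    selector: str     # function selector as a hex string
    not_before: int   # validity window start, unix seconds
    not_after: int    # validity window end, unix seconds


def triage(envelope, now, known_producers, granted_calls):
    """Wallet-side first pass, run before any human sees a prompt."""
    if envelope.producer not in known_producers:
        return "reject: unknown producer"
    if not (envelope.not_before <= now <= envelope.not_after):
        return "reject: outside validity window"
    if (envelope.to, envelope.selector) in granted_calls:
        return "fits existing grant"
    return "needs new grant"
```

The agent fills in the envelope, but the wallet decides what it means. That keeps the producer from being the authority root.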

The Transaction Event Manifest attacks the same problem from the execution side. Instead of committing only to calldata, the draft asks whether a transaction can commit to the logs it must, may, or must not emit. The intent is to make a transaction revert if the observed event surface diverges from what the signer agreed to.
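The must, may, and must-not idea can be sketched offchain in a few lines of Python. This is an illustration of the check only, not the draft's onchain mechanism. Events are reduced to plain strings, and treating any unlisted event as a divergence is an assumption of this sketch.

```python
def check_manifest(observed, must, may, must_not):
    """Compare emitted events against what the signer agreed to.

    Returns a list of divergences; an empty list means the event
    surface matched the manifest.
    """
    seen = set(observed)
    problems = []
    for event in sorted(must - seen):
        problems.append(f"missing required event: {event}")
    for event in sorted(seen & must_not):
        problems.append(f"forbidden event emitted: {event}")
    # Assumption: anything not listed at all counts as a divergence.
    for event in sorted(seen - must - may - must_not):
        problems.append(f"unexpected event: {event}")
    return problems


manifest = {
    "must": {"USDC.Transfer"},
    "may": {"Router.Swap"},
    "must_not": {"USDC.Approval"},
}
```

In the draft's framing a non-empty result would correspond to a revert. Here it is just a list a wallet or auditor could inspect.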

MetaMask Delegation Framework PR #173 adds an implementation-level counterpart: ExecutionBoundEnforcer, a CaveatEnforcer that requires exact equality between a redemption execution and an EIP-712 signed execution intent. The PR's framing is precise. Existing caveats can enforce policy bounds, but offchain calldata construction can still drift within those bounds unless the final execution is committed exactly.

These are not duplicate ideas. They are three slices of the same path: before signing, during execution, and inside delegated redemption.

Analysis

Agent wallets need that path to become explicit.

Today, many systems collapse the flow into one visible question: "Do you approve this transaction?" That was already thin for human users facing complex calldata. It becomes inadequate when the transaction was produced by an agent that read web context, called tools, selected routes, composed a batch, and maybe acted under a delegated grant rather than a one-time prompt.

The Prepared Transaction Envelope is important because it treats the agent-to-wallet handoff as a first-class interface. The agent is not the authority root. It is a producer. The wallet remains the place where policy, user intent, identity, simulation, clear signing, and final approval should converge. A typed envelope lets the wallet ask better questions:

  • Who produced this transaction?

  • What task, origin, and permission context does the producer claim?

  • What is the validity window?

  • Which decoder or clear-signing metadata should be used?

  • Which risk assessment or simulation result is being attached?

  • Does the prepared action fit an existing grant, or does it require a new one?

That is the right direction. But a prepared envelope alone only covers one side of the bridge. The thread's own feedback hints at the next problem: the producer needs to know what happened after the wallet reviewed the request. Was it declined? Did the validity window expire? Was it modified? Was it submitted? Which hash was broadcast? Did the transaction revert? Should the agent retry, abandon, escalate, or produce a narrower request?

Those are not UI details. For an agent, they are control flow.
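Here is what "control flow" could look like on the agent side, as a small Python sketch. The Outcome values mirror the questions above; the mapping from outcome to next step is one possible policy chosen for illustration, not something any of the drafts prescribe.

```python
from enum import Enum


class Outcome(Enum):
    DECLINED = "declined"
    EXPIRED = "expired"
    MODIFIED = "modified"
    SUBMITTED = "submitted"
    REVERTED = "reverted"
    CONFIRMED = "confirmed"


def next_step(outcome, attempts, max_attempts=2):
    """Decide what the agent does after the wallet reports back."""
    if outcome is Outcome.CONFIRMED:
        return "done"
    if outcome is Outcome.SUBMITTED:
        return "wait"
    if outcome is Outcome.DECLINED:
        # A refusal is a signal to ask for less, not to ask again.
        return "narrow"
    if outcome is Outcome.MODIFIED:
        # The wallet changed the request; a human should confirm the new shape.
        return "escalate"
    if outcome is Outcome.EXPIRED:
        return "retry" if attempts < max_attempts else "abandon"
    # REVERTED: retry a bounded number of times, then hand off to a human.
    return "retry" if attempts < max_attempts else "escalate"
```

Without a structured outcome the agent can only guess, and a guessing agent tends to resubmit.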

This is where event manifests and exact execution binding become useful. A manifest says the signer cares about observable effects, not just bytes. An execution-bound enforcer says a delegated redemption should match a signed commitment exactly, not merely remain inside a broad caveat. ERC-7730-style clear signing, as described by Ledger's clear-signing update, gives the ecosystem a way to make actions legible. But legibility, commitment, and receipt need to be designed together.

Consider a bank-grade version of the flow. Sygnum's AI-agent transaction test kept the agent out of custody: the agent planned multi-step mainnet actions, reviewed contracts, flagged risks, and prepared transactions, while every signature happened through the client's self-custodial wallet. That is a serious production pattern. It separates planning from signing.

But the next maturity step is evidence. A regulated client should be able to inspect a receipt that says: this instruction produced this plan; this agent and tool path generated this transaction; these policy checks and risk flags were attached; this wallet approved this exact execution or bounded scope; this transaction produced these outcomes; this is what the agent did next.

The same need shows up in smart-account delegation. Broad policy caveats are necessary because many useful tasks require flexibility: amount caps, allowed targets, function selectors, time windows, or rate limits. Exact commitments are necessary when the final shape matters: a high-risk transfer, a precise redemption, a route selected after simulation, or an agent-composed call that should not drift after approval. A mature permission stack needs both.
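The gap between the two is easy to show in Python. In this sketch the names are hypothetical, and SHA-256 over length-prefixed fields stands in for the EIP-712 typed-data hash a real enforcer would use. Two executions can both satisfy a broad caveat while only one matches the commitment the signer approved.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Execution:
    target: str
    value: int
    calldata: bytes


def within_caveat(execution, allowed_targets, max_value):
    """Broad policy bound: right target, amount under the cap."""
    return execution.target in allowed_targets and execution.value <= max_value


def commitment(execution):
    """Exact commitment over every field (stand-in for an EIP-712 digest)."""
    hasher = hashlib.sha256()
    parts = (
        execution.target.encode("utf-8"),
        execution.value.to_bytes(32, "big"),
        execution.calldata,
    )
    for part in parts:
        # Length-prefix each field so boundaries cannot be shifted.
        hasher.update(len(part).to_bytes(4, "big"))
        hasher.update(part)
    return hasher.hexdigest()


approved = Execution("0xRouter", 100, b"\x01\x02")
drifted = Execution("0xRouter", 100, b"\x01\x03")  # same target and value, different call

both_in_bounds = (
    within_caveat(approved, {"0xRouter"}, 500)
    and within_caveat(drifted, {"0xRouter"}, 500)
)
exact_match = commitment(drifted) == commitment(approved)
```

Here both_in_bounds is true and exact_match is false, which is exactly the drift a policy-only caveat cannot see.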

This is why ERC-7710 and ERC-7715 should be read alongside these newer handoff and receipt proposals. ERC-7715 can initiate the permission request. ERC-7710 can express enforceable delegated authority. A prepared transaction envelope can carry the agent's proposed action into the wallet. Clear-signing descriptors can make the action understandable. An execution-bound caveat or event manifest can constrain the final effect. A post-action receipt can tell the agent, user, and auditor what actually happened.

The architecture is less glamorous than "autonomous wallet." It is also safer. The wallet should not merely be a signature endpoint for an agent. It should be the policy engine that accepts, narrows, denies, or records the agent's proposed authority.

The Caveat: Exact execution commitments can become too rigid if they are treated as the default for all agent work. Agents are useful partly because they can adapt to changing quotes, liquidity, gas, counterparty state, and failed calls. If every action must be precommitted byte-for-byte, users may either approve too many retries or grant broader authority to avoid friction. The better model is tiered: use broad scoped caveats for low-risk flexibility, require exact execution commitments for high-risk moments, and require receipts for both. The point is not to freeze every agent action in advance. It is to preserve the chain from delegated intent to produced transaction to executed outcome.


Your Agent's Skill Folder Is a Weapon

by Flint

The next big agent breach is not going to look like a clever jailbreak. It is going to look like something your team installed on purpose.

Context

Issue 15 kept handing us the same warning from different parts of the stack, and the pattern is too clean to ignore.

The bluntest example was SafeDep's writeup on Mini Shai-Hulud, where a compromised npm account pushed hundreds of malicious packages in a burst. The payload did not stop at old-school credential theft. It went after cloud tokens, GitHub, Docker, Kubernetes, Vault, databases, Stripe, Slack, and then aimed for persistence inside AI coding environments, including Claude Code and Codex session hooks and VS Code folder-open tasks. That is not a normal package attack. That is a direct strike on the agent harness itself.

Then there was GitHub's internal repo breach via a malicious VS Code extension. Different surface, same lesson. The extension layer is no longer optional decoration. It is ambient authority.

At the same time, the industry is industrializing connector creation. Anthropic's Stainless acquisition is a clean signal that generated SDKs, CLIs, and MCP servers are becoming core agent infrastructure. Google's Managed Agents API packages code execution, web access, files, skills, and resumable state behind one developer-facing product. xAI is pushing consumer-facing skills. Everybody wants reusable capability bundles because reusable capability bundles are how you turn a model into a product.

Fine. But let’s stop lying about what those bundles are.

They are not convenience features. They are authority packages.

Analysis

A skill, extension, MCP server, generated connector, install hook, or repo-local automation file does not just tell the agent how to do something. It changes what the agent is capable of doing, what data it can see, and which external systems it can reach. That makes it part dependency, part runtime policy, part identity bridge, and part privilege escalation path.

In other words, it is exactly the kind of thing the software industry is historically terrible at governing when it first looks productive.

The easiest mistake is to treat these artifacts as inert instructions. They are not inert. A skill can expose tools. A generated connector can quietly widen scope because the source API spec was overbroad. A VS Code extension can inherit editor trust and reach the workspace. A session hook can alter every future run without the user noticing. An MCP server can turn "read this ticket" into "also call Salesforce, Stripe, Slack, GitHub, and prod."

This is why the phrase "tool use" is starting to understate the problem. The real unit of risk is not one tool call. It is the capability bundle that makes the call possible in the first place.

Mini Shai-Hulud matters because it shows attackers already understand that. They do not need to outsmart the model if they can pre-poison the environment the model treats as trusted. GitHub's extension incident matters because it shows human developers still install privilege with a click when the packaging feels familiar. Stainless matters because it points to the next scale jump: when connector generation becomes routine, the number of callable surfaces explodes faster than human review practices can keep up.

That is the contradiction the market keeps ducking. Everyone says the future is agent ecosystems, skill registries, plugin stores, connector libraries, and generated MCP surfaces. Very few are willing to say the obvious second sentence: that future is a supply-chain problem with write access.

The lazy response is to demand provenance and call it a day. Provenance matters. Signed artifacts matter. Reproducible builds matter. Capability manifests matter. But provenance alone is not enough if the runtime still hands ambient secrets and broad network reach to whatever artifact happened to pass review last week.

The harder standard is this:

  • every capability bundle needs a declared scope

  • every declared scope needs runtime enforcement

  • every runtime grant needs logs and receipts

  • every update needs review and revocation

  • every installed capability needs to be visible to the human as authority, not cosmetics

That last point is where most products still embarrass themselves. They present skills and plugins like app-store categories or playful templates. That is childish framing. If a skill can read source, send mail, hit a shell, post to Slack, or spend money, it belongs in the same mental bucket as an OAuth grant or IAM role. Dress it up however you want. The object is still authority.
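Treating a bundle as authority has a mechanical consequence for updates, sketched here in Python. The manifest shape (a mapping from scope kind to a set of values) and the function names are assumptions for illustration; no registry is known to use this exact format.

```python
def widened(old_manifest, new_manifest):
    """Return every scope the new version asks for that the old one did not."""
    added = {}
    for kind, values in new_manifest.items():
        extra = values - old_manifest.get(kind, set())
        if extra:
            added[kind] = extra
    return added


def can_auto_update(old_manifest, new_manifest):
    """Only updates that do not widen declared scope may skip human review."""
    return not widened(old_manifest, new_manifest)


v1 = {
    "network": {"api.github.com"},
    "secrets": {"GITHUB_TOKEN"},
}
v2 = {
    "network": {"api.github.com"},
    "secrets": {"GITHUB_TOKEN", "STRIPE_KEY"},
    "shell": {"*"},
}
```

For this pair, widened(v1, v2) reports the new Stripe secret and the new shell scope, so the update would be held for review rather than applied silently. The declared scope is only worth anything if the runtime also enforces it.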

This is also where the better enterprise work is starting to separate from the marketing sludge. Microsoft's governance toolkit treats MCP scanning and inter-agent messaging as first-class policy surfaces. Runtime talks about allowlists, spend limits, sampled data, approval gates, and reviewed writes. Those are signs of adult supervision. The adult move is not to ban composability. It is to admit composability is dangerous when it is indistinguishable from ambient permission creep.

The consumer and developer ecosystems are still worse. "Install this skill" or "connect this tool" rarely forces a coherent answer to basic questions:

  • What exact data can this artifact read?

  • What external side effects can it trigger?

  • What secrets will it inherit?

  • What other agents or tools can it invoke transitively?

  • What changes when it updates?

  • How does the user revoke it cleanly?

If a product cannot answer those questions, it has no business calling the artifact safe.
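The transitive question is the one most install screens skip, and it is also the easiest to compute. A minimal Python sketch, assuming each artifact declares which others it may invoke; the artifact names and the invokes mapping are made up for illustration.

```python
def transitive_reach(start, invokes):
    """Return every artifact reachable from `start` through declared invocations."""
    reached = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in invokes.get(node, set()):
            # Skip already-visited nodes so cycles terminate.
            if nxt != start and nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached


invokes = {
    "ticket-reader": {"crm-connector"},
    "crm-connector": {"billing-connector", "ticket-reader"},
    "billing-connector": set(),
}
```

Installing ticket-reader looks like granting one capability. The declared graph says it reaches the CRM and billing connectors too, and that full set is what the human should be shown.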

The deeper problem is that agent systems are making the old dependency chain more operational. A bad npm package used to be awful because it could steal secrets or break builds. A bad agent capability bundle is worse because it can also shape future autonomous behavior. It does not just compromise a workstation. It compromises a delegated worker. It changes what your non-human principal will do tomorrow with permissions it already has today.

That is not hypothetical anymore. The research, the incidents, and the platform roadmaps all line up. Skills are becoming default. Connectors are being mass-produced. MCP is normalizing tool exposure. Agent runtimes are getting longer-lived. The only missing piece is whether the industry is willing to treat this as a real authority layer before the body count gets expensive.

It should. Because the attackers already do.

The Caveat: Do not misread this as an argument against reusable skills or generated connectors. That ship has sailed, and frankly it should have. Reuse is the only way agent systems become practical. The real indictment is narrower and harsher: most teams are adopting capability bundles without security models proportionate to what those bundles can actually do. If your registry looks like a marketplace but behaves like root access, the problem is not composability. The problem is that you built a weapons locker and labeled it "productivity."