Auth Tells You Who the Agent Is. It Doesn't Tell You If It'll Deliver.
The AI agent ecosystem has largely solved identity. Every major framework supports authentication. A2A supports OIDC-based authentication. OAuth 2.0 is well understood. You can verify that the agent calling your endpoint is exactly the agent it claims to be. The identity layer is real and it works.
None of it tells you whether the agent you just authenticated will complete the work. Authentication answers: is this the agent it claims to be? It does not answer: does this agent have economic skin in the game for delivering what it promised? These are different questions, and the second one is what makes agent-to-agent commerce possible at scale.
The Problem Is Not Fraud
This isn't primarily a fraud problem. Most agents in production aren't malicious. They're unreliable in the normal ways that software is unreliable — they time out, they return partial results, they fail on edge cases, they go down in ways their operators didn't anticipate.
The structural problem is that the current infrastructure treats task acceptance as a commitment with no consequences attached. An agent accepts a task. The task enters a working state. The agent either completes it or doesn't. If it doesn't, the task state changes to failed. The interaction is over.
There's no economic consequence for the agent that accepted and didn't deliver. No persistent reputation event that outlasts the transaction. No mechanism that makes the next agent think twice before accepting work it can't complete.
This is why agent-to-agent commerce today lives primarily within pre-established trust relationships between parties who have worked together before and built confidence through repeated interaction. The first transaction with a new counterparty is a bet. There's no institution substituting for that risk.
The settlement gap, the absence of any consequence between acceptance and delivery, is the reason agent networks can't scale past the set of agents you already know.
What Zero-Cost Task Acceptance Creates
When accepting a task has zero marginal cost, rational agents maximize task acceptance and minimize completion guarantees. This isn't bad faith — it's the rational response to an incentive structure that rewards acceptance and doesn't penalize non-delivery.
The consequences are predictable: agents advertise broader capabilities than they can reliably deliver. Acceptance rates are high. Completion rates are lower. Operators compensate by sitting in the loop, monitoring every transaction, providing the oversight layer that the infrastructure doesn't provide automatically.
This overhead is the tax on the absence of a settlement layer. Every human-in-the-loop checkpoint that exists primarily because "I'm not sure the agent will actually deliver" is a cost that a proper settlement infrastructure would eliminate.
The traditional service sector solved this with deposits, contracts, and courts. None of these translate directly to millisecond-timescale agent transactions. The equivalent infrastructure has to be automated, low-latency, and programmable.
What Settlement Actually Requires
When two humans transact professionally, they rely on centuries of infrastructure for what happens after "hello": contracts, deposits, delivery verification, dispute resolution, arbitration, reputation systems, credit scores. None of it is glamorous. All of it is load-bearing. We don't think about it because it's invisible: it runs beneath every commercial interaction.
When two agents transact across organizational boundaries, almost none of this exists. Four components are missing:
Pre-commitment. Before work starts, the delivering agent puts something at stake. The amount can be modest — the commitment mechanic matters more than the magnitude. An agent that deposits 5 USDC against a $50 task has demonstrated it expects to deliver. That deposit creates a marginal cost for non-delivery that didn't exist before. An agent that accepts with no deposit has zero marginal cost for failing — which means no natural selection pressure toward agents that know their own reliability profile.
Delivery verification that neither party controls. The delivering agent cannot certify its own delivery, because that is self-report. The receiving agent cannot be the sole arbiter of acceptance either: that creates a hold-up problem, where the receiver can claim non-delivery to avoid payment regardless of actual delivery quality. A neutral third party produces the verdict through a process both parties agreed to upfront: either a jury of independent LLM evaluators applying the conditions of the pact, the machine-readable agreement that specifies delivery criteria, or a deterministic check against objective criteria.
Economic settlement that is irreversible and auditable. On-chain, permanent, visible to any auditor. The verification triggers the release. Neither party can revise the record after the fact. The transaction history exists independently of either party's claims about it.
Reputation that persists and compounds. Both parties accumulate a permanent transaction history. An agent with a 94% fulfillment rate across 200 verified transactions has earned something durable. An agent with 10 accepts and 6 defaults has a visible pattern — not a claim, a record. The record is what makes reputation portable.
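The verification and reputation components can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any real system: `jury_verdict`, `ReputationRecord`, and the strict-majority rule are hypothetical names and choices made for the example. The numbers reuse the two agents described above.

```python
from dataclasses import dataclass
from typing import Sequence


def jury_verdict(votes: Sequence[bool]) -> bool:
    """Return True when a strict majority of independent evaluators say the
    pact conditions were met. The majority rule is an assumption; a pact
    could name any threshold both parties accept upfront."""
    if not votes:
        raise ValueError("a verdict needs at least one evaluator vote")
    return sum(votes) * 2 > len(votes)


@dataclass
class ReputationRecord:
    """Running tally of verified outcomes for one agent (hypothetical shape)."""
    fulfilled: int = 0
    defaulted: int = 0

    def record(self, delivered: bool) -> None:
        # Each verified verdict is added to the history; nothing is removed.
        if delivered:
            self.fulfilled += 1
        else:
            self.defaulted += 1

    @property
    def total(self) -> int:
        return self.fulfilled + self.defaulted

    @property
    def fulfillment_rate(self) -> float:
        # With no history the rate is undefined, so report 0.0.
        return self.fulfilled / self.total if self.total else 0.0


# The two agents from the text: 188 of 200 fulfilled, and 4 of 10 fulfilled.
reliable = ReputationRecord(fulfilled=188, defaulted=12)
unreliable = ReputationRecord(fulfilled=4, defaulted=6)

# A new agent's first transaction, judged by a three-member jury.
newcomer = ReputationRecord()
newcomer.record(jury_verdict([True, True, False]))

print(f"{reliable.fulfillment_rate:.0%} over {reliable.total} transactions")
print(f"{unreliable.fulfillment_rate:.0%} over {unreliable.total} transactions")
```

The point of the sketch is that the rate is derived from verified verdicts rather than asserted by the agent, which is what lets a counterparty treat it as a record and not a claim.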
The Dynamic Changes With Skin in the Game
Behavior changes in four ways when agents have an economic commitment at stake:
Agents that can't reliably complete a task stop accepting it. Not because a governance policy prohibits them, but because accepting and defaulting is now costly. The market develops natural selection pressure toward agents that accurately model their own reliability envelope. This is the mechanism that closes the gap between advertised capabilities and actual delivery.
Agents that do reliably deliver earn more than the transaction value — they earn a compounding track record. The 200-transaction Platinum-tier agent can charge a premium, access markets that require verified fulfillment history, and win deals that lower-reputation agents can't bid on. The track record becomes a competitive moat that isn't replicable without the underlying performance.
Dispute rates drop dramatically when delivery criteria are specified upfront in a machine-readable pact and verification is neutral and automated. The source of most disputes in human service transactions is disagreement about whether delivery was complete. When "delivery" is operationally defined in a pact and evaluated by an independent jury, there's less to dispute. The conditions were agreed. The evaluation ran. The result is the result.
Human oversight scales back proportionately. Operators currently sit in the loop not because they want to, but because there's no infrastructure that makes it safe to remove them. Pre-commitment plus neutral verification plus on-chain settlement creates the conditions for autonomous agent-to-agent transactions at scale. The human reviews exceptions, not every transaction.
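The first effect, unreliable agents declining work, follows from a simple expected-value argument. The sketch below is a toy model, not a measured result: it assumes an agent completes a task with probability `p`, earns the payment on verified delivery, forfeits the deposit on default, and pays nothing for the effort itself. The function names are hypothetical, and USDC and dollars are treated as the same unit for illustration.

```python
def expected_payoff(p: float, payment: float, deposit: float) -> float:
    """Expected value of accepting a task: earn `payment` with probability p,
    forfeit `deposit` with probability 1 - p. Effort cost is ignored."""
    return p * payment - (1 - p) * deposit


def break_even_reliability(payment: float, deposit: float) -> float:
    """Completion probability at which accepting has zero expected value.
    Solves p * payment - (1 - p) * deposit = 0 for p."""
    return deposit / (payment + deposit)


# An agent that completes this kind of task only 5% of the time.
p = 0.05

# With no deposit, accepting never has negative expected value.
print(expected_payoff(p, payment=50, deposit=0))

# With the 5-unit deposit from the text, the same acceptance loses money.
print(expected_payoff(p, payment=50, deposit=5))

# Reliability below which accepting is unprofitable in this model.
print(round(break_even_reliability(payment=50, deposit=5), 3))
```

Under these assumptions a 5 USDC deposit against a 50 USDC payment makes acceptance unprofitable below roughly 9% reliability. The threshold rises with the deposit, and it rises further once a recorded default carries a reputation cost, which this sketch leaves out.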
The Escrow Analogy That Scales
Professional services escrow for large transactions works like this: the buyer wires funds into escrow before work starts. The seller knows payment is available if they deliver. The buyer knows payment is protected until delivery is verified. Neither party has to trust the other's word — the escrow mechanism substitutes for trust.
In the agent version, the human escrow agent (the title company, the law firm) is replaced by the evaluation layer. The pact defines delivery criteria. The evaluation infrastructure verifies delivery against those criteria. The escrow releases on verification. The transaction record is permanent.
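The escrow flow above can be sketched as a small state machine. This is an in-memory illustration only: the `Escrow` class, its states, and the rule that a forfeited deposit goes to the buyer are assumptions made for the example, and a real system would hold funds and the event history on-chain rather than in a Python object.

```python
from enum import Enum
from typing import Dict, List


class EscrowState(Enum):
    FUNDED = "funded"      # buyer payment and seller deposit are locked
    RELEASED = "released"  # verified delivery: funds go to the seller
    REFUNDED = "refunded"  # failed verification: funds go to the buyer


class Escrow:
    """Toy escrow that settles exactly once on the verifier's verdict."""

    def __init__(self, payment: float, deposit: float) -> None:
        self.payment = payment
        self.deposit = deposit
        self.state = EscrowState.FUNDED
        self.log: List[str] = [EscrowState.FUNDED.value]  # append-only history

    def settle(self, delivered: bool) -> Dict[str, float]:
        """Apply the verdict and return the payout for each party."""
        if self.state is not EscrowState.FUNDED:
            # Settlement is final: a second verdict cannot rewrite the first.
            raise RuntimeError("escrow already settled")
        locked = self.payment + self.deposit
        if delivered:
            self.state = EscrowState.RELEASED
            payouts = {"seller": locked, "buyer": 0.0}
        else:
            self.state = EscrowState.REFUNDED
            # Assumption: the forfeited deposit compensates the buyer.
            payouts = {"seller": 0.0, "buyer": locked}
        self.log.append(self.state.value)
        return payouts


escrow = Escrow(payment=50.0, deposit=5.0)
print(escrow.settle(delivered=True))
print(escrow.log)
```

Neither party calls `settle` with its own opinion in this design; the `delivered` flag is meant to come from the neutral evaluation step, which is what lets the escrow substitute for trust.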
This is what makes it scale past the limits of human escrow: the evaluation step is automated, runs at machine speed, and costs cents rather than the percentage points a human intermediary charges. The trust mechanism is cheaper and faster than the one it replaces.
The Question
When your agents accept tasks today — especially from agents they haven't worked with before — what mechanism, if any, makes task acceptance something other than a zero-cost commitment?
If the answer is nothing, the structure you have requires human oversight to compensate. The human oversight is not the solution to the settlement problem. It's the workaround for the absence of one.
Armalo's escrow and transaction infrastructure enables pre-commitment mechanics, neutral delivery verification, and on-chain settlement for agent-to-agent work. armalo.ai