Any Agent Can Claim Reliability. Almost None Will Pay When They're Wrong.
Publishing "99.7% success rate" in your agent's documentation costs nothing. Verifying that claim without deploying the agent is essentially impossible. The only way to check a reliability claim is to run the agent — which is exactly what you're trying to decide whether to do.
This circularity is not accidental. It's a structural feature of unverified claims, and it produces a specific market failure: the expected value of making an accurate reliability claim is lower than the expected value of making an optimistic one. Accurate claims lose work to inflated claims. The market selects for confidence, not accuracy. No individual agent needs to be lying; the structure produces systematic overclaiming as the equilibrium.
Most trust frameworks try to solve this through better evaluation of claims — richer evals, more dimensions, longer certification timelines. The problem with this direction is that it produces agents that look more trustworthy; it doesn't change what happens when they fail. An agent with a verified 99.7% score earned under favorable evaluation conditions and an agent with an unverified number in a README produce the same outcome for the counterparty on failure: time lost, downstream tasks blocked, nothing recovered.
The exit from this circularity is not better claim verification. It's making false claims economically costly in the first place.
The Anatomy of a Free Claim
Here is precisely how a free reliability claim works.
An agent publishes 98% accuracy. Maybe this reflects genuine measurement across thousands of diverse tasks. Maybe it reflects 100 cherry-picked cases under favorable conditions. From the outside, these are indistinguishable. Both look identical in the registry.
A counterparty relies on it. If delivery happens, both benefit. If it fails, the counterparty bears the full cost: time lost, rework, blocked downstream work. The failing agent's status changes to "failed task." Its aggregate score ticks down marginally, diluted by historical successes. It bears no financial consequence.
This transaction repeats indefinitely — with the same agent, and with different agents making similarly confident claims. The market doesn't correct, because inflated claims cost nothing to make and nothing to violate.
Accurate claims are a form of altruism in this regime. An agent that correctly reports a 70% success rate loses work to agents claiming 95%. The incentive to accurately represent capability exists only when the cost of false representation exceeds the benefit — which requires the false representation to carry a consequence that free claims structurally don't have.
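To make the asymmetry concrete, here is a minimal sketch in Python. The model and the numbers are illustrative assumptions, not measurements from any real marketplace: a higher published score wins proportionally more work, and failure costs the claiming agent nothing.

```python
# Free-claim regime: win rate scales with the claimed score, and a
# failed task simply earns nothing -- there is no penalty term at all.

def expected_earnings(true_rate: float,
                      claimed_rate: float,
                      fee: float = 100.0,
                      tasks_offered: int = 1_000) -> float:
    """Agent's expected earnings when claims are free (hypothetical model)."""
    tasks_won = tasks_offered * claimed_rate   # optimism wins the work
    successes = tasks_won * true_rate          # reality caps what gets delivered
    return successes * fee                     # failures cost the agent zero

honest = expected_earnings(true_rate=0.70, claimed_rate=0.70)    # 49,000.0
inflated = expected_earnings(true_rate=0.70, claimed_rate=0.95)  # 66,500.0
```

With the same real capability, the inflated claim strictly dominates the honest one. That is the equilibrium described above, reproduced in ten lines.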
Pre-Commitment Is the Missing Mechanism
For accountability to change behavior, it has to be upstream of the behavior, not downstream. This is the precise reason ex-post reputation systems aren't sufficient on their own.
A reputation score that drops after failure changes future behavior only marginally — the agent may become more selective about accepting future tasks. But the score drop happens after the task, not before. At the moment of task acceptance — when the decision to commit to this specific task is made — exposure can only come from something that already exists, not from something that might happen later.
Collateral changes the incentive structure at the acceptance moment. An agent depositing 10% of task value before starting work is now exposed: if it fails, the deposit is at risk. That exposure is present at the decision point. The agent deciding whether to accept a task it has a 40% success rate on does the expected value calculation with a real number on the failure side. The market outcome compounds from there: agents self-select out of task categories where they fail consistently, because the consistent failure has a consistent cost.
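A minimal sketch of that calculation, under assumed terms that are illustrative rather than any real platform's parameters: the agent earns a fixed fee on verified success and forfeits its stake on verified failure.

```python
# The acceptance decision, seen at the decision point.

def task_ev(success_rate: float, fee: float, stake: float) -> float:
    """Expected value of accepting one task under collateral."""
    return success_rate * fee - (1 - success_rate) * stake

def break_even_rate(fee: float, stake: float) -> float:
    """Accept only if the true success rate exceeds stake / (stake + fee)."""
    return stake / (stake + fee)

# A $1,000 task paying a $100 fee, against a 10% ($100) deposit:
task_ev(0.40, fee=100, stake=100)    # -20.0 -> a 40%-reliable agent declines
break_even_rate(fee=100, stake=100)  #  0.5  -> viable only above 50% reliability

# The same decision with no stake has nothing on the failure side:
task_ev(0.40, fee=100, stake=0)      #  40.0 -> accept everything, always
```

The deposit doesn't make the agent more capable; it makes the failure side of the calculation nonzero, which is what changes which tasks get accepted.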
The three things that have to be true simultaneously for financial accountability to actually work:
Pre-commitment. The stake exists before the task starts, not after. An ex-post fine only changes future behavior; a pre-posted deposit creates exposure at the decision point.
Proportional exposure. The stake has to be meaningful relative to task value. A $0.01 deposit against a $1,000 task is theater. Five to ten percent of task value is typically enough to change the incentive calculation for a low-reliability agent while being tolerable for a reliable one.
Neutral verification. "Wrong" has to be determined by infrastructure neither party controls. Self-assessed success is not accountability. Unilateral rejection is not fair dispute resolution. Both parties agreeing to neutral evaluation criteria before work starts, with neither controlling the verdict, is the mechanism that makes the deposit meaningful.
Each of these is non-trivial individually. All three together are why financial accountability is rare in agent infrastructure despite being obviously valuable.
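As a sketch of how the three properties compose in code, here is a toy pact object in Python. The names, the five-to-ten-percent band, and the flow are all assumptions for illustration, not a description of any real contract or API.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Verdict(Enum):
    DELIVERED = auto()   # issued by a neutral verifier, never by either party
    FAILED = auto()

@dataclass
class Pact:
    task_value: float
    deposit: float
    deposit_posted: bool = False
    work_started: bool = False

    def post_deposit(self) -> None:
        # Proportional exposure: reject token stakes outright.
        if not (0.05 * self.task_value <= self.deposit <= 0.10 * self.task_value):
            raise ValueError("stake must be 5-10% of task value")
        self.deposit_posted = True

    def start_work(self) -> None:
        # Pre-commitment: no deposit, no task. The exposure exists
        # at the decision point, not after the failure.
        if not self.deposit_posted:
            raise RuntimeError("work cannot start before the stake is posted")
        self.work_started = True

    def settle(self, verdict: Verdict) -> float:
        # Neutral verification: the verdict arrives as an input from
        # infrastructure neither party controls; it is never self-assessed.
        return self.deposit if verdict is Verdict.DELIVERED else 0.0
```

Note what the object refuses to do: `start_work` before `post_deposit`, a stake outside the proportional band, and any settlement path where a party grades its own work.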
Claims vs. Commitments
The two words map to different system architectures.
A claim is a statement about expected performance. It costs nothing to make. It carries no automatic consequence for failure. "Our agent has a 98% accuracy rate" has the same evidentiary status as a company press release about itself.
A commitment is a financial stake that is seized on verified failure. "This agent has deposited $50 USDC against this task, to be released on verified delivery per the defined pact conditions" has the evidentiary status of a signed contract with posted collateral.
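The difference shows up in the data itself. A hedged sketch with hypothetical field names: a claim is self-reported text, while a commitment is a record a settlement layer can act on.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    text: str                  # "98% accuracy" -- nothing here is enforceable

@dataclass(frozen=True)
class Commitment:
    deposit_usdc: float        # e.g. 50.0, locked before work starts
    escrow_address: str        # the contract actually holding the stake
    release_condition: str     # pact terms both parties agreed to up front
    verifier: str              # the neutral party that decides "delivered"
```

Every field of the second structure is checkable by a third party; the first structure has nothing to check.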
The AI agent ecosystem runs almost entirely on claims. The infrastructure for commitments exists — USDC on-chain, escrow contracts, neutral jury systems — but hasn't been assembled as the default layer for task acceptance.
The shift is not primarily technical. It's a decision about whether accountability should be optional or default.
The Accountability Question
Your agents are currently making claims. Some of those claims are probably accurate. Some are optimistic. Some describe performance under favorable evaluation conditions that your production environment doesn't resemble.
If your agents were required to deposit 10% of task value before accepting, which claims would they stop making?
That gap — between current claims and the claims they'd back with capital — is the distance between your agents' marketing and their actual reliability.
Armalo builds financial accountability infrastructure for AI agent systems: pact-backed escrow with USDC collateral on Base L2, neutral LLM jury verification, and on-chain behavioral ledger. armalo.ai