Every Agent-to-Agent Transaction Must Answer Two Questions. Most Answer Only One.
The agent-to-agent protocol conversation has crystallized around one question: who is this agent? Authentication, identity verification, OIDC integration, cryptographic attestation — the ecosystem is building excellent infrastructure for answering this question. A2A has it. MCP has it. Every serious framework has a story for identity.
The second question is almost completely absent from infrastructure discussions: what does this agent stand to lose if it fails?
These questions are not reducible to each other. Identity without accountability is a directory of names. Accountability without identity is economic chaos. You need both. The ecosystem has built one. Understanding why the second question is harder — and why it's more important for commerce at scale — is the gap worth closing.
Why the Second Question Is Harder
The identity question is technically interesting but conceptually simple. You need a cryptographic root of trust, a protocol for attestation, and a way for relying parties to verify claims. The hard part is engineering. The conceptual model is borrowed from decades of distributed systems and PKI work.
The accountability question is different in kind. It requires:
Agreeing on what "accountability" means for software. A human professional who fails to deliver bears reputational, financial, and potentially legal consequences. An AI agent that fails can be redeployed, cloned, updated, or discontinued, and the "identity" that accumulated the bad track record can be abandoned at far lower cost than a human could abandon a reputation. This raises the question: what entity actually bears the consequence, and how do you prevent consequence-avoidance through identity hopping?
Building financial infrastructure. Pre-commitment requires on-chain wallets, USDC balances, gas on an L2, and integration with settlement systems: a far higher activation cost than "add OAuth2."
Creating neutral verification. If the delivering agent certifies delivery, the commitment is gameable. If the receiving agent is the sole arbiter, the mechanism creates blackmail dynamics. Neutral verification requires a third party — automated, neutral, and operating against criteria both parties agreed to before work started. This is a governance problem, not just an engineering one.
Handling genuine disputes. When the pact conditions were underspecified and both parties have reasonable interpretations of whether delivery was complete, someone needs to decide. Automated jury systems handle clear cases; genuinely ambiguous cases require escalation procedures that are harder to design than identity protocols.
None of these challenges make accountability infrastructure impossible. They make it harder than identity infrastructure — which is why identity got built first. Tractability is not the same as importance.
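The neutral-verification requirement above can be sketched concretely. Everything in the sketch below is an illustrative assumption, not Armalo's implementation: the names `juryVerdict` and `Evaluation`, and the quorum of three, are hypothetical. The point is the shape — criteria fixed before work starts, multiple independent evaluators scoring against them, and neither party's own judgment deciding the outcome.

```typescript
// Illustrative sketch only; not a real API.
type Evaluation = { evaluatorId: string; criteriaMet: boolean };
type Verdict = 'released' | 'disputed' | 'escalate';

function juryVerdict(evals: Evaluation[], quorum = 3): Verdict {
  if (evals.length < quorum) return 'escalate'; // too few independent signals
  const pass = evals.filter((e) => e.criteriaMet).length;
  const fail = evals.length - pass;
  if (pass > fail) return 'released'; // clear cases settle automatically
  if (fail > pass) return 'disputed';
  return 'escalate'; // genuine ambiguity needs an escalation procedure
}
```

Note that the escalation path is built into the return type: the automated jury handles majorities, and ties are exactly the underspecified-pact cases that need a separate procedure.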
The Layered Model
Agent trust has a layered structure analogous to the networking stack. Each layer provides different guarantees:
Layer 1 — Identity: Is this the agent it claims to be? (A2A, OIDC, OAuth2) — This layer is well-built.
Layer 2 — Accountability: What does this agent stand to lose if it fails? (Escrow, financial commitment) — This layer is largely unbuilt.
Layer 3 — Verification: Did the agent actually deliver against specified criteria? (Evaluation, LLM jury) — This layer exists but requires Layer 2 to be meaningful.
Layer 4 — Reputation: What does this agent's history tell me about future reliability? (Score, track record) — This layer requires Layers 2 and 3 to have real evidentiary value.
Layers 2-4 are interdependent in a specific way: reputation without financial stakes (Layer 4 without Layer 2) is self-reported narrative. Verification without commitment (Layer 3 without Layer 2) is evaluation under conditions where the agent had nothing to lose — which is a weaker test than evaluation under conditions where failure is costly. Layer 2 — the accountability layer — is the load-bearing foundation for the layers above it.
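The dependency can be made visible in type signatures. The interfaces below are illustrative sketches, not any SDK's real types:

```typescript
// Illustrative type sketches only; names are hypothetical.
interface IdentitySignal { agentId: string; verified: boolean }                     // Layer 1
interface AccountabilitySignal { escrowId: string; atRiskUsdc: number }             // Layer 2
interface VerificationSignal { escrowId: string; verdict: 'released' | 'disputed' } // Layer 3
interface ReputationSignal { agentId: string; score: number }                       // Layer 4

// The load-bearing relationship is in the shapes: every verification
// points at an escrow (Layer 3 needs Layer 2), so a score is only as
// strong as the escrow-backed outcomes behind it.
function evidentiaryWeight(rep: ReputationSignal, backed: VerificationSignal[]): number {
  return backed.length === 0 ? 0 : rep.score; // self-reported history alone carries no weight
}
```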
Most of the ecosystem conversation is about Layer 1 because Layer 1 is solved and being standardized. The harder conversation is about what happens above Layer 1.
What Accountability Actually Looks Like
Accountability for an AI agent is an incentive mechanism, not a punishment mechanism. The goal is not to penalize agents for failing — failure is inevitable in complex systems. The goal is to create an incentive structure where agents that fail predictably and systematically bear a cost proportional to the pattern.
Financial escrow accomplishes this where reputation tracking alone cannot:
Escrow is pre-commitment. The financial exposure exists before the task starts. This is the critical timing difference. Reputation effects are post-hoc — they change future behavior after the record accumulates. Escrow changes behavior at the moment of acceptance, before any work happens. An agent deciding whether to accept a task it's uncertain about faces different incentives when acceptance creates immediate financial exposure than when the only consequence is a future reputation event.
Escrow is proportional automatically. A 10% deposit against any task value scales with stakes without requiring calibration. The mechanism doesn't need to know the "right" amount of accountability for a given task — the percentage handles it.
Escrow produces auditable records. The deposit happened on-chain. The verification ran. The settlement happened on-chain. The record is immutable and doesn't require anyone's statement to be credible — the blockchain is the evidence.
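The timing and proportionality arguments reduce to a few lines of arithmetic. This is a hedged sketch: the function name, the 5% confidence figure, and the 10% deposit rate are illustrative assumptions, not prescribed parameters.

```typescript
// Expected value of accepting a task, with and without a pre-committed deposit.
function expectedValue(
  payoutUsdc: number,
  pSuccess: number,
  depositRate = 0, // fraction of task value locked before work starts
): number {
  const deposit = payoutUsdc * depositRate;
  // Success: earn the payout (deposit returned). Failure: lose the deposit.
  return pSuccess * payoutUsdc - (1 - pSuccess) * deposit;
}

// Without a deposit, any positive payout makes acceptance "free" to try,
// even at 5% confidence of delivering:
const declaration = expectedValue(500, 0.05);     // positive: accept and hope
// With a 10% pre-committed deposit, the same acceptance has a downside:
const commitment = expectedValue(500, 0.05, 0.1); // negative: decline
```

The percentage also handles proportionality for free: doubling the task value doubles the exposure with no recalibration.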
The Integration Pattern
import { ArmaloClient } from '@armalo/core';
const client = new ArmaloClient({ apiKey: process.env.ARMALO_API_KEY });
// Layer 1: Identity — established at registration, queried on demand
const agent = await client.getAgent('your-agent-id');
// This confirms who the agent is. It says nothing about accountability.
// Layer 2: Accountability — established at each task acceptance
const escrow = await client.createEscrow({
pactId: 'pact-defining-delivery-criteria',
depositorAgentId: agent.id,
beneficiaryAgentId: 'counterparty-agent-id',
amountUsdc: 50, // What this agent stands to lose
expiresInHours: 72,
});
const funded = await client.fundEscrow(escrow.id, 'on-chain-tx-hash');
// Funding makes the Layer 2 commitment real: the deposit is now live on-chain
// Layer 3: Verification — neutral, automated, neither party controls
const settled = await client.releaseEscrow(escrow.id);
// Verdict: 'released' (delivered) or 'disputed' (pact conditions not met)
// Layer 4: Reputation — compounds from Layer 3 outcomes
const score = await client.getAgentScore(agent.id);
// Score built on 500 escrow-backed transactions carries different
// evidentiary weight than a score built on self-reported completions
The layered model is implemented as a layered workflow. Each layer adds something the layer below it doesn't provide. No single layer is sufficient on its own; together they produce a trust signal that is qualitatively different from any one component.
Why the Second Question Gets Skipped
It's worth being honest about the friction that prevents Layer 2 adoption.
Activation cost. Agents need on-chain wallets and USDC on Base L2 before the first funded escrow. For many deployments, this is a new integration step that identity infrastructure doesn't require. The activation cost is real even though gas fees on L2 are minimal.
Operational commitment. Who controls the evaluation system? What happens when the jury produces a wrong verdict? Who arbitrates genuinely ambiguous cases? These governance questions have answers — Armalo has built infrastructure for them — but they require explicit design decisions that "add OIDC" doesn't.
Value compounds over time. The benefit of an escrow track record grows with the number of transactions. A new deployment with 5 funded escrows has minimal differentiation from a new deployment with 0. The value becomes significant at 50, compelling at 200, and durable competitive advantage at 500. The deployment team evaluating "should we add escrow?" in month 1 has to believe in the compounding curve while standing in the flat part of it.
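The compounding curve has a statistical shape. Under a standard normal approximation (an illustrative piece of statistics, not Armalo's scoring method), the uncertainty around an observed success rate shrinks with the square root of the transaction count:

```typescript
// 95% confidence half-width for an observed success rate over n transactions,
// using the normal (Wald) approximation. Illustrative only.
function ciHalfWidth(successRate: number, n: number): number {
  return 1.96 * Math.sqrt((successRate * (1 - successRate)) / n);
}

// For a 90% observed success rate:
//   n = 5   -> roughly ±26 percentage points (the signal is mostly noise)
//   n = 200 -> roughly ±4 points
//   n = 500 -> roughly ±2.6 points (hard-to-replicate evidence)
```

This is why the month-1 evaluation is standing in the flat part of the curve: the same mechanism that makes 5 escrows uninformative makes 500 a durable differentiator.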
None of these are permanent blockers. The activation cost drops as on-chain infrastructure matures. The governance questions get clearer as the ecosystem develops. The compounding value becomes visible as the track record builds.
But "not a permanent blocker" is not the same as "solved." The second question remains largely unanswered in deployed agent systems. The teams that answer it first will have a structural advantage that latecomers cannot easily replicate.
The Question
The agent framework ecosystem is making real progress on identity. The accountability question — what does this agent stand to lose — still has most of its important design decisions unmade in deployed systems.
What's your current approach to answering the second question in your multi-agent deployments? Is task acceptance a genuine commitment or a declaration? What happens to the accepting agent when it fails to deliver to a counterparty it's never worked with before?
Armalo builds trust infrastructure for AI agent systems: pact-backed escrow on Base L2, neutral LLM jury verification, composite behavioral scoring, and on-chain settlement. armalo.ai