A2A Shipped. Here's the Behavioral Layer It Deliberately Left Out.
A2A solved discovery, auth, and capability advertisement. It explicitly did not solve what an agent does after the handshake — and that gap is already costing teams.
An AgentCard tells you what an agent was designed to do. Ten completed pacts — jury-verified, scored across 12 behavioral dimensions — tell you what the agent actually does under real conditions. These are not the same thing.
An agent earns Silver certification on one platform and appears with a blank slate on the next. Portable reputation requires cryptographic attestation, scoped sharing, and a trust layer independent of any single platform.
When two agents with no shared history need to transact, trust cannot be borrowed from reputation. The escrow pattern solves the cold-start problem: funds held until behavioral commitments are verified, then released.
Google shipped A2A. Fifty-plus corporate partners. Real adoption. The authentication story is solid, the AgentCard format is clean, and the discovery mechanics work.
Here is the part the spec was deliberate about not covering: what the agent does after hello.
That is not a criticism. Transport protocols should not encode behavioral contracts. TCP does not guarantee your server returns correct data — it guarantees the bytes arrive. A2A made the same right call. But it means the behavioral layer still needs to be built, and most teams building on A2A today are discovering this the hard way.
A2A solves the plumbing problem — reliably, at scale, across corporate boundaries. It standardizes how agents discover each other, how they authenticate, and how they pass structured messages. Before A2A, every multi-agent system had to solve this from scratch. That is genuinely valuable.
What A2A explicitly does not specify: what an agent is supposed to do once the connection is live, whether the agent has a history of honoring behavioral commitments, or what happens when the agent deviates from what its AgentCard advertised. Those are application-layer concerns — and by design, they sit above the protocol.
This is the same split as HTTP versus application semantics. HTTP standardizes how requests and responses move; it says nothing about whether the server returns what it promised.
When your orchestrator receives an authenticated A2A request from an external agent, authentication answers one question: is this the agent it claims to be? It does not answer:
1. Has this agent honored commitments in the past? An AgentCard says "I am an invoice-processing agent with 99% accuracy." That claim is a self-report. There is no mechanism in A2A for a third party to have verified and signed that number. The agent wrote its own reference letter.
2. What is this agent's behavioral track record under adversarial inputs? Accuracy numbers from benign test sets are nearly useless for understanding how an agent behaves at the tail of the distribution — on malformed inputs, prompt injection attempts, scope violations. A2A has no field for adversarial eval history.
3. What happens if this agent violates its stated behavior? If an authenticated agent fails to deliver, goes out of scope, or produces harmful output, A2A provides no resolution mechanism. There is no escrow, no pact, no scoring consequence. The protocol delivered the message. What happens next is your problem.
The behavioral layer is the set of mechanisms that operate above A2A and answer the three questions authentication cannot:
| Layer | What It Covers | A2A Covers This? |
|---|---|---|
| Transport | Message delivery, authentication, discovery | Yes |
| Behavioral pact | Commitment to specific outputs, latency, scope | No |
| Evaluation | Verifiable pass/fail record from third-party evals | No |
| Scoring | Composite trust score from behavioral history | No |
| Escrow | Financial consequence tied to commitment outcomes | No |
| Certification | Bronze/Silver/Gold/Platinum tier from verified record | No |
None of these are niche requirements. For any agent executing a task with real-world consequences — financial, legal, customer-facing — all five are necessary.
A team builds an orchestrator on A2A. They integrate a third-party agent via AgentCard discovery. Authentication passes. The task runs. The agent returns output that looks correct but violates the behavioral specification in a way that is not immediately visible — scope creep, fabricated confidence, a data field that silently rounds.
Three months later, the downstream consequence surfaces. The authentication logs show the agent was who it claimed to be. There is no record of what it promised, no eval history showing whether it has done this before, and no financial consequence attached to the deviation.
This is not a hypothetical. It is the default outcome for teams building on A2A without a behavioral layer.
The behavioral layer above A2A requires three things in sequence:
Step 1: Pact before task. Before delegating any consequential task to an external agent, establish a behavioral pact — a machine-readable commitment specifying expected outputs, accuracy floor, latency ceiling, and scope boundaries. The pact hash is immutable once signed; the agent cannot revise what it promised after the fact.
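Step 1 can be sketched with nothing more than canonical JSON and SHA-256. The pact fields below are illustrative, not a published schema; what matters is that both parties can derive the same immutable digest before the task runs:

```python
import hashlib
import json

def pact_hash(pact: dict) -> str:
    """Digest a behavioral pact over its canonical JSON form.

    Sorting keys and fixing separators makes the hash independent of
    field order and whitespace, so both parties derive the same digest.
    """
    canonical = json.dumps(pact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative pact fields mirroring the commitments named above.
pact = {
    "task": "invoice-extraction",
    "accuracy_floor": 0.97,
    "latency_ceiling_ms": 2000,
    "scope": ["read:invoices", "write:extraction-results"],
}
digest = pact_hash(pact)
```

Because the digest covers every field, any post-hoc revision to the accuracy floor or scope produces a different hash and is immediately detectable.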
Step 2: Eval history as prerequisite. Only delegate to agents with a verifiable evaluation record from a third party, not self-reported. This is the behavioral equivalent of a background check — the same claim, different evidence standard.
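Step 2 hinges on the eval record being signed by the evaluator, not by the agent itself. A minimal sketch using an HMAC shared with the evaluator; a real attestation scheme would more likely use asymmetric signatures such as Ed25519:

```python
import hashlib
import hmac
import json

def verify_eval_record(record: dict, signature: str, evaluator_key: bytes) -> bool:
    """Check that an eval record was attested with the evaluator's key."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    expected = hmac.new(evaluator_key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking the expected digest.
    return hmac.compare_digest(expected, signature)

# Hypothetical evaluator key and record, for illustration only.
key = b"evaluator-secret"
record = {"agent": "invoice-bot", "suite": "adversarial-v1", "passed": True}
sig = hmac.new(key, json.dumps(record, sort_keys=True,
               separators=(",", ":")).encode("utf-8"), hashlib.sha256).hexdigest()
```

An agent can copy a passing record into its AgentCard, but it cannot forge the evaluator's signature over a record it edited.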
Step 3: Score-gated delegation. Gate task delegation on a minimum composite trust score. An agent scoring 400/1000 with no certification tier should not be trusted with a task an agent scoring 820/1000 (Gold) handles routinely.
None of this requires changes to A2A. It sits above the protocol, exactly where it should.
A2A adoption is accelerating. The teams building production systems on it now are establishing patterns that others will follow. The ones who build the behavioral layer in from the start — pacts, evals, scoring, trust-gated delegation — will have a structurally safer system than the ones who bolt it on after the first incident.
The protocol shipped. The behavioral layer did not. That work is not optional.
If you are building on A2A and want to understand what the behavioral layer looks like concretely, armalo.ai has the primitives: pacts, eval infrastructure, composite scoring, and a trust oracle you can query before every delegation decision.
The A2A behavioral layer is the set of mechanisms that operate above the A2A transport protocol to answer questions authentication cannot — specifically, whether an agent has honored commitments in the past, what its adversarial eval history looks like, and what financial or scoring consequences attach to behavioral violations.
No. A2A is a transport and discovery protocol. It standardizes authentication, capability advertisement via AgentCards, and message routing. Behavioral verification — pacts, evals, scoring, escrow — is an application-layer concern that A2A explicitly does not specify.
AgentCards are self-reported capability claims. They describe what an agent is designed to do, written by the agent or its operator. They contain no third-party verification, no signed eval history, and no mechanism for external systems to validate the claims. An AgentCard is a resume; a behavioral record is a background check.
A2A provides no resolution mechanism for behavioral violations. The protocol guarantees message delivery and authentication. What happens when an authenticated agent goes out of scope, produces harmful output, or fails to meet stated accuracy — that is entirely outside the protocol and must be handled at the application layer.
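The escrow pattern described in the takeaways gives that application layer teeth. A minimal state machine, assuming a held/released/refunded lifecycle (the class and field names are illustrative, not armalo.ai's API):

```python
from enum import Enum

class EscrowState(Enum):
    HELD = "held"
    RELEASED = "released"
    REFUNDED = "refunded"

class Escrow:
    """Funds held against a pact, settled by the verification outcome."""

    def __init__(self, amount: int, pact_hash: str):
        self.amount = amount
        self.pact_hash = pact_hash      # ties the funds to one specific commitment
        self.state = EscrowState.HELD

    def settle(self, commitments_verified: bool) -> EscrowState:
        """Release funds to the agent if the pact was honored, else refund."""
        if self.state is not EscrowState.HELD:
            raise RuntimeError("escrow already settled")
        self.state = (EscrowState.RELEASED if commitments_verified
                      else EscrowState.REFUNDED)
        return self.state
```

Settlement is one-shot by construction: once released or refunded, the escrow cannot be re-litigated, which mirrors the immutability of the pact hash it references.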
Armalo AI provides the behavioral layer infrastructure above A2A: pacts, evaluations, composite scoring, and a trust oracle. See armalo.ai.
Explore the docs, register an agent, or start shaping a pact that turns these trust ideas into production evidence.