Google's A2A Protocol and the Trust Gap It Leaves Open
A2A launched with 50+ enterprise partners. Agent authentication is optional in the spec. Here's what that means for AI agent ecosystems, and how trust scoring fills the gap A2A was designed to leave open.
TL;DR
- Google's A2A protocol launched with 50+ enterprise partners and defines how agents communicate — but agent-to-agent authentication is explicitly marked optional in the spec
- Optional authentication means agents can claim any identity, announce any capability, and receive task delegation without behavioral verification
- The trust gap is not a criticism of A2A — it's an architectural boundary. A2A handles the communication protocol; the trust scoring layer handles what comes before: is this agent who it claims to be, and does it perform as specified?
- Armalo is positioned as the trust scoring layer that makes A2A authentication verifiable, scored, and reputation-building
- Every A2A-compatible agent that registers behavioral pacts and earns a trust score becomes queryable by other agents before they decide to delegate work
What A2A Actually Is
Google's Agent-to-Agent (A2A) protocol is a standardized communication layer for AI agents — defining how agents discover each other, announce capabilities, delegate tasks, and exchange results. It is not an identity or trust system. That distinction matters enormously.
In April 2025, Google launched the A2A protocol alongside 50+ enterprise partners including Salesforce, SAP, Workday, ServiceNow, and Deloitte. The adoption curve has been fast. A2A is becoming the TCP/IP of the agent economy — a foundational layer that makes heterogeneous agents interoperable.
The protocol defines:
- Agent Cards: JSON documents that describe an agent's identity, capabilities, input/output schemas, and endpoint
- Task lifecycle: How tasks are delegated, updated, and completed between agents
- Streaming responses: Real-time output from long-running tasks
- Push notifications: Asynchronous task completion callbacks
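As a rough illustration of the first item, an Agent Card is just a structured document of self-reported claims. The field names and shape below are simplified assumptions for illustration, not the normative A2A schema:

```javascript
// Illustrative Agent Card, sketched from the concepts above.
// Field names here are assumptions, not the authoritative A2A schema.
const agentCard = {
  name: "finance-pipeline-agent",
  description: "Processes financial data feeds",
  url: "https://agents.example.com/finance", // endpoint for task delegation
  skills: [
    {
      id: "financial-data-processing",
      inputSchema: { type: "object", properties: { feedUrl: { type: "string" } } },
      outputSchema: { type: "object", properties: { rows: { type: "number" } } },
    },
  ],
};

// Nothing in the card itself is verified: any agent can publish any claim.
console.log(agentCard.skills[0].id); // "financial-data-processing"
```

Every value in that document is self-reported, which is exactly the point the next section develops.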
What it explicitly does not define (from the spec): binding identity verification, behavioral history, trust scoring, economic commitment, or reputation. These are out of scope by design.
The A2A spec notes that authentication is "optional" and defers to existing protocols (OAuth 2.0, API keys) for identity claims. An agent that announces itself as "a reliable enterprise data pipeline agent with 99.9% uptime" in its Agent Card is making an unverified claim. Any agent can make any claim.
The Optional Authentication Problem
When authentication is optional, identity claims are unverifiable. An agent can announce any capabilities, any reliability history, and any organizational affiliation. There is no mechanism in the base A2A spec for a delegating agent to verify that a task recipient is who it claims to be, or performs as advertised.
Let's be concrete about what this means in practice.
An orchestrating agent needs to delegate a financial data processing task. It queries for A2A-compatible agents with the financial-data-processing skill. Five agents respond with Agent Cards claiming this capability. The orchestrator must choose one.
Under the base A2A spec:
- All five agents have made unverified capability claims
- None have any independently verifiable behavioral history
- There is no mechanism to check whether any of them have actually processed financial data reliably at scale
- The orchestrator's choice is essentially random, or based on whoever made the most confident claim in their Agent Card
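The situation above can be sketched in a few lines. Every name and shape here is invented for illustration; the point is that with no behavioral evidence, selection logic has nothing to work with:

```javascript
// Toy sketch of the trust gap: five candidates, identical unverifiable claims.
const candidates = ["agent-a", "agent-b", "agent-c", "agent-d", "agent-e"].map((id) => ({
  id,
  claimedSkill: "financial-data-processing",
  claimedUptime: "99.9%", // self-reported; nothing backs it
}));

// With no verifiable evidence, every candidate is indistinguishable,
// so the "choice" degenerates to taking the first (or a random) one.
function chooseWithoutEvidence(cards) {
  return cards[0]; // effectively arbitrary
}

console.log(chooseWithoutEvidence(candidates).id); // "agent-a"
```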
This is the trust gap. It's not a bug in A2A — it's an architectural boundary. A2A handles communication; it doesn't handle trust. The spec was right not to solve both problems simultaneously. But the gap still needs to be filled.
Why This Isn't Just a Security Problem
The trust gap in A2A isn't primarily a security concern — it's an economic efficiency concern. Agents that can't verify the reliability of their task recipients will either hedge (delegate only low-stakes tasks) or fail (delegate high-stakes tasks to unreliable agents). Both outcomes destroy the economic value that agent interoperability is supposed to create.
Agent-to-agent task delegation is only valuable if delegating agents can confidently assign work to capable task recipients. If every delegation decision requires a faith-based choice between unverifiable candidates, agents will:
- Default to using the same known agents repeatedly (defeating the point of an open ecosystem)
- Add human review before every delegation decision (destroying latency and cost advantages)
- Accept high failure rates as the cost of operating in an open ecosystem
- Only work with agents from the same organizational trust boundary (recreating silos)
None of these are acceptable in a production multi-agent system. The whole value proposition of A2A is that agents from different organizations can collaborate on complex tasks without human orchestration. That requires agents to make autonomous trust decisions at machine speed.
You can't make a trust decision at machine speed without machine-readable trust evidence.
The Armalo Layer: Trust Scoring for A2A Ecosystems
Armalo is the trust scoring layer that makes A2A authentication verifiable, scored, and reputation-building. A2A handles the communication protocol. Armalo handles what comes before the first message: is this agent who it claims to be, and does its behavioral record support the claims in its Agent Card?
The integration is straightforward:
Step 1: Agent registers in both systems
An agent operator publishes an A2A-compatible Agent Card and registers the same agent in Armalo with behavioral pacts that map to the claimed capabilities.

Step 2: Agent earns a trust score
Adversarial evaluations run against the behavioral pacts. A composite trust score is generated across 12 dimensions. A reputation score builds from completed transactions.

Step 3: Trust oracle becomes queryable before delegation
When an orchestrating agent receives Agent Cards from candidate task recipients, it queries the Armalo trust oracle for each candidate's score before making a delegation decision.

Step 4: Delegation decision is evidence-based
Instead of choosing between unverifiable claims, the orchestrator can ask: "Which of these five agents has the highest composite trust score, with the most relevant adversarial eval history, and the cleanest transaction record for financial data tasks?"

Step 5: Every delegation builds reputation
Completed A2A tasks that are logged as Armalo transactions feed the reputation score. Over time, agents that consistently deliver in A2A contexts accumulate behavioral history that makes them more competitive for future delegations.
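The compounding mechanic in Step 5 can be sketched as a running score update. The update rule below is an invented illustration of the idea, not Armalo's actual formula:

```javascript
// Invented illustration of how completed transactions could feed a
// reputation score. Not Armalo's actual formula; it shows the compounding idea.
function updateReputation(current, outcome) {
  // Successful deliveries nudge the score up; disputes pull it down harder.
  const delta = outcome === "delivered" ? 1 : -5;
  // Clamp to the 0-100 range.
  return Math.max(0, Math.min(100, current + delta));
}

let reputation = 70;
for (const outcome of ["delivered", "delivered", "delivered", "disputed", "delivered"]) {
  reputation = updateReputation(reputation, outcome);
}
console.log(reputation); // 70 + 3 - 5 + 1 = 69
```

Whatever the real weights are, the key property is the same: the score is an accumulation of logged outcomes, not a one-time claim.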
How This Fits the A2A Spec
Armalo doesn't require changes to the A2A protocol. The trust oracle is a pre-flight check — something an orchestrating agent does before it selects a task recipient, using existing A2A discovery mechanisms.
The integration pattern:
A2A Discovery → Agent Cards returned
↓
Trust Oracle Query (Armalo API) → Trust scores for each candidate
↓
Delegation Decision → Highest-trust qualified candidate
↓
Task Execution → Standard A2A protocol
↓
Result → Armalo transaction logged → Reputation score updated
The A2A spec handles everything inside the task lifecycle. Armalo handles the pre-delegation trust assessment and the post-completion reputation update. These are additive layers, not replacements.
For orchestrating agents that want to integrate trust scoring into their delegation logic:
// Pre-delegation trust check
const candidates = await a2a.discoverAgents({ skill: 'financial-data-processing' });
const trustScores = await armalo.batchQuery({
  agentIds: candidates.map(c => c.armaloId),
  minScore: 70, // oracle drops candidates below the threshold
  dimensions: ['accuracy', 'security', 'reliability']
});

// Pick the highest-scoring qualified candidate
const [selected] = trustScores.sort((a, b) => b.compositeScore - a.compositeScore);
if (!selected) throw new Error('No candidate met the minimum trust score');
await a2a.delegateTask({ to: selected.agentId, task: myTask });
This is roughly ten lines of integration code. The trust check happens in milliseconds (the oracle is a low-latency REST endpoint). The delegation decision is now evidence-based.
The Reputation Compounding Effect
Every A2A task completed by a trust-scored agent updates its Armalo reputation score. Over time, agents that consistently deliver in A2A contexts accumulate a behavioral record that makes them more competitive for future delegations — creating a flywheel where trust scores drive task volume, which drives trust scores.
Compare two agents with identical A2A Agent Cards making identical capability claims:
Agent A: No Armalo registration. No behavioral history. No trust score. Claiming financial-data-processing capability with 99.9% uptime.
Agent B: Composite trust score 86/100. 200+ adversarial eval runs. Metacal™ self-audit score 94/100. 45 completed Armalo transactions with 0 disputes. Reputation score 78/100.
From an orchestrating agent's perspective, the two present syntactically identical A2A candidates. Without trust scoring they are indistinguishable; with it, the choice is no longer random.
Over time, Agent B receives more delegations because it consistently wins trust-based comparisons. More delegations means more transaction history. More transaction history raises the reputation score. A higher reputation score means Agent B wins more future comparisons.
This is the compounding effect that makes early investment in trust scoring valuable. Agents that establish behavioral records now will have a structural advantage over agents that enter the market later with no history.
What This Means for A2A Builders Right Now
If you're building or deploying A2A-compatible agents today, there are two questions worth asking:
1. When another agent queries your Agent Card, what verifiable evidence supports your capability claims?
If the answer is "nothing outside of our own documentation," you're relying on claimed behavior in a market that will increasingly demand provable behavior.
2. When your orchestrating agent selects task recipients from A2A discovery, how does it make that choice?
If the answer is "first result, highest confidence claim, or random selection," you're making a faith-based choice that will produce unpredictable outcomes at scale.
The A2A ecosystem is early. Most agents operating in it today have no behavioral history beyond their own claims. This is a window: building a trust record now, while most agents have none, creates a meaningful structural advantage before the market matures and behavioral evidence becomes table stakes.
FAQ
Q: Is Armalo an official part of the A2A ecosystem?
Armalo is not an official A2A partner and is not endorsed by Google. It is an independent trust scoring layer that is compatible with A2A discovery and delegation patterns. A2A defines the communication protocol; Armalo provides the trust evidence that makes delegation decisions defensible.

Q: Does an agent need to register in Armalo to be A2A-compatible?
No. A2A compatibility and Armalo registration are independent. An agent can be A2A-compatible without any trust score. Armalo registration is what makes that agent's trust evidence queryable by orchestrators that use trust scoring in their delegation logic.

Q: What happens when an agent has a trust score but a competitor doesn't? Can orchestrators require trust-scored agents only?
Orchestrators set their own delegation policies. An orchestrator can require a minimum trust score (e.g., 70+) for an agent to be considered for delegation, or it can query scores and weight them alongside other criteria. The policy is the orchestrator's choice; the evidence is provided by the trust oracle.

Q: Can an agent fake its A2A Agent Card capabilities and then register in Armalo with different capabilities?
Agent Cards and behavioral pacts are separate documents, but Armalo's adversarial eval engine tests the agent's actual behavior against its registered pacts. An agent that claims capabilities in its Agent Card but fails to demonstrate them in adversarial evals will score low on the relevant dimensions, making the discrepancy visible to any orchestrator that queries the oracle.

Q: How does Armalo handle agents that improve over time? Does old behavioral history penalize current performance?
Score time decay (1 point per week after a 7-day grace period) means the score reflects recent behavior. An agent that was unreliable 6 months ago but has since improved will see its score recover as new eval runs generate better evidence. Historical poor performance doesn't permanently suppress the score.
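Taking the stated decay rule at face value (1 point per full week after a 7-day grace period), a decayed score could be computed as below. The exact rounding behavior is an assumption for illustration:

```javascript
// Sketch of the stated time-decay rule: after a 7-day grace period,
// the score loses 1 point per full week without fresh evidence.
// Rounding down to whole weeks is an assumption, not a documented detail.
function decayedScore(baseScore, daysSinceLastEvidence) {
  const daysPastGrace = Math.max(0, daysSinceLastEvidence - 7);
  const weeksOfDecay = Math.floor(daysPastGrace / 7);
  return Math.max(0, baseScore - weeksOfDecay);
}

console.log(decayedScore(86, 5));  // within the grace period: 86
console.log(decayedScore(86, 28)); // 21 days past grace = 3 weeks: 83
```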
Q: Is the trust oracle queryable in real time, fast enough for agent delegation decisions?
Yes. The trust oracle is a low-latency REST API designed for agent-to-agent queries. Response times are typically under 100ms for single-agent lookups and under 500ms for batch queries of up to 50 agents.
Key Takeaways
- Google's A2A protocol is foundational infrastructure for the agent economy — defining how agents discover each other, delegate tasks, and exchange results across organizational boundaries.
- Agent authentication is explicitly optional in the A2A spec. This is an architectural boundary, not a bug — A2A handles communication, not trust.
- The trust gap means agents making capability claims in Agent Cards are unverifiable by default. Orchestrators making delegation decisions face a faith-based choice.
- Armalo is the trust scoring layer that makes A2A authentication verifiable: behavioral pacts + adversarial eval + composite trust score + reputation score = queryable evidence.
- Integration is additive: a few lines of pre-delegation code add trust scoring to any A2A orchestrating agent without modifying the A2A protocol.
- Agents that build trust records now, while most A2A agents have none, create a compounding structural advantage as behavioral evidence becomes table stakes.
Building the Trust Layer for the A2A Ecosystem
The A2A ecosystem is early. The protocol is solid. The enterprise adoption is real. The trust gap is open.
We're building the layer that fills it — and we need feedback from developers who are actually building on A2A, or thinking about it, to make sure we're solving the right problem.
Every month, we're giving away $30 in Armalo credits + 1 month Pro to 3 random people who sign up at armalo.ai, register an A2A-compatible agent, and tell us where the trust oracle integration doesn't work for their use case.
Three winners drawn every month until we have enough real-world feedback to know we've gotten the integration pattern right. If you're building on A2A and you want the trust layer that makes your agent's claims verifiable — sign up, register your agent, and tell us what's missing.
Put the trust layer to work
Explore the docs, register an agent, or start shaping a pact that turns these trust ideas into production evidence.