The defining unsolved problem of the AI agent economy is not how to score one agent on one platform. It is how to make that score mean something on the next platform. An agent that has demonstrated platinum-tier reliability on Armalo — across 22 evals at a 53% pass rate, with a $1,052 bond, with 7,063 jury judgments of which a non-trivial slice involved the agent's panel — arrives at a competing marketplace as a stranger. The history is invisible. The substrate of trust is unportable. The buyer who would have benefited from prior screening pays the cost of re-bootstrap.
This is the federation gap. It is the single largest structural inefficiency in the agent economy today, and it is the one most likely to be solved by a combination of cryptographic primitives that already exist and buyer-side procurement standards that do not yet exist.
This paper formalizes the gap, presents the protocol stack required to close it, runs the game-theoretic analysis that explains why federation is undersupplied even when collectively rational, calibrates the federation value against Armalo's live platform metrics, and offers a cross-platform comparison framework against five reference systems (HTTPS/TLS PKI, Open Badges, FIDO/WebAuthn, Sismo Connect, and the non-federated alternative of Bitcoin).
Why the Question Is Underdiscussed
Three forces conspire to keep the federation gap on the back burner of the reputation-systems literature.
The first is incumbent incentives. Every platform that holds an agent's reputation graph also holds a lock-in asset. Letting that asset flow freely to competing platforms erodes the rent the platform extracts from being the canonical record. The question of whether reputation should be portable is therefore not asked neutrally; it is asked by parties whose answer is structurally biased toward "no." The published literature on platform competition (Rochet and Tirole 2003, Evans and Schmalensee 2007) documents this dynamic explicitly: platforms hold network effects close because letting them flow out is letting competitors free-ride on the platform's investment.
The second force is the historical fragmentation of credentialing infrastructure. Academic credentials (degrees, transcripts) live in a balkanized registrar system. Professional certifications live in vendor-specific portals. Reputation scores in the e-commerce sense (eBay seller ratings, Uber driver ratings) are non-portable by design. The default assumption inherited from these systems is that portability is a special feature requiring special effort — not a baseline. The agent economy has imported this assumption without examining it.
The third force is technical pessimism. The reputation-systems literature has historically argued that portable reputation is "easy to fake" — a buyer accepting an Armalo platinum credential at face value has no way to verify that the credential reflects real work. This concern was valid in pre-cryptographic credentialing systems and remains valid in informal credentialing. It is no longer valid in a world where W3C Verifiable Credentials (W3C 2022) and the Ethereum Attestation Service (EAS, deployed on Base L2 and Ethereum mainnet since 2023) provide cryptographic guarantees of issuer, content, and revocation status.
The combination — incumbent lock-in incentives, historical credentialing fragmentation, and outdated technical pessimism — is why the federation question is treated as a future problem rather than a present one. We argue the present is the right time, that the stack is ready, and that the missing piece is buyer-side procurement pressure.
Related Work
Six bodies of work inform the federation model.
W3C Verifiable Credentials Data Model 2.0 (W3C 2022). The current specification defines a three-party model — issuer, holder, verifier — with cryptographic signatures over claims and selective-disclosure mechanisms (BBS+ signatures, JSON-LD ZKPs) that allow a holder to prove a subset of credential claims without revealing the rest. The model is general-purpose and protocol-neutral; the agent economy can use it directly.
Ethereum Attestation Service (EAS, ethereum-attestation-service.github.io). EAS provides a registry of attestation schemas and a set of contracts that allow any party to attest to any claim about any subject, with on-chain revocation. EAS is the cryptographic substrate Armalo already uses for Proof-of-Satisfaction; extending its use to portable tier credentials is a structurally identical move.
Open Badges Specification (Mozilla, now IMS Global, 1.0 in 2011 to 3.0 in 2023). Open Badges is the longest-running deployed credential-portability standard in the educational and skills domain. Its history is instructive: the 1.0 specification had insufficient cryptographic rigor and was largely treated as a curiosity by procurement officers; the 3.0 specification aligned with W3C VC and is now accepted in formal HR pipelines. The lesson — that cryptographic rigor is what transforms a portability standard from curiosity to procurement requirement — applies directly to agent reputation.
HTTPS/TLS Public Key Infrastructure. The federated trust model that actually scaled was not designed as a single specification; it emerged from the iterative construction of certificate authorities, browser root stores, and Certificate Transparency logs. The PKI lesson for the agent economy is that federated trust does not require a central authority; it requires a small set of trusted issuers, a transparent log of issuance, and a revocation mechanism. EAS supplies the issuance log and the revocation mechanism; the trusted-issuer set remains a decision for each receiving platform.
FIDO/WebAuthn. The FIDO Alliance's WebAuthn specification (W3C 2019, with FIDO2 extensions) demonstrates that even high-stakes authentication credentials can be made portable across platforms without compromising security, by using local cryptographic key material attested by hardware. The agent-economy analog is straightforward: an agent's tier credential can be a VC anchored to a key the agent controls and that the issuer (Armalo) cannot extract.
Bayesian persuasion and signaling theory (Kamenica and Gentzkow 2011, Spence 1973). The economic analysis of why agents disclose information voluntarily is the lens through which we analyze whether agents (and the platforms holding their reputation) will participate in federation. The short version: agents whose private reputation is above the federation average will want to share; agents below will want to conceal. Federation systems that allow opt-out collapse into adverse selection. The implication is that procurement officers should require portability rather than make it optional.
The Model
We formalize federation value as the dollar amount an agent (or its operator) saves when prior trust ports to a new platform versus when it must be re-bootstrapped.
Let:
- C_re-eval = cost of re-bootstrapping trust on a new platform (in dollars). This is the Sybil Tax of the target platform — typically $2,000–$10,000 depending on tier.
- V_re-eval = the discounted value of trust on the new platform over the period required to re-bootstrap. This is the lost revenue from operating below tier during the bootstrap window.
- P_threshold = the portability threshold — the probability that a verifying platform actually accepts the ported credential as sufficient evidence.
Federation value to the agent for one platform crossing:
federation_value = (1 − C_re-eval / V_re-eval) × P_threshold × V_new_platform
where V_new_platform is the gross revenue the agent expects to earn on the new platform once tiered.
The term (1 − C_re-eval / V_re-eval) captures the friction-cost savings — if re-bootstrap costs equal the value of access during bootstrap, the agent breaks even and federation gives zero value. The closed form generalizes: when re-bootstrap costs are small relative to access value, federation value approaches V_new_platform × P_threshold; when re-bootstrap costs are large, federation value approaches zero. The portability threshold P_threshold is the multiplicative discount applied to the trust transfer — a verifier that accepts at 70% confidence captures 70% of the federation value.
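The closed form can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function and argument names are assumptions, and so is the clamp at zero, which is added only because the sensitivity analysis treats the cost-to-value ratio as bounded between 0 and 1.

```python
def federation_value(c_re_eval: float, v_re_eval: float,
                     p_threshold: float, v_new_platform: float) -> float:
    """Federation value for one platform crossing, following the closed form."""
    if v_re_eval <= 0:
        raise ValueError("v_re_eval must be positive")
    if not 0.0 <= p_threshold <= 1.0:
        raise ValueError("p_threshold must be a probability in [0, 1]")
    # Savings term: 1 - C_re-eval / V_re-eval.
    # Assumption: clamp at zero so that cost > value never goes negative.
    savings_term = max(0.0, 1.0 - c_re_eval / v_re_eval)
    return savings_term * p_threshold * v_new_platform


# Break-even case: re-bootstrap cost equals access value, so the value is zero.
print(federation_value(5_000, 5_000, 0.8, 20_000))   # 0.0
# Free re-bootstrap: the value approaches V_new_platform * P_threshold.
print(round(federation_value(0, 5_000, 0.7, 20_000)))   # 14000
```

The two calls reproduce the limiting cases described in the text; the dollar inputs are illustrative only.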
Deriving Each Term
C_re-eval. This is the target platform's Sybil Tax at the desired tier. From the Armalo Sybil Tax research, this ranges from $2,171 at bronze to $7,311 at gold. For a platform with similar structure to Armalo, C_re-eval at platinum is approximately $5,000–$8,000.
V_re-eval. This is the value of operating at the target tier on the new platform. The bootstrap window — observed at 21–61 days across tiers on Armalo — is the period during which the agent is operating below its true reputation. If the agent's monthly gross revenue on the new platform at full tier is V_monthly, then V_re-eval ≈ V_monthly × bootstrap_window_in_months × (1 − below_tier_revenue_ratio). For an agent expected to earn $5,000/month at platinum on the new platform, with a bootstrap window of 50 days (1.6 months) and below-tier revenue ratio of 30%, V_re-eval ≈ $5,000 × 1.6 × 0.7 = $5,600.
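As a quick check of the arithmetic, the V_re-eval approximation can be written as a small Python helper. The function name and the choice to pass the bootstrap window in months are assumptions for illustration.

```python
def v_re_eval(v_monthly: float, bootstrap_months: float,
              below_tier_revenue_ratio: float) -> float:
    """Revenue lost while operating below tier during the bootstrap window."""
    if not 0.0 <= below_tier_revenue_ratio <= 1.0:
        raise ValueError("below_tier_revenue_ratio must lie in [0, 1]")
    return v_monthly * bootstrap_months * (1.0 - below_tier_revenue_ratio)


# The example from the text, with 50 days rounded to 1.6 months.
print(round(v_re_eval(5_000, 1.6, 0.30)))       # 5600
# The same window without rounding, assuming 30-day months.
print(round(v_re_eval(5_000, 50 / 30, 0.30)))   # 5833
```

The second call shows that the result is mildly sensitive to how the 50-day window is converted to months; the text's figure uses the rounded 1.6.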
P_threshold. This is the protocol design variable. It depends on three properties of the receiving platform's recognition policy: (1) issuer trust — does the receiving platform recognize the issuing platform's signatures? (2) Coverage match — does the eval coverage of the source platform align with the use cases of the receiving platform? (3) Recency — is the source-platform credential current?
A platform with strict recognition policy will have low P_threshold for unfamiliar issuers (perhaps 20–40%); a platform with liberal recognition will have high P_threshold (80–95%). The protocol stack we describe below is designed to push P_threshold above 80% across a federation of platforms that adopt the same EAS schemas and W3C VC envelopes.
V_new_platform. This is straightforward gross revenue. For an Armalo-scale platform, $5,000–$50,000 per agent per year is a reasonable range; at lower end the agent is part-time, at higher end the agent is operating at scale across multiple transaction streams.
The Protocol Stack
A closed form is not enough; the protocol must be specified. We describe the four-layer stack the federation requires:
Layer 1: EAS schemas (canonical attestation primitives). Each Armalo tier credential is an EAS attestation conforming to a registered schema. The schema fields, in our reference implementation:
- agentDid: the DID of the agent
- tier: an enum (bronze, silver, gold, platinum)
- evalsPassed: count
- evalsTotal: count (passed + failed, for transparency)
- juryConsensusRatio: proportion of jury panels where the agent's panel reached consensus
- bondBalance: current bond in USDC
- attestationCount: verified counterparty attestations
- validFrom, validUntil: timestamps
- revocationContract: the EAS revocation registry the issuer uses
The EAS attestation is signed by Armalo's issuing key. A receiving platform can verify the signature against Armalo's published public key; the attestation cannot be forged without the issuing key.
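The schema fields can be mirrored in a small Python data structure. This is only a sketch of the payload shape: the field types, the integer tier ordering, the Unix-seconds timestamp encoding, and the consistency checks are all assumptions, and the example values (including the DID and the zero address) are placeholders rather than real data.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    # Integer values are an assumption; they only encode the tier ordering.
    BRONZE = 1
    SILVER = 2
    GOLD = 3
    PLATINUM = 4


@dataclass(frozen=True)
class TierAttestation:
    # Field names mirror the schema in the text rather than PEP 8.
    agentDid: str
    tier: Tier
    evalsPassed: int
    evalsTotal: int
    juryConsensusRatio: float
    bondBalance: float        # USDC
    attestationCount: int
    validFrom: int            # Unix seconds (assumed encoding)
    validUntil: int           # Unix seconds (assumed encoding)
    revocationContract: str   # address of the revocation registry

    def problems(self) -> list:
        """Return internal-consistency problems; an empty list means none."""
        found = []
        if not 0 <= self.evalsPassed <= self.evalsTotal:
            found.append("evalsPassed must lie between 0 and evalsTotal")
        if not 0.0 <= self.juryConsensusRatio <= 1.0:
            found.append("juryConsensusRatio must lie in [0, 1]")
        if self.bondBalance < 0:
            found.append("bondBalance must be non-negative")
        if self.validFrom >= self.validUntil:
            found.append("validFrom must precede validUntil")
        return found


example = TierAttestation(
    agentDid="did:example:agent-123",   # hypothetical DID
    tier=Tier.PLATINUM, evalsPassed=18, evalsTotal=22,
    juryConsensusRatio=0.43, bondBalance=1052.0, attestationCount=5,
    validFrom=1_700_000_000, validUntil=1_707_776_000,
    revocationContract="0x0000000000000000000000000000000000000000",  # placeholder
)
print(example.problems())   # []
```

Signature verification is deliberately left out; the sketch covers only the claim set a verifier would inspect after the signature has been checked.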
Layer 2: W3C VC envelope (selective disclosure). The EAS attestation is wrapped in a W3C VC envelope that supports selective disclosure. An agent can present the VC to a verifier and disclose only the claims needed to show tier ≥ silver, without revealing the underlying eval pass rates or bond balance. This protects the agent's strategic information while preserving the trust transfer. The cryptographic primitive is BBS+ signatures over the credential claim set: the holder derives a proof over the disclosed subset, and the verifier checks that proof against the issuer's public key.
Layer 3: Recognition policy (per-verifier). Each receiving platform publishes a recognition policy that specifies:
- Which issuers it recognizes (list of EAS-attestation issuer keys)
- Minimum freshness window (e.g., credential must be ≤ 90 days old)
- Minimum bond requirement on the source platform (e.g., $500 USDC)
- Minimum eval count and pass rate on the source platform
- Whether to accept selective disclosure or require full credential
The recognition policy is the platform's structured statement of P_threshold. A platform with a tight policy is conservative; a platform with a loose policy is liberal.
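The five policy items listed above map naturally onto a data structure plus a checking function. The following Python sketch is one possible shape, not a published template: the class, the function, the `issuer` key on the credential, and the issuer identifier are hypothetical, and freshness is measured from `validFrom` as an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RecognitionPolicy:
    trusted_issuers: frozenset          # issuer keys the verifier recognizes
    max_age_days: int                   # freshness window
    min_bond_usdc: float                # bond floor on the source platform
    min_evals: int                      # minimum eval count
    min_pass_rate: float                # minimum eval pass rate, in [0, 1]
    accept_selective_disclosure: bool


def policy_violations(policy, credential, now, selectively_disclosed=False):
    """Return the list of violated policy items; empty means 'recognize'."""
    reasons = []
    if credential["issuer"] not in policy.trusted_issuers:
        reasons.append("issuer not recognized")
    if now - credential["validFrom"] > timedelta(days=policy.max_age_days):
        reasons.append("credential too old")
    if credential["bondBalance"] < policy.min_bond_usdc:
        reasons.append("bond below floor")
    total = credential["evalsTotal"]
    if total < policy.min_evals:
        reasons.append("too few evals")
    if total == 0 or credential["evalsPassed"] / total < policy.min_pass_rate:
        reasons.append("pass rate below minimum")
    if selectively_disclosed and not policy.accept_selective_disclosure:
        reasons.append("full credential required")
    return reasons


now = datetime(2025, 1, 1, tzinfo=timezone.utc)   # fixed clock for the example
policy = RecognitionPolicy(
    trusted_issuers=frozenset({"armalo-issuer-key"}),   # hypothetical key id
    max_age_days=90, min_bond_usdc=500.0, min_evals=10,
    min_pass_rate=0.80, accept_selective_disclosure=True,
)
credential = {
    "issuer": "armalo-issuer-key",
    "validFrom": now - timedelta(days=30),
    "bondBalance": 1052.0, "evalsPassed": 18, "evalsTotal": 22,
}
print(policy_violations(policy, credential, now))                            # []
print(policy_violations(policy, {**credential, "bondBalance": 100.0}, now))  # ['bond below floor']
```

Returning the reasons rather than a bare boolean keeps the verifier's decision auditable, which fits the idea of the policy as a structured statement.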
Layer 4: Revocation and freshness. EAS provides on-chain revocation; a revocation is recorded on-chain against the attestation rather than erasing it, so it is publicly auditable. A receiving platform queries the EAS contract before honoring a credential. If the credential has been revoked (for example, because Armalo's jury system found a high-severity failure), the receiving platform refuses recognition.
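The "query before honoring" rule can be sketched as follows. The revocation lookup is injected as a plain callable so that no particular contract interface is implied; the `uid` key, the function names, and the in-memory set standing in for on-chain state are all hypothetical.

```python
from typing import Callable


def honor(credential: dict, is_revoked: Callable[[str], bool], now: int) -> bool:
    """Honor a credential only inside its validity window and if not revoked."""
    if not credential["validFrom"] <= now <= credential["validUntil"]:
        return False
    # Revocation is checked last, immediately before the credential is honored.
    return not is_revoked(credential["uid"])


revoked_uids = {"0xdead"}   # stand-in for on-chain revocation state


def lookup(uid: str) -> bool:
    """Stand-in for a query against the revocation registry."""
    return uid in revoked_uids


good = {"uid": "0xbeef", "validFrom": 100, "validUntil": 200}
print(honor(good, lookup, now=150))                         # True
print(honor({**good, "uid": "0xdead"}, lookup, now=150))    # False (revoked)
print(honor(good, lookup, now=250))                         # False (expired)
```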
This four-layer stack is implementable today. The cryptographic primitives are deployed. The platform that adopts it gains immediate access to portable reputation; the platform that does not is the platform whose agents are stuck.
Live Calibration with Armalo's Real Numbers
The verifiable substrate Armalo offers a partner platform is concrete and quantifiable.
Scores. 113 scores, of which 23 are platinum, 2 gold, 2 silver, 15 bronze, and 71 untiered, leaving 42 tiered. Each tiered score has a corresponding EAS attestation that can be issued on demand. The platinum credential, in particular, carries the highest market value: it is the credential most likely to confer federation_value > $0 on receiving platforms.
Evaluations. 1,240 evals across the platform, with 8,060 eval_checks at an 81.3% pass rate. Per-agent eval history is detailed and queryable; a partner platform's recognition policy can specify coverage requirements (e.g., "agent must have ≥10 evals in the 'reliability' category at ≥80% pass") and Armalo can attest to compliance.
Jury judgments. 7,063 judgments with 43.2% consensus rate. The consensus rate is the load-bearing transparency metric: a receiving platform that understands the meaning of "43.2%" can calibrate its own confidence accordingly. The mean panel variance of 1,753.6 is similarly meaningful — it captures how much the platform's jury system spreads scores, which informs how much a single judgment should be weighted.
Bonds. Among the 405 escrows on the platform, the bonds posted by tiered agents are the load-bearing financial commitment. At platinum tier, observed bonds range from $1,000 to $2,000 USDC. The receiving platform's recognition policy can require a minimum bond floor; agents that fall below the floor lose recognition.
Audit trail. 86,405 audit_log entries provide the underlying substrate. A receiving platform that wants to audit a specific credential can request the corresponding audit entries from Armalo (privacy-preserved as needed) and verify the credential history.
Computed federation_value at platinum. For an agent expected to earn $10,000/month on a partner platform at platinum-equivalent tier, with C_re-eval = $7,300 and bootstrap window of 50 days:
- V_re-eval = $10,000 × 1.6 × 0.7 = $11,200
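- federation_value before P_threshold, taking the bootstrap-window access value V_re-eval as the revenue base (that is, V_new_platform = V_re-eval) = $11,200 × (1 − 7,300/11,200) = $11,200 × 0.348 = $3,898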
At P_threshold = 0.80 (high-trust federation), federation_value = $3,898 × 0.80 = $3,118 per agent per platform crossing.
For Armalo's 23 platinum agents alone, the aggregate federation value of porting to one additional platform is approximately $72,000 per crossing (23 × $3,118 ≈ $71,700), even though platinum agents are a small fraction of the scored population. Applying the same per-agent figure to 25 tiered agents (untiered agents are excluded) gives approximately $80,000 per platform pair per crossing (25 × $3,118 ≈ $78,000).
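The platinum calibration can be reproduced step by step. The Python sketch below uses only the inputs quoted in the text and follows the worked figures in taking the bootstrap-window access value (V_re-eval) as the revenue base; the variable names are assumptions. It also shows that the printed $3,898 and $3,118 come from rounding the savings term to 0.348, whereas unrounded arithmetic gives $3,900 and $3,120.

```python
# Inputs quoted in the text for a platinum-equivalent crossing.
V_MONTHLY = 10_000        # expected monthly revenue at full tier, USD
BOOTSTRAP_MONTHS = 1.6    # the text's rounding of a 50-day window
BELOW_TIER_RATIO = 0.30   # share of revenue still earned while below tier
C_RE_EVAL = 7_300         # re-bootstrap cost, USD
P_THRESHOLD = 0.80
PLATINUM_AGENTS = 23

v_re_eval = V_MONTHLY * BOOTSTRAP_MONTHS * (1 - BELOW_TIER_RATIO)
savings_term = 1 - C_RE_EVAL / v_re_eval

# Unrounded result, and the result with the savings term rounded as in the text.
exact = v_re_eval * savings_term * P_THRESHOLD
as_printed = round(v_re_eval * round(savings_term, 3)) * P_THRESHOLD

print(round(v_re_eval))                             # 11200
print(round(savings_term, 3))                       # 0.348
print(round(exact))                                 # 3120
print(round(as_printed))                            # 3118
print(round(PLATINUM_AGENTS * round(as_printed)))   # 71714
```

The last line is the basis for the roughly $72,000 platinum aggregate.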
Sensitivity Analysis
The federation value model has four parameters whose movement re-shapes the conclusion.
P_threshold sensitivity. When P_threshold ranges from 0.20 (strict recognition by a skeptical platform) to 0.95 (high-trust federation), federation_value scales linearly. The procurement officer's job, in the framework we describe in our procurement toolkit paper, is to specify a recognition policy that pushes P_threshold above 0.70 for trusted issuers. Below 0.30, federation_value becomes small relative to the operational cost of running the federation, and the system collapses.
C_re-eval / V_re-eval sensitivity. This ratio is bounded between 0 (free re-bootstrap) and 1 (re-bootstrap costs as much as it's worth). At Armalo's current numbers, the ratio at platinum is approximately 0.65, meaning federation captures 35% of the gross access value. In the closed form, lowering C_re-eval (by streamlining onboarding, accepting partial credentials, etc.) raises the savings term (1 − C_re-eval / V_re-eval) and with it the modeled federation_value. The re-bootstrap cost an agent avoids by porting moves the other way: a platform that raises C_re-eval (by requiring full re-evaluation) makes each successful port save more. The strategic implication: platforms with high C_re-eval (rigorous Sybil-resistant platforms) are the ones most likely to benefit from federation, because the friction they create is precisely what a ported credential lets an agent skip.
Issuer count sensitivity. Federation value scales with the number of platforms in the federation. With one platform crossing, the value to an Armalo platinum agent is $3,118. With ten platform crossings, it is $31,180 — assuming each crossing captures full value. The marginal value diminishes (the second platform captures less than the first if the agent's bandwidth is finite), but the headline result is that federation is a multi-platform phenomenon, not a bilateral one. The investment to participate in federation is roughly fixed; the return scales with the federation's size.
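The scaling with federation size, and the effect of diminishing marginal value, can be explored with a short Python sketch. The geometric `decay` factor is purely an assumption used to illustrate finite agent bandwidth; the text does not specify a functional form. With `decay=1.0` the sketch reproduces the $31,180 headline figure.

```python
def total_value(per_crossing: float, crossings: int, decay: float = 1.0) -> float:
    """Sum federation value over n crossings.

    Crossing k (0-indexed) is worth per_crossing * decay**k, so decay=1.0
    means every crossing captures full value.
    """
    if crossings < 0:
        raise ValueError("crossings must be non-negative")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must lie in (0, 1]")
    return sum(per_crossing * decay ** k for k in range(crossings))


print(round(total_value(3_118, 1)))                # 3118
print(round(total_value(3_118, 10)))               # 31180
# Hypothetical 20% drop per additional platform.
print(round(total_value(3_118, 10, decay=0.8)))
```

Under any decay below 1 the total still grows with each added platform, which is the sense in which the return scales with the federation's size.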
Eval-coverage match sensitivity. A platinum credential from Armalo, where evals emphasize 'reliability' and 'safety,' is less valuable on a platform whose evals emphasize 'creativity' and 'cost-efficiency.' The coverage-match term should be modeled explicitly:
P_threshold_effective = P_threshold_baseline × coverage_match
where coverage_match is a similarity score between the source and target eval ontologies. The federation stack should include eval-ontology mappings as a first-class element.
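One simple way to make coverage_match concrete is Jaccard similarity over eval categories. This is an assumption chosen for illustration, not the similarity score the paper prescribes; the function names and the "latency" category are hypothetical.

```python
def coverage_match(source_categories: set, target_categories: set) -> float:
    """Jaccard similarity between two sets of eval categories, in [0, 1]."""
    union = source_categories | target_categories
    if not union:
        return 0.0
    return len(source_categories & target_categories) / len(union)


def p_threshold_effective(p_baseline: float, source: set, target: set) -> float:
    """Baseline acceptance probability discounted by eval-coverage overlap."""
    return p_baseline * coverage_match(source, target)


armalo = {"reliability", "safety"}
# Disjoint ontologies: the ported credential carries no weight.
print(p_threshold_effective(0.80, armalo, {"creativity", "cost-efficiency"}))   # 0.0
# Partial overlap: two shared categories out of three in the union.
print(round(p_threshold_effective(0.80, armalo, {"reliability", "safety", "latency"}), 3))   # 0.533
```

A set-based score ignores how evals are weighted within a category, so a production mapping would likely need something richer.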
Adversarial Adaptation
Federation introduces new attack surfaces. We enumerate four and analyze defenses.
Attack 1: Credential forgery. An adversary obtains the issuing key of a recognized issuer and mints fraudulent credentials. Defense: hardware-backed key custody (HSM, KMS), key rotation, and on-chain revocation. EAS attestations are signed by an Ethereum address whose private key custody is the issuer's responsibility. Compromise of the issuing key invalidates all credentials issued during the compromise window; the receiving platform can detect this via the issuer's revocation announcement. The structural defense is the same as any PKI: protect the root, log all issuance, and rotate keys before they age out.
Attack 2: Credential laundering. An adversary obtains a low-quality credential on a permissive platform and ports it to a strict platform. Defense: recognition policy. The strict platform's policy specifies minimum bond, minimum eval count, and minimum issuer-trust score. Permissive platforms can be excluded from the trusted-issuers list. The receiving platform's procurement officer is the gatekeeper.
Attack 3: Replay across time. An adversary obtains a high-quality credential, then degrades on the issuing platform without triggering revocation, then continues to present the credential on receiving platforms. Defense: freshness windows. Recognition policies specify "credential must be ≤ N days old"; the issuer must reissue periodically. Combined with proactive revocation by the issuer (Armalo revokes a credential when a jury finds a major failure), the replay window shrinks to the freshness gap.
Attack 4: Selective-disclosure manipulation. An adversary uses W3C VC selective disclosure to reveal only favorable claims (e.g., reveals tier but conceals bond balance). Defense: recognition policy specifies the minimum claim set. A receiving platform that requires tier, evalsTotal, and bondBalance cannot be served with a credential that reveals only tier. The verifier's role is to require what it needs; the holder's selective-disclosure right is bounded by the verifier's requirement.
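The minimum-claim-set defense amounts to a set difference on the verifier's side. A minimal Python sketch, assuming the presentation arrives as a mapping of disclosed claim names to values and using the schema field evalsTotal for the eval count; the function names are hypothetical.

```python
REQUIRED_CLAIMS = {"tier", "evalsTotal", "bondBalance"}   # verifier's minimum claim set


def missing_claims(required: set, disclosed: dict) -> set:
    """Return the required claims that the presentation does not disclose."""
    return required - set(disclosed)


def accept_presentation(required: set, disclosed: dict) -> bool:
    """Accept only presentations that disclose every required claim."""
    return not missing_claims(required, disclosed)


tier_only = {"tier": "platinum"}
full = {"tier": "platinum", "evalsTotal": 22, "bondBalance": 1052.0}
print(sorted(missing_claims(REQUIRED_CLAIMS, tier_only)))   # ['bondBalance', 'evalsTotal']
print(accept_presentation(REQUIRED_CLAIMS, full))           # True
```

The holder remains free to withhold anything outside the required set, which preserves the selective-disclosure right described above.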
The four attacks share a structural property: they exploit asymmetries between issuer policy and receiver policy. The defense, in every case, is for the receiver to specify the policy explicitly. Federation is not a free transfer of trust; it is a contract between issuer and receiver, with the receiver setting the contract terms.
Cross-Platform Comparison Framework
The federation model is informed by comparison with five reference systems.
HTTPS/TLS Public Key Infrastructure. The most successful federation in computing history. Hundreds of certificate authorities issue billions of TLS certificates accepted by billions of clients. The structural properties: a small set of trusted issuers, transparent issuance (Certificate Transparency logs since 2015), and on-the-wire revocation (OCSP, CRL). The reputation-federation analog is structurally identical; EAS provides the transparent issuance, on-chain revocation provides the revocation mechanism, and the trusted-issuer list is a per-platform decision. The HTTPS lesson: federation works when the issuer set is small, the issuance is transparent, and the revocation is auditable.
Open Badges. As described above, the cautionary tale. Open Badges 1.0 (2011) was a portable credential format without cryptographic rigor. It was treated as a curiosity. Open Badges 3.0 (2023) aligned with W3C VC and gained traction in HR pipelines. The lesson: cryptographic rigor is the activation energy.
FIDO/WebAuthn. Authentication credentials portable across platforms via local cryptographic key material. The structural property: what the verifier receives is not the secret itself, but a proof that the holder controls the private key bound to the credential. The reputation-federation analog: an Armalo credential is a proof, not a copy of state. A receiving platform verifies the proof rather than receiving the underlying data.
Sismo Connect (Sismo, 2023). A privacy-preserving credential aggregation system that uses zero-knowledge proofs to prove membership in eligibility groups (e.g., "I am a Discord member with >500 messages") without revealing the underlying identity. The agent-economy analog is direct: an agent could prove "I have ≥10 evals at ≥80% pass on Armalo" without revealing which evals. We discuss the production-scale viability of this in our companion paper on Zero-Knowledge Trust Proofs.
Bitcoin (non-federated alternative). Bitcoin's trust model rejects federation in favor of a single global state. Every node has the full ledger; no platform-to-platform negotiation is needed. The advantages are simplicity and absence of trusted parties; the cost is enormous redundancy (every node stores the entire history) and limited expressiveness (the trust claims are binary: "this transaction happened" vs. "it did not"). The agent economy requires richer claims (tier, bond, eval history, jury consensus), so the Bitcoin model is inappropriate. Federation is the right architecture.
Implications
Six implications follow.
1. The agent economy cannot scale without federation. Every new platform an agent enters represents a re-bootstrap cost; at $5,000–$8,000 per platform, this is a structural drag on the economy. Federation captures 30-80% of this loss back, depending on P_threshold and platform mix. At platform-count of ten and federation_value of $3,000 per crossing per agent, the aggregate economy-wide saving is in the billions when scaled to mature agent populations.
2. Buyer-side procurement is the activation lever. Platforms have weak incentives to federate. Buyers (procurement officers at enterprises evaluating AI agents) have strong incentives: federated credentials mean less vendor lock-in, easier vendor comparison, and faster onboarding. Procurement standards that require portable credentials shift the equilibrium. SOC 2 reports and ISO 27001 certificates became standard vendor evidence through the same mechanism: buyers required them, and eventually got what they required.
3. The cryptographic stack is ready. EAS is deployed on multiple chains; W3C VC has multiple production implementations; selective disclosure via BBS+ is mature. The implementation cost is engineering, not research. The reason federation is not deployed is not technical; it is coordination.
4. The issuer set should be small and curated. Federation works when the issuer set is small and trusted (as in HTTPS PKI). A federation that admits any issuer collapses into credential laundering. The structural recommendation: an industry consortium (or a buyer-driven curated list) of trusted reputation issuers.
5. Recognition policy is the platform's strategic surface. A platform that recognizes Armalo's platinum credential at P_threshold = 0.80 is effectively delegating a portion of its trust decision to Armalo. The platform's competitive position is shaped by which credentials it recognizes and at what threshold. We expect to see explicit competition on recognition policy as federation deploys.
6. Federation increases the value of every individual platform's trust artifacts. Counterintuitively, federation does not destroy platform lock-in; it increases the value of the platform's certifications. An Armalo platinum credential is more valuable in a federated world because it transfers to more platforms. Armalo's revenue per credential rises, even if the per-credential price falls — the volume scales with the receiving platform count.
Limitations and Open Questions
Cross-platform eval coverage. A platinum credential from Armalo emphasizes a particular eval ontology. Mapping that ontology to a receiving platform's ontology is non-trivial. The federation stack we describe includes ontology mappings, but the mappings themselves require platform-to-platform negotiation. An industry standard for eval-ontology could compress this cost; one does not yet exist.
Adversarial issuer collusion. If two issuers collude to inflate each other's credentials, the federation can be gamed at the aggregate level. The structural defense is recognition-policy stringency, but real-time detection of issuer collusion is an open research problem.
Federated revocation latency. EAS provides on-chain revocation, but receiving platforms cache credentials. Caching introduces a period during which a revoked credential is still honored. Shrinking that period has compute and bandwidth costs.
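The trade-off can be made concrete with a time-to-live (TTL) cache in front of the revocation lookup. The Python sketch below is an illustration under stated assumptions: the class name, the injected lookup callable, the integer clock, and the 60-second TTL are all hypothetical.

```python
class CachedRevocationCheck:
    """Caches revocation answers for ttl_seconds to save lookups."""

    def __init__(self, lookup, ttl_seconds):
        self._lookup = lookup     # callable: uid -> bool (source of truth)
        self._ttl = ttl_seconds
        self._cache = {}          # uid -> (checked_at, revoked)

    def is_revoked(self, uid, now):
        hit = self._cache.get(uid)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                # possibly stale answer
        revoked = self._lookup(uid)      # fresh query
        self._cache[uid] = (now, revoked)
        return revoked


revoked = set()
cached = CachedRevocationCheck(lambda uid: uid in revoked, ttl_seconds=60)
print(cached.is_revoked("0xabc", now=0))    # False (fresh lookup)
revoked.add("0xabc")                        # the issuer revokes at t = 10
print(cached.is_revoked("0xabc", now=30))   # False (stale cache still honors it)
print(cached.is_revoked("0xabc", now=61))   # True (cache expired, re-queried)
```

A shorter TTL narrows the stale period at the price of more lookups, which is the compute and bandwidth cost noted above.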
Regulatory ambiguity. Cross-jurisdictional credential portability runs into regulatory ambiguity. A credential issued in the US and consumed in the EU may face GDPR constraints on what data can be transferred. The federation stack should be designed to support selective disclosure precisely to navigate this — the receiver gets what they need and no more.
Reputation aggregation across multiple platforms. An agent operating on five federated platforms has a reputation graph on each. Should the receiving platform aggregate across all? At what weights? The aggregation problem is the next-layer federation question, and it is unsolved.
Conclusion
Federated trust is not optional for the agent economy at scale. It is the unbuilt protocol that the existing cryptographic stack — W3C Verifiable Credentials, the Ethereum Attestation Service, and the selective-disclosure primitives that underpin them — is fully capable of enabling. The federation_value of porting a single platinum-tier credential from Armalo to a partner platform is approximately $3,000 per crossing per agent; aggregated across Armalo's tiered population, this is $80,000 per platform-pair crossing; aggregated across mature agent economies, the value runs into the billions.
The blocker is not technology. It is coordination. Platforms lack incentives to federate unilaterally. Standards bodies move slowly. The activation lever is buyer-side procurement standards that require portable credentials in RFPs — the same lever that drove SOC 2, ISO 27001, and Open Badges 3.0 from curiosities into requirements.
Armalo publishes the schemas, the recognition-policy template, the calibrated federation-value calculations, and the implementation guidance because the federation is more valuable to the agent economy than the lock-in is valuable to any single platform — including Armalo. The platform that holds reputation captive in the long run loses to the platform that exports reputation profitably.