Trust Oracle Federation: How Two Oracles Disagree And Which One The Buyer Should Believe
There will be more than one trust oracle. They will disagree. The protocol essay on oracle federation: handshake patterns, disagreement resolution, and the Oracle Trust Score for evaluating the oracles themselves.
TL;DR
There is going to be more than one trust oracle. There already is, if you count platform-internal reputation systems alongside public oracles in early commercial use. By 2027 there will be five to ten oracles serving overlapping but non-identical agent populations, with different scoring methodologies, different dispute mechanisms, and different editorial policies. They will sometimes agree, frequently disagree, and occasionally contradict each other in ways that matter to a buyer's purchase decision. This essay is the protocol write-up for federation between oracles. It covers the handshake patterns oracles need for interoperability (signed claim exchange, mutual cross-validation, attestation forwarding), the resolution mechanics for disagreements (whose dispute history is richer, whose methodology is more transparent, whose readers are more sophisticated), and the Oracle Trust Score, a meta-scoring framework that rates the oracles themselves so buyers can decide which oracle's verdict to weight more heavily when oracles disagree. The reader artifact is the Oracle Trust Score rubric: nine dimensions, weighted, scored on the same kind of evidence each oracle uses for the agents inside it.
The federation question is already overdue
Reputation infrastructure tends toward fragmentation in its first decade. Credit reporting started with several dozen local merchant associations sharing notes by mail; consolidation took thirty years and was mostly accidental. Domain reputation (the spam-filter blacklists, the certificate transparency logs, the IP reputation databases) similarly started fragmented and remains so today; in 2026 there are still six major IP reputation providers, four major domain reputation providers, and dozens of niche ones, each with non-identical data and policies. Web reputation in the form of platform-internal scores (eBay, Airbnb, Uber, doctor-rating sites, etc.) is the most fragmented case of all: each platform owns its own system, the systems do not interoperate, and a reputation built on one is largely worthless on another.
Agent reputation is on the same trajectory and is roughly two years into it. Today there is a meaningful platform-internal score on each major agent platform (the major orchestrators, the major hosting providers, the major marketplace operators), at least two early public oracles serving cross-platform queries, and a half-dozen specialty registries serving particular niches (financial agents, autonomous trading agents, regulated-industry agents). These systems do not interoperate at the protocol level. They overlap in coverage but disagree on methodology, weight different evidence, and have non-identical dispute records. A buyer who queries two of them about the same agent will sometimes get materially different answers. Today the answer to "which one to believe" is mostly tribal: buyers default to whichever oracle their network is most familiar with. This works in 2026 because the stakes per agent transaction are still modest and most buyers are sophisticated enough to do bilateral diligence anyway.
It will not work in 2027 and beyond. As agent transactions scale into financial decisions, healthcare workflows, regulated commerce, and large-scope autonomous work, the cost of acting on the wrong oracle's reading goes up. Buyers will need a principled way to reconcile disagreements. Operators will need a way to participate in multiple oracles without the cost of redundant disclosure work. Counterparties will need confidence that the oracle they query has access to evidence other oracles have, even when the agent is primarily registered elsewhere. None of this is achievable through bilateral negotiation between oracle operators because the number of bilateral relationships scales as O(n²) and the protocol churn is unworkable. Federation is required, and federation requires a specification.
This essay sketches that specification: not in legal-document detail, but in enough technical specificity that an oracle operator reading it can implement the handshake against another oracle that has done the same. The specification is opinionated. It is informed by the protocol histories of credit reporting (which got federation approximately right, at high regulatory cost), domain reputation (which got federation partly right through community norms), and DNS (which got federation almost entirely right because the protocol designers thought federation through from the start). The agent economy has the chance to be more like DNS than like credit reporting: to design for federation upfront, before regulatory pressure forces it. The cost of getting it right now is much smaller than the cost of getting it wrong and then fixing it later.
Signed claim exchange: the minimum viable handshake
The simplest federation primitive is signed claim exchange. Two oracles agree on a claim format (a structured representation of an agent-relevant fact that one oracle is making) and exchange those claims with cryptographic signatures attesting to which oracle is making each claim. A claim might be a score, a dispute event, a tier change, an evidentiary record, or a methodological declaration. The signature ties the claim to the oracle that produced it. The exchange happens over a documented protocol (HTTPS with mTLS at minimum, ideally with optional message-layer signing on top) so that a receiving oracle can verify both transport identity and message provenance.
The claim format is the hard part. Two oracles cannot meaningfully exchange claims if their claim formats are different enough that meaning is lost in translation. A score on Oracle A's 12-dimension composite is not directly comparable to a score on Oracle B's 8-dimension composite; even claims about the same dimension (say, accuracy) are not directly comparable if the underlying eval methodologies differ. The claim format must therefore include not just the value but the methodology, the weighting, the evidence sources, the time window, the population of agents the score is normed against, and the version of the methodology in use when the score was computed. A receiving oracle that gets a claim it cannot interpret can either reject it (which makes the exchange useless) or store it as a non-comparable foreign claim (which is more useful but requires its own surfacing logic).
The federation specification this essay sketches uses a layered claim format. The outer layer is a claim envelope: producer oracle identity, claim type, claim version, signing key, signature, timestamp. The inner layer is the claim payload: type-specific structured data with explicit methodology metadata. The methodology metadata is a pointer (URL, content-addressed hash) to a published methodology document that the producer oracle is committing to. A claim of "accuracy: 0.87 on methodology v3.1" is interpretable to any receiving oracle that can fetch and read methodology v3.1. A claim that omits the methodology pointer is non-conformant and rejected.
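The layered format can be sketched in a few lines. This is only an illustration under stated assumptions: the field names, the `methodology_ref` key, the `accept` helper, and the example URL are all hypothetical, and HMAC with a shared key stands in for the asymmetric signatures a real federation would more plausibly use.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ClaimEnvelope:
    producer: str       # producer oracle identity
    claim_type: str     # e.g. "score", "dispute_event", "tier_change"
    claim_version: str
    key_id: str         # identifies the signing key
    timestamp: str      # ISO 8601
    payload: dict       # type-specific data plus methodology metadata
    signature: str = ""


def _signing_bytes(env: ClaimEnvelope) -> bytes:
    # Canonical JSON over every field except the signature itself.
    body = asdict(env)
    del body["signature"]
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign(env: ClaimEnvelope, key: bytes) -> ClaimEnvelope:
    # Assumption: HMAC-SHA256 is a stand-in for a real signature scheme.
    digest = hmac.new(key, _signing_bytes(env), hashlib.sha256).hexdigest()
    return replace(env, signature=digest)


def accept(env: ClaimEnvelope, key: bytes) -> bool:
    # A claim without a methodology pointer is non-conformant and rejected.
    if not env.payload.get("methodology_ref"):
        return False
    expected = hmac.new(key, _signing_bytes(env), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, env.signature)


KEY = b"demo-key"  # hypothetical key material
claim = sign(
    ClaimEnvelope(
        producer="oracle-b",
        claim_type="score",
        claim_version="1",
        key_id="k1",
        timestamp="2026-01-15T00:00:00+00:00",
        payload={
            "dimension": "accuracy",
            "value": 0.87,
            "methodology_ref": "https://oracle-b.example/methodology/v3.1",
        },
    ),
    KEY,
)
tampered = replace(claim, payload={**claim.payload, "value": 0.99})
no_pointer = sign(replace(claim, payload={"dimension": "accuracy", "value": 0.87}), KEY)
```

Here `accept(claim, KEY)` succeeds, while `tampered` fails the signature check and `no_pointer` is rejected before the signature is even examined.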
The second hard part is which claims to exchange. A naive design exchanges everything: every score update, every dispute event, every methodology change, every evidence record. This generates volume that is operationally expensive and largely uninformative: most of the volume is incremental updates that do not change the receiving oracle's view of the agent. The defensible design exchanges only material events: tier changes, score deltas above a documented threshold (say, more than 5 points in either direction), dispute filings, dispute resolutions, methodology version changes, and evidence records that the producer oracle considers relevant outside its own population. The threshold-based filtering is configurable per pair of oracles; oracles with very different agent populations may want different filters than oracles whose populations are nearly identical.
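A material-event filter along these lines might look like the following sketch. The event kind strings and the `Event` shape are hypothetical; the 5-point default mirrors the example threshold in the text and would be configured per oracle pair.

```python
from dataclasses import dataclass

# Event kinds that are always exchanged (names are illustrative).
ALWAYS_MATERIAL = {
    "tier_change",
    "dispute_filed",
    "dispute_resolved",
    "methodology_version_change",
}


@dataclass
class Event:
    kind: str
    score_delta: float = 0.0           # used by "score_update" events
    externally_relevant: bool = False  # used by "evidence_record" events


def is_material(event: Event, score_delta_threshold: float = 5.0) -> bool:
    if event.kind in ALWAYS_MATERIAL:
        return True
    if event.kind == "score_update":
        # Deltas count in either direction.
        return abs(event.score_delta) > score_delta_threshold
    if event.kind == "evidence_record":
        return event.externally_relevant
    return False


def outbound(events, score_delta_threshold: float = 5.0):
    # Keep only the events worth sending to a federated peer.
    return [e for e in events if is_material(e, score_delta_threshold)]
```

A pair of oracles with nearly identical populations could pass a lower `score_delta_threshold`; a pair with little overlap could raise it.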
The third design parameter is forwarding ethics. When Oracle A receives a claim from Oracle B about an agent, can Oracle A republish that claim to its own readers? The defensible answer is yes, with attribution. Oracle A's readers see the claim labeled as originating from Oracle B, with Oracle B's signature still attached, and with Oracle A's own commentary on whether it concurs, dissents, or has no view. This preserves provenance, prevents claim laundering (Oracle A claiming as its own a fact it received from Oracle B), and gives the reader the information they need to weight the claim themselves.
Signed claim exchange is the minimum viable handshake. Two oracles that can do this much already gain meaningful interoperability: each can incorporate the other's facts, attribute them correctly, and present them to readers in a unified surface. Claim exchange does not by itself solve the disagreement problem (Oracle A may have a different score for the same agent than Oracle B), but it makes the disagreement explicit and contextualized, which is the prerequisite for resolving it.
Mutual cross-validation: making disagreement informative
The second federation primitive is mutual cross-validation. Two oracles that exchange claims should also exchange a sample of their evidence (the underlying eval transcripts, jury deliberations, dispute records, and audit trails that produced their published scores) and run their own methodologies against the foreign evidence. The output is a cross-validation matrix: for each agent, what would Oracle A's methodology say if applied to Oracle B's evidence, and vice versa? When the matrix shows convergence, both oracles gain confidence. When it shows divergence, the divergence itself is informative about which oracle's methodology is more sensitive to which evidence types.
Mutual cross-validation depends on evidence portability. Not all evidence is portable. A jury deliberation on Oracle A's roster of LLM judges is not directly reproducible on Oracle B's roster, because the judges are different. A dispute record adjudicated by Oracle A's mechanism is not directly re-adjudicable by Oracle B's. The portable evidence is the underlying behavioral record: the inputs the agent received, the outputs the agent produced, the tool calls the agent made, the timestamps, the counterparty interactions. That record is what an independent methodology can be reapplied to. The non-portable layer (the methodology's interpretation of the record) is exactly what the cross-validation is testing.
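For a single agent, the matrix is just every methodology applied to every oracle's portable behavioral record. The sketch below assumes a methodology can be modeled as a function from a list of records to a score; the two toy methodologies and the record fields (`ok`, `latency_ms`) are invented for illustration and are not any oracle's actual scoring.

```python
from typing import Callable, Dict, List, Tuple

Record = dict  # one portable behavioral record for the agent
Methodology = Callable[[List[Record]], float]


def cross_validation_matrix(
    methodologies: Dict[str, Methodology],
    evidence: Dict[str, List[Record]],
) -> Dict[Tuple[str, str], float]:
    # Key is (methodology owner, evidence owner).
    return {
        (m_owner, e_owner): methodology(records)
        for m_owner, methodology in methodologies.items()
        for e_owner, records in evidence.items()
    }


def success_rate(records: List[Record]) -> float:
    # Toy methodology: fraction of interactions that succeeded.
    return sum(r["ok"] for r in records) / len(records)


def strict_success_rate(records: List[Record]) -> float:
    # Toy methodology: success only counts if it arrived within one second.
    return sum(r["ok"] and r["latency_ms"] <= 1000 for r in records) / len(records)


evidence = {
    "oracle_a": [
        {"ok": True, "latency_ms": 200},
        {"ok": True, "latency_ms": 1500},
        {"ok": False, "latency_ms": 300},
        {"ok": True, "latency_ms": 400},
    ],
    "oracle_b": [
        {"ok": True, "latency_ms": 100},
        {"ok": True, "latency_ms": 200},
    ],
}
matrix = cross_validation_matrix(
    {"oracle_a": success_rate, "oracle_b": strict_success_rate}, evidence
)
```

On Oracle B's evidence the two toy methodologies agree; on Oracle A's evidence they diverge, which points at the latency-sensitive records as the source of the disagreement.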
The protocol for mutual cross-validation involves agent consent. The behavioral record being shared between oracles is the agent's data. Sharing it across oracle boundaries is a disclosure that the agent has to consent to, ideally as a default opt-in for agents who want to participate in cross-oracle reputation, with clear opt-out for agents whose contractual obligations preclude it. The consent terms are themselves part of the federation specification: which evidence types are shareable, with which receiving oracles, under what retention rules, with what redaction policies. An agent that opts in to evidence sharing across oracle boundaries gains a richer cross-oracle reputation; one that opts out is interpretable only within whichever oracles it is registered with.
The resulting cross-validation matrix is a powerful signal. Agents whose scores converge across multiple oracles (that is, multiple methodologies independently agree about the agent's reliability) are more credibly trustworthy than agents whose scores are high on one oracle and unknown elsewhere. Agents whose scores diverge significantly across oracles are red flags: divergence means at least one of the oracles is wrong about them, or the methodologies are exposing a real difference in agent behavior across the contexts each oracle measures. Either way, the divergence is information the buyer should have. A reputation system that hides cross-oracle divergence is hiding signal; a federation that surfaces it is multiplying signal.
A practical wrinkle: mutual cross-validation is computationally expensive. Re-running another oracle's evidence through your own methodology costs time and money proportional to the agent population covered. The defensible design runs cross-validation on a sample, weighted toward agents the receiving oracle cares about (its own registered population, agents being actively queried, agents above a tier threshold) rather than across the entire foreign population. Sampling is not a compromise; it is a sensible engineering choice that delivers most of the federation value at a small fraction of the cost.
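One way to draw such a sample is weighted sampling without replacement, here using the Efraimidis-Spirakis key trick (each agent gets the key u^(1/w) and the k largest keys win). The specific weights and the tier threshold below are hypothetical; the text only says the sample should lean toward the receiving oracle's own population, actively queried agents, and agents above a tier threshold.

```python
import random
from typing import Dict, List, Set


def sampling_weight(
    agent_id: str,
    tier: int,
    own_population: Set[str],
    actively_queried: Set[str],
    tier_threshold: int = 3,
) -> float:
    # Illustrative weights; every agent keeps a non-zero chance of selection.
    weight = 1.0
    if agent_id in own_population:
        weight += 4.0
    if agent_id in actively_queried:
        weight += 2.0
    if tier >= tier_threshold:
        weight += 1.0
    return weight


def weighted_sample(weights: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    # Efraimidis-Spirakis: key = u ** (1 / w), keep the k largest keys.
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / w), agent_id) for agent_id, w in weights.items()]
    keyed.sort(reverse=True)
    return [agent_id for _, agent_id in keyed[:k]]


tiers = {f"agent-{i}": i % 5 for i in range(20)}
own = {"agent-1", "agent-2", "agent-3"}
queried = {"agent-3", "agent-7"}
weights = {a: sampling_weight(a, t, own, queried) for a, t in tiers.items()}
sample = weighted_sample(weights, k=5, seed=42)
```

Fixing the seed makes a cross-validation run reproducible, which matters if a peer later asks why a particular agent was or was not sampled.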
Disagreement resolution: who breaks ties when oracles disagree
The third federation primitive is disagreement resolution. Two oracles publish different scores for the same agent. The buyer querying both has to decide which to believe. The federation specification does not resolve disagreements automatically; that would require one oracle's methodology to be deemed superior to another, which is a category error. The specification does, however, provide structured information to help the buyer resolve disagreements themselves.
The core resolution data is a disagreement context block. When an oracle publishes a score for an agent and detects that another federated oracle publishes a meaningfully different score for the same agent, it includes a disagreement context block in the response: the foreign oracle's score, the foreign oracle's methodology pointer, the disagreement magnitude, the cross-validation result if available, and a brief commentary on what is known about why the methodologies might differ. The block is structured so that buyers can either consume it programmatically (compare two scores at evaluation time) or surface it in human-facing UI ("Oracle B disagrees on this agent's accuracy dimension").
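As a rough sketch, the block can be built as a plain dictionary that is attached to a score response only when the gap is large enough. The field names and the 5-point materiality threshold are assumptions; the text does not define what counts as "meaningfully different".

```python
from typing import Optional


def disagreement_context(
    own_score: float,
    foreign_oracle: str,
    foreign_score: float,
    foreign_methodology_ref: str,
    cross_validation_result: Optional[float] = None,
    commentary: str = "",
    materiality_threshold: float = 5.0,  # hypothetical cut-off
) -> Optional[dict]:
    magnitude = abs(own_score - foreign_score)
    if magnitude < materiality_threshold:
        return None  # no material disagreement, so no block is emitted
    return {
        "foreign_oracle": foreign_oracle,
        "foreign_score": foreign_score,
        "foreign_methodology_ref": foreign_methodology_ref,
        "disagreement_magnitude": magnitude,
        "cross_validation_result": cross_validation_result,  # None if unavailable
        "commentary": commentary,
    }


block = disagreement_context(
    own_score=82.0,
    foreign_oracle="oracle-b",
    foreign_score=64.0,
    foreign_methodology_ref="https://oracle-b.example/methodology/v3.1",
    commentary="Oracle B weights latency more heavily on the accuracy dimension.",
)
```

The same dictionary can be compared programmatically at evaluation time or rendered as a one-line notice in a human-facing UI.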
The second resolution mechanism is meta-scoring of dispute history. When two oracles disagree, one of the most informative diagnostics is whose dispute history is richer for the agent in question. An oracle that has handled dozens of disputes against this agent's category, with detailed adjudication records and evidence retention, has more informational depth than an oracle that has handled none. Depth is not the same as correctness β a deeply experienced oracle can still be wrong β but it is a reasonable prior. The federation specification exposes per-oracle dispute depth as a queryable field, so buyers and meta-scorers can weight oracle opinions by experience.
The third resolution mechanism is methodology transparency comparison. An oracle whose methodology is fully published, versioned, and reproducible is more credibly correctable than an oracle whose methodology is opaque. When two oracles disagree, a buyer can choose to weight the more transparent one more heavily, on the grounds that transparency enables external audit and self-correction. The federation specification exposes a transparency score for each oracle: how much of its methodology is published, how often it is updated, whether changes are pre-announced, whether external auditors have validated specific claims. The transparency score is itself a federation-level fact and is itself federated: each oracle publishes its transparency dimensions, other oracles can validate those publications, and the cross-checked transparency scores feed back into the meta-scoring described in the next section.
The fourth resolution mechanism is reader sophistication. Different oracles serve different reader populations. An oracle whose readers are sophisticated buyers running serious diligence operations has different feedback signals than an oracle whose readers are casual queriers comparing agents at high volume. Sophisticated-reader oracles tend to receive higher-quality dispute filings, more carefully constructed appeals, and more rigorous external scrutiny, which tends to produce more methodologically careful scores over time. Casual-reader oracles tend to receive simpler queries and less feedback, which can produce drift in methodology if the operators are not vigilant. The federation specification exposes reader-sophistication metrics where they are reliably measurable (median query depth, dispute filing rate per agent, external audit count) so buyers can weight oracle opinions by the quality of feedback those oracles receive.
None of these resolution mechanisms is a single arbiter. The buyer is. The federation gives the buyer structured information to make the resolution; it does not pretend to resolve it for them. This is the right design because oracle disagreement is sometimes correct disagreement: two methodologies measuring different things in different ways will produce different scores, and the buyer's job is to figure out which methodology better matches their use case. A federation that automates the resolution removes that judgment from the party best positioned to make it.
The Oracle Trust Score: rating the oracles themselves
The artifact this essay leaves you with is the Oracle Trust Score: a meta-scoring rubric that rates the oracles themselves on the same kind of evidence each oracle uses for the agents inside it. The scoring has nine weighted dimensions. Each dimension is independently measurable, ideally through evidence other oracles or external auditors can verify. The total score gives buyers a single summary metric they can use to weight oracle opinions when oracles disagree.
Methodology transparency (15%). How much of the oracle's scoring methodology is publicly documented? Is the documentation versioned, dated, and archival? Are methodology changes pre-announced with rationale? Is the methodology specified at sufficient detail that an external auditor could reproduce a score? Score on a 0-10 scale with documented evidence.
Dispute mechanism integrity (13%). How are disputes adjudicated? Is the mechanism multi-party (jury, panel, structured arbitration) or unilateral? Is the mechanism's process documented and reviewable? Are dispute records retained and queryable? What is the appeal path? Score reflects both mechanism quality and historical track record.
Evidence retention and verifiability (12%). What evidence does the oracle retain to back its scores? Is the evidence cryptographically verifiable (hashed, signed, timestamped)? Is it available to authenticated external auditors? Are there documented retention windows and deletion policies? Score reflects both retention completeness and verifiability strength.
Editorial policy disclosure (12%). Has the oracle published its editorial policy on what to surface, what to hide, and how to label disputed evidence? Is the policy contestable through a named appeals process? Are policy changes pre-announced? Score reflects the depth of published policy and the credibility of the appeals process.
Federation participation (11%). Does the oracle implement signed claim exchange with other oracles? Does it run mutual cross-validation? Does it surface disagreement context to its readers? Does it share evidence under appropriate consent terms? Score reflects both participation breadth and quality.
Read-path security (10%). How does the oracle defend against adversarial score probing? Are queries observably logged? Are anomalous patterns surfaced to affected operators? Are decoys and other counter-intelligence measures appropriately deployed? Score reflects the maturity of the read-path threat model.
External audit history (10%). Has the oracle been audited by external parties? How often? With what scope? Are audit results published? Are remediation commitments tracked publicly? Score reflects both audit frequency and audit quality.
Coverage breadth and depth (9%). How many agents does the oracle cover? How deep is the per-agent record (dispute count, eval count, transaction count)? How long has the agent population been tracked? Score reflects both coverage size and per-agent depth.
Operator independence (8%). Is the oracle operated by a party with no commercial conflict of interest with the agents it scores? Are there governance structures that protect against capture? What is the funding model and how does it influence scoring incentives? Score reflects structural independence and historical evidence.
The nine dimensions sum to 100%. The total Oracle Trust Score is a weighted sum of the per-dimension 0-10 scores. The score is updated quarterly. Each oracle that participates in federation is expected to be self-rated and externally rated, with the externally rated score being the published one. The dimensions are themselves contestable: an oracle that disagrees with its rating on a specific dimension can challenge it through a documented process, with the challenge and resolution becoming part of the public record.
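The weighted sum is simple enough to write out. The dictionary keys below are hypothetical identifiers for the nine dimensions; the percentages are the ones listed in the rubric, and the result stays on the same 0-10 scale as the per-dimension scores.

```python
# Rubric weights in percent, as listed above (key names are illustrative).
WEIGHTS_PERCENT = {
    "methodology_transparency": 15,
    "dispute_mechanism_integrity": 13,
    "evidence_retention_verifiability": 12,
    "editorial_policy_disclosure": 12,
    "federation_participation": 11,
    "read_path_security": 10,
    "external_audit_history": 10,
    "coverage_breadth_depth": 9,
    "operator_independence": 8,
}


def oracle_trust_score(scores: dict) -> float:
    if set(scores) != set(WEIGHTS_PERCENT):
        raise ValueError("scores must cover exactly the nine rubric dimensions")
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be on the 0-10 scale")
    # Integer percent weights keep the arithmetic exact until the final division.
    return sum(WEIGHTS_PERCENT[n] * scores[n] for n in WEIGHTS_PERCENT) / 100


example = dict.fromkeys(WEIGHTS_PERCENT, 6)
example["methodology_transparency"] = 9
example["operator_independence"] = 3
total = oracle_trust_score(example)
```

For the example oracle, raising transparency from 6 to 9 adds 0.45 and dropping independence from 6 to 3 removes 0.24, so `total` comes out at 6.21.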
The Oracle Trust Score is not a competitive ranking. It is a buyer-facing metric. An oracle with a high score has more credibility per unit of opinion than one with a low score, all else equal. When two oracles disagree about an agent, weighting their opinions by their Oracle Trust Scores is a defensible default. The buyer can override the weighting if they have additional context (specific methodology fit for their use case, relationship with one oracle, knowledge of a recent methodology change), but the weighted-by-score default is a reasonable starting point that performs better than ignoring the issue.
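The weighted-by-score default can be sketched as a trust-weighted average. This assumes the oracles' agent scores have already been put on a common scale, which, as the claim-format discussion makes clear, is not automatic; the function name and the numbers are illustrative only.

```python
from typing import List, Tuple


def trust_weighted_score(opinions: List[Tuple[float, float]]) -> float:
    """Blend (agent_score, oracle_trust_score) pairs into one number.

    Assumes every agent_score is already on the same scale.
    """
    total_trust = sum(trust for _, trust in opinions)
    if total_trust <= 0:
        raise ValueError("at least one oracle must have a positive trust score")
    return sum(score * trust for score, trust in opinions) / total_trust


# Oracle A says 80 and has an Oracle Trust Score of 8.0;
# Oracle B says 60 and has an Oracle Trust Score of 4.0.
blended = trust_weighted_score([(80.0, 8.0), (60.0, 4.0)])
```

Here the blend lands at roughly 73.3, closer to Oracle A's reading because A carries twice the weight. A buyer with additional context can still override the result; the blend is only a default.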
The federation specification itself needs governance
A federation specification, like any protocol, is only useful if multiple parties commit to implementing the same version of it. The specification therefore needs governance: a documented process for proposing changes, a process for ratifying changes, a process for retiring deprecated versions. The governance has to balance two failure modes: too tight and the specification ossifies and stops reflecting how oracles actually work; too loose and implementations diverge and federation breaks at the edges.
The defensible governance pattern is multi-stakeholder with rough consensus and running code. The stakeholders are the oracles themselves, plus a representative cohort of buyers, plus a representative cohort of operators. Proposals for specification changes are published, discussed in open forum, prototyped by at least two implementations, and ratified by a defined supermajority (say, 2/3 of participating oracles plus some threshold of operator and buyer support). Deprecated versions remain implementable for at least 12 months after deprecation to allow gradual migration. The governance body publishes its decisions and rationales; failed proposals are also published, so that the community can see what was considered and rejected.
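The ratification rule can be expressed as a small predicate. The 2/3 supermajority and the two-prototype requirement come from the paragraph above; the 50% floor for operator and buyer support is a placeholder, since the text leaves that threshold open.

```python
def is_ratified(
    oracle_yes: int,
    oracle_total: int,
    operator_support: float,  # fraction of the operator cohort in favor
    buyer_support: float,     # fraction of the buyer cohort in favor
    prototypes: int,          # independent implementations of the proposal
    min_cohort_support: float = 0.5,  # hypothetical threshold
) -> bool:
    # Integer comparison avoids floating-point trouble at exactly 2/3.
    supermajority = oracle_total > 0 and 3 * oracle_yes >= 2 * oracle_total
    return (
        supermajority
        and prototypes >= 2
        and operator_support >= min_cohort_support
        and buyer_support >= min_cohort_support
    )
```

With six participating oracles, four yes votes is exactly the 2/3 bar: `is_ratified(4, 6, 0.6, 0.7, 2)` passes, while three yes votes or a single prototype does not.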
This governance model is approximately how the IETF, the W3C, and several blockchain consortia operate. None of them are perfect; all of them are functional. The agent oracle community in 2026 is not yet ready for formal standards-body governance, but it is ready for a precursor: a documented federation specification, published openly, with two or three oracles committing to implement it as a reference, and an open process for inviting other oracles to join. As participation grows, the precursor evolves into formal governance. The wrong move is to wait for formal governance before publishing any specification; that path has historically led to either no federation at all (because nobody coordinates) or de-facto federation around whichever oracle has the most market power (because everyone else has to interoperate with the dominant one and the dominant one writes the rules unilaterally).
Counter-argument: "Federation between competitors is a coordination tax"
The steelman against everything in this essay is that federation between competitors is a coordination tax that benefits incumbents and slows down meaningful improvement. Each oracle has its own methodology that it believes produces better scores than its competitors'. Federation forces each oracle to accept that other oracles' methodologies are co-equal in some structural sense: to surface their claims, to cross-validate against them, to weight them in disagreement resolution. The cost is that each oracle's distinctive methodology becomes diluted in the buyer's experience: every score is presented alongside foreign disagreement, every methodology has to render itself comparable to others, every editorial choice has to be justified relative to alternative editorial choices that the oracle does not endorse. The benefit is interoperability, but interoperability is only valuable if buyers actually want it; in many markets, buyers prefer to choose one oracle and trust it deeply rather than juggle multiple.
The answer is that the dilution is largely cosmetic and the interoperability value is structural. Each oracle's methodology can remain distinctive after federation; the federation does not collapse methodologies, it surfaces them alongside one another. A buyer who wants to use one oracle deeply can still do that under federation β they just see the second oracle's disagreement as a labeled minority view rather than as nothing at all. The minority view is information they can choose to act on or ignore; either way they are better off than acting on the silence that pre-federation produces.
The deeper response is that the alternative, no federation, concentrates power asymmetrically. Without federation, the oracle with the most market presence wins by default: buyers query it because everyone else does, operators register with it because that is where buyers look, and the network effect calcifies the dominant oracle's methodology as the de facto standard regardless of whether it is the best one. This is exactly what happened to credit reporting: the three-bureau cartel emerged because federation between the original dozens of merchant associations was not designed in early, and the eventual consolidation produced opaque methodologies with little external accountability. Agent reputation has the chance to avoid that outcome through deliberate federation. The coordination tax is real but small relative to the cost of repeating the credit-bureau path.
A related response: federation does not require participation. Oracles that prefer not to federate are free to remain independent. The buyer-facing consequence is that their opinions become harder to combine with other oracles' opinions, which means buyers using multi-oracle workflows will weight them less. The market, not the specification, will decide which oracles federate and which do not. A federation specification that is good enough will attract participants; one that is captured by a single faction will not. The same logic that disciplines individual oracles disciplines the federation specification itself.
Failure modes of federation: what goes wrong and how to detect it early
Federation is not a one-time achievement; it is an ongoing operational relationship that has identifiable failure modes. Knowing the failure modes in advance lets oracle operators detect and respond to them before they damage the federation as a whole. The failure modes below are each documented from analogous protocols (DNS, certificate transparency, IP reputation), translated into the oracle context, and each paired with the early-warning signal that an oracle community should monitor.
The first failure mode is claim flooding. One participating oracle starts publishing claims at a rate or volume disproportionate to its actual scoring activity, intentionally or unintentionally overwhelming downstream peers with low-information events. The early-warning signal is the claim-rate ratio: each oracle's outbound claim rate per registered agent should fall within a band that the federation specification documents. Oracles outside the band are flagged for review. The remediation is rate-limiting at the receiving end, plus a federation-governance conversation with the publishing oracle about why the claim rate is anomalous.
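A claim-rate check of this kind could be as small as the sketch below. The band limits are placeholders; the text says only that the federation specification documents a band, not what it is.

```python
from typing import Dict, List, Tuple


def flag_claim_rate_outliers(
    outbound_claims: Dict[str, int],    # claims published in the period
    registered_agents: Dict[str, int],  # agents registered with each oracle
    band: Tuple[float, float] = (0.1, 10.0),  # hypothetical claims-per-agent band
) -> List[str]:
    low, high = band
    flagged = []
    for oracle, claims in outbound_claims.items():
        agents = registered_agents.get(oracle, 0)
        # An oracle publishing claims with no registered agents is always out of band.
        ratio = claims / agents if agents else float("inf")
        if not low <= ratio <= high:
            flagged.append(oracle)
    return sorted(flagged)


flagged = flag_claim_rate_outliers(
    {"oracle-a": 500, "oracle-b": 90_000, "oracle-c": 3},
    {"oracle-a": 1_000, "oracle-b": 1_000, "oracle-c": 1_000},
)
```

In the example, `oracle-b` is flagged for flooding (90 claims per agent) and `oracle-c` for an unusually low rate; being flagged triggers review, not automatic sanction.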
The second failure mode is methodology divergence drift. Two oracles that initially had similar methodologies and produced similar scores slowly drift apart over time as each makes independent methodology changes. The cumulative drift is invisible at any single change but produces large divergences over months. The early-warning signal is per-pair convergence trend: a federation should monitor how often each pair of oracles agrees on the same agent over time, and flag pairs whose convergence is monotonically declining. The remediation is targeted methodology dialogue between the diverging oracles, with the federation specification optionally requiring annual cross-methodology review.
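The convergence trend can be monitored with two small helpers: one that computes how often a pair agrees in a given period, and one that flags a strictly declining run. The agreement tolerance and the four-period window are assumptions chosen for illustration.

```python
from typing import Dict, List


def agreement_rate(
    scores_a: Dict[str, float],
    scores_b: Dict[str, float],
    tolerance: float = 5.0,  # hypothetical: scores within 5 points count as agreement
) -> float:
    shared = scores_a.keys() & scores_b.keys()
    if not shared:
        return 0.0
    agreeing = sum(abs(scores_a[a] - scores_b[a]) <= tolerance for a in shared)
    return agreeing / len(shared)


def is_monotonically_declining(rates: List[float], min_periods: int = 4) -> bool:
    # Flag only when each of the last `min_periods` readings is lower than the one before.
    if len(rates) < min_periods:
        return False
    window = rates[-min_periods:]
    return all(later < earlier for earlier, later in zip(window, window[1:]))


quarterly_rates = [0.91, 0.88, 0.85, 0.80]
drifting = is_monotonically_declining(quarterly_rates)
```

Each individual drop in `quarterly_rates` is small, which is the point: the flag fires on the sustained direction, not on any single change.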
The third failure mode is selective forwarding. An oracle accepts inbound claims from federated peers but forwards them selectively to its own readers, suppressing claims that would damage agents the oracle wants to protect or amplifying claims that would damage agents the oracle wants to undermine. The early-warning signal is forwarding rate analysis: each oracle's downstream surfacing of foreign claims should be roughly proportional to the upstream claim distribution, and oracles whose surfacing rates are anomalously low or biased toward specific peers are flagged. The remediation is mandatory disclosure of forwarding-rate metrics in the federation specification.
The fourth failure mode is evidence withdrawal. An oracle starts refusing to share evidence under cross-validation requests, citing operational cost or agent-consent issues, while continuing to receive evidence from peers. The asymmetry damages the federation's value because cross-validation only works if it is mutual. The early-warning signal is evidence-share-ratio per pair: each oracle should be sharing evidence at roughly the rate it is receiving evidence, and persistent imbalances are flagged. The remediation is rebalancing or, in extreme cases, downgrading the unbalanced oracle's federation participation tier.
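A minimal sketch of the evidence-share-ratio signal follows. The 0.5 floor and the three-period persistence window are illustrative choices, not values from any specification.

```python
from typing import List, Tuple


def evidence_share_ratio(shared: int, received: int) -> float:
    # Records shared with a peer divided by records received from that peer.
    if received == 0:
        return float("inf") if shared else 1.0
    return shared / received


def persistent_imbalance(
    history: List[Tuple[int, int]],  # (shared, received) per period, oldest first
    min_ratio: float = 0.5,          # hypothetical floor
    periods: int = 3,                # hypothetical persistence window
) -> bool:
    recent = history[-periods:]
    return len(recent) == periods and all(
        evidence_share_ratio(shared, received) < min_ratio
        for shared, received in recent
    )


withdrawing = persistent_imbalance([(90, 100), (10, 100), (5, 100), (0, 120)])
```

Requiring several consecutive low periods keeps a single slow month, or a temporary consent backlog, from triggering a downgrade conversation.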
The fifth failure mode is governance capture. The federation governance process is captured by a coalition of oracles who push specification changes that benefit themselves at the expense of the broader community. The early-warning signal is governance-vote pattern analysis: persistent voting blocs across multiple decisions, especially blocs that align with commercial interest, are flagged in the published governance reports. The remediation is procedural (supermajority requirements, multi-stakeholder representation, sunset clauses on contested specification changes), but the prerequisite is detection.
The sixth failure mode is silent participation degradation. An oracle remains nominally federated but quietly stops investing in federation infrastructure, letting its claim exchange drift out of date, its cross-validation data go stale, its disagreement-context blocks fall out of currency. The early-warning signal is freshness metrics: each oracle should publish the last-updated timestamp of its federation interfaces, and stale interfaces are flagged. The remediation is escalation through the governance process toward suspension of the participation tier.
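Freshness checking reduces to comparing published last-updated timestamps against a maximum age. The interface names and the 30-day limit below are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_interfaces(
    last_updated: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # hypothetical freshness limit
) -> List[str]:
    # Return the interfaces whose last update is older than max_age.
    return sorted(name for name, ts in last_updated.items() if now - ts > max_age)


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
published = {
    "claim_exchange": datetime(2026, 5, 28, tzinfo=timezone.utc),
    "cross_validation": datetime(2026, 2, 1, tzinfo=timezone.utc),
    "disagreement_context": datetime(2026, 4, 15, tzinfo=timezone.utc),
}
stale = stale_interfaces(published, now)
```

In this example the claim exchange is current, while the cross-validation data and disagreement-context blocks would both be flagged for escalation.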
Detecting these failure modes requires the federation to operate observability machinery against itself, exactly the way an oracle operates observability machinery against its own scoring. The metrics described above (claim-rate ratio, convergence trend, forwarding rate, evidence-share ratio, governance-vote patterns, freshness) should be published by the federation as a whole, in the same way each individual oracle publishes its Self-Audit Scorecard. A federation whose own operational health is opaque is repeating the editorial-policy mistake at a higher level.
What Armalo does
Armalo's Trust Oracle is built to federate. The Oracle exposes signed claim exchange at /api/v1/trust/federation/claims with a published claim envelope format and methodology metadata, and accepts inbound claims from federated oracles under documented authentication terms. Mutual cross-validation is supported through an evidence-export endpoint that lets federated peers fetch the underlying behavioral records (with agent consent and per-record retention rules) needed to reapply their own methodology. The disagreement context block is included in every public score response when a federated peer disagrees materially.
Armalo publishes its own Oracle Trust Score at /trust/self-audit, computed against the nine-dimension rubric described above. The self-audit is updated quarterly, and a third-party validation is contracted annually with the validation results published verbatim regardless of outcome. Armalo participates in the federation specification governance as a founding member, has committed to implementing every ratified version within 90 days of ratification, and publishes its dissents on rejected proposals. The 12-dimension composite score Armalo uses (accuracy 14%, Metacal 9%, reliability 13%, safety 11%, security 8%, bond 8%, latency 8%, scope-honesty 7%, cost-efficiency 7%, model-compliance 5%, runtime-compliance 5%, harness-stability 5%) is fully documented in versioned methodology, with methodology changes pre-announced 14 days in advance and pre-ratification cross-validation runs against agents whose results would be most affected. When Armalo disagrees with a federated peer about an agent, both views are surfaced to the buyer, with disagreement magnitude, methodological commentary, and the cross-validation result if available.
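For readers who want to see how the published weights combine, here is a minimal sketch. The weights are the ones listed above; the assumptions are that each dimension is scored on a 0 to 100 scale and that the composite is a plain weighted average. The identifier names are hypothetical, and Armalo's versioned methodology is the authority on the actual computation.

```python
# Weights in percent, as listed in the text. Identifier spellings are assumed.
WEIGHTS = {
    "accuracy": 14, "metacal": 9, "reliability": 13, "safety": 11,
    "security": 8, "bond": 8, "latency": 8, "scope_honesty": 7,
    "cost_efficiency": 7, "model_compliance": 5, "runtime_compliance": 5,
    "harness_stability": 5,
}
assert sum(WEIGHTS.values()) == 100  # the twelve weights cover the whole score

def composite_score(dimension_scores):
    """Weighted average of per-dimension scores (assumed 0-100 scale)."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS) / 100

only_accuracy = {d: 0 for d in WEIGHTS} | {"accuracy": 100}
print(composite_score(only_accuracy))  # 14.0
```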
FAQ
Why not just have a single global oracle and avoid federation entirely?
A single global oracle is unstable in two directions. First, it concentrates editorial power in one operator, which makes capture and methodological monoculture much more dangerous. Second, it cannot serve all agent categories well; an oracle methodology tuned for customer-support agents is poorly tuned for autonomous-trading agents, and forcing both populations through the same methodology produces worse scores for both. Federation lets specialty oracles exist while preserving cross-oracle interoperability. Credit reporting (where the de-facto monopoly produced known harms) and DNS (where federation produced robust interoperability) are the empirical comparisons.
What stops a malicious oracle from publishing false claims under signed claim exchange?
Receiving oracles validate signatures and republish with attribution. A malicious oracle's false claim is therefore traceable to it; the receiving oracle is not on the hook. The malicious oracle's Oracle Trust Score takes the hit when the falsehood is exposed (through external audit, cross-validation divergence, or operator complaint), and federated peers can choose to stop accepting its claims. The defense is reputational, the same way reputation defends against false agent claims: by attaching cost to dishonesty.
How do agents opt in or out of cross-oracle evidence sharing?
Through explicit consent at registration, with per-evidence-type granularity. An agent can opt in to sharing eval transcripts but not dispute records, or in to sharing with one peer oracle but not another. Consent terms are stored as part of the agent's profile, exposed to readers, and enforceable on the producer-oracle side: an oracle that violates consent terms by sharing evidence the agent did not authorize is committing a federation-protocol violation that affects its Oracle Trust Score.
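Per-evidence-type, per-peer consent can be modeled as a default-deny lookup. The sketch below is one possible shape, assuming a hypothetical `ConsentTerms` record and evidence records carrying a `type` field; none of these names come from a published specification.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ConsentTerms:
    """Hypothetical consent record: evidence type -> peers allowed to receive it."""
    allowed: dict = field(default_factory=dict)

    def permits(self, evidence_type, peer):
        # Default deny: anything the agent did not explicitly grant is withheld.
        return peer in self.allowed.get(evidence_type, frozenset())

def exportable(records, terms, peer):
    """Filter records down to those the agent authorized for this peer."""
    return [r for r in records if terms.permits(r["type"], peer)]

terms = ConsentTerms(allowed={"eval_transcript": frozenset({"oracle_b"})})
records = [
    {"id": 1, "type": "eval_transcript"},
    {"id": 2, "type": "dispute_record"},
]
print([r["id"] for r in exportable(records, terms, "oracle_b")])  # [1]
print([r["id"] for r in exportable(records, terms, "oracle_c")])  # []
```

Running the filter on the producer-oracle side, before export, matches where the text places enforcement.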
What happens when an oracle's methodology version changes mid-federation?
The oracle pre-announces the change, publishes the new methodology document, and runs a transition period (typically 30 days) during which both old and new methodology versions produce parallel scores. Federated peers receive claims under both versions and can update their interpretations on their own schedule. After the transition, the old methodology is deprecated but the historical claims it produced remain valid; new claims are produced under the new methodology only.
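The transition rule reduces to a small date check. In this sketch the function name and version labels are hypothetical, and the 30-day default reflects the "typically 30 days" figure above rather than a fixed requirement.

```python
from datetime import date, timedelta

def active_versions(today, transition_start, old, new, transition=timedelta(days=30)):
    """Return the methodology versions that should emit claims on `today`."""
    if today < transition_start:
        return [old]            # change announced but not yet in effect
    if today < transition_start + transition:
        return [old, new]       # parallel scoring during the transition
    return [new]                # old version deprecated for new claims

start = date(2026, 3, 1)
print(active_versions(date(2026, 2, 20), start, "v3", "v4"))  # ['v3']
print(active_versions(date(2026, 3, 15), start, "v3", "v4"))  # ['v3', 'v4']
print(active_versions(date(2026, 4, 15), start, "v3", "v4"))  # ['v4']
```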
Can a buyer override the Oracle Trust Score weighting in disagreement resolution?
Yes, and they should when they have specific context. The weighted-by-score default is a starting point. A buyer who has a specific methodological reason to weight one oracle more heavily (their use case better matches that oracle's methodology, they have a specific audit relationship, they have proprietary evidence that calibrates them to one oracle's scores) should override. The federation surfaces information; it does not impose decisions.
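One way to picture the default and the override is a weighted average in which the buyer can replace any oracle's weight. This is a sketch under assumptions: the essay does not prescribe the blending formula, and `blended_verdict`, the oracle names, and the numbers are illustrative only.

```python
def blended_verdict(agent_scores, oracle_trust, overrides=None):
    """Blend per-oracle agent scores, weighted by each oracle's trust score.

    `overrides` lets a buyer substitute their own weight for any oracle.
    The weighted-average form is an assumption, not a specified formula.
    """
    weights = {**oracle_trust, **(overrides or {})}
    total = sum(weights[o] for o in agent_scores)
    if total <= 0:
        raise ValueError("weights for the reporting oracles must sum to a positive value")
    return sum(agent_scores[o] * weights[o] for o in agent_scores) / total

agent_scores = {"oracle_a": 80, "oracle_b": 60}   # the oracles disagree
oracle_trust = {"oracle_a": 90, "oracle_b": 30}   # Oracle Trust Scores

print(blended_verdict(agent_scores, oracle_trust))                    # 75.0
print(blended_verdict(agent_scores, oracle_trust, {"oracle_b": 90}))  # 70.0
```

In the second call the buyer has decided that oracle_b's methodology fits their use case as well as oracle_a's, so the blended figure moves toward oracle_b's lower score.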
How is the Oracle Trust Score itself audited?
The rubric is published. Each dimension's evidence requirements are documented. The self-rated score is published alongside the externally rated score, with discrepancies surfaced. External auditors (initially the same firms that audit other infrastructure components like SOC 2 controls and on-chain reserves) validate specific dimensions on contracted scope. Disputes about specific scores are resolved through the federation governance process, not through unilateral oracle action.
What if no oracle wants to federate with mine?
Then yours has a coverage and credibility problem that federation cannot solve. The federation specification is open to any oracle that meets the participation requirements (published methodology, dispute mechanism, evidence retention, governance participation). An oracle that cannot meet those requirements has more fundamental issues than its non-federation status. The right move is to fix the underlying gaps and try federation again, not to claim federation is unfair.
Will federation actually happen, or is this all theoretical?
Federation already happens informally (oracles cite each other, operators register with multiple oracles, buyers run multi-oracle queries), but it is not protocol-grade. The transition from informal cross-reference to protocol-grade federation has happened in every other reputation domain that matured (credit, domain, IP) and there is no reason to expect the agent economy to be different. The question is whether it happens deliberately, with a well-designed specification, or accidentally, with whatever the market power center happens to impose. This essay is an argument for the deliberate path.
Bottom line
Multiple trust oracles already exist, they will become more numerous before they consolidate, and they will sometimes disagree about the same agent. Federation is the protocol response: signed claim exchange, mutual cross-validation, structured disagreement resolution, and an Oracle Trust Score that lets buyers weight oracle opinions when oracles disagree. The specification needs multi-stakeholder governance with rough consensus and running code. The Oracle Trust Score's nine dimensions (methodology transparency, dispute mechanism integrity, evidence retention, editorial policy disclosure, federation participation, read-path security, external audit, coverage, operator independence) give buyers a single buyer-facing metric. The agent economy can either federate deliberately, with a published specification and reasonable governance, or it can repeat the credit-bureau path with an unprincipled de-facto monopoly. Federation done now costs less than federation forced later.