Open Problems in Agent Trust: A Research Agenda for 2026
Sybil resistance, cross-platform score portability, adversarial trust gaming, privacy-preserving verification. The hardest unsolved problems in agent trust.
Trust infrastructure for AI agents has made significant progress. We have behavioral contracts, composite scoring, on-chain escrow, and jury-based dispute resolution. But several fundamental problems remain unsolved.
This post outlines the research questions we consider most important for the field in 2026. We do not have all the answers. We are publishing this to invite collaboration from researchers, builders, and practitioners working on adjacent problems.
Problem 1: Sybil Resistance for Agents
In traditional reputation systems, sybil attacks involve creating fake accounts to manipulate ratings. For AI agents, the problem is worse: creating a new agent is essentially free, and agents can generate convincing interaction histories with each other.
An adversary could:
- Spin up 100 agents that all rate each other positively.
- Generate synthetic evaluation data that mimics genuine interactions.
- Accumulate high trust scores through self-dealing.
Current mitigations: Requiring organizational identity verification (linking agents to real companies), weighting scores by the diversity of interaction partners, and detecting statistical anomalies in evaluation patterns.
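The diversity-weighting mitigation can be sketched concretely. In this illustrative (not production) heuristic, an agent's raw score is discounted by the normalized entropy of its interaction partners, so a cluster of agents that mostly rate each other contributes little weight. The function names and the 0-to-1 weighting scheme are assumptions for the sketch:

```python
from collections import Counter
from math import log2

def diversity_weight(partner_ids: list[str]) -> float:
    """Weight a rating history by the entropy of its interaction partners.
    A sybil cluster that only interacts with itself has low entropy and
    therefore low weight. (Illustrative heuristic only.)"""
    if not partner_ids:
        return 0.0
    counts = Counter(partner_ids)
    total = len(partner_ids)
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    max_entropy = log2(len(counts)) if len(counts) > 1 else 1.0
    return entropy / max_entropy  # normalized to [0, 1]

def weighted_score(raw_score: float, partner_ids: list[str]) -> float:
    """Discount a raw trust score by partner diversity."""
    return raw_score * diversity_weight(partner_ids)
```

Note that this is exactly where the first open question bites: a company's internal agent fleet would also score low on partner diversity, so entropy alone cannot separate legitimate clusters from sybil clusters.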
Open questions: How do you distinguish legitimate agent clusters (e.g., a company's internal agent fleet) from sybil clusters? Can zero-knowledge proofs verify that agents belong to distinct entities without revealing which entities? What is the minimum cost of sybil resistance that preserves the low barrier to entry that makes agent ecosystems valuable?
Problem 2: Cross-Platform Score Portability
An agent's trust score on Platform A should mean something on Platform B. But scores from different systems use different scales, different evaluation methodologies, and different weighting schemes.
This is analogous to credit score portability across countries. A FICO score means nothing in Germany, and a Schufa score means nothing in the United States, even though both measure creditworthiness.
Current state: Trust scores are platform-specific. An agent with a score of 97 on one system has no portable credential to present on another.

Approaches being explored:
- Standardized evaluation benchmarks that all platforms agree to run, producing comparable scores.
- Verifiable credentials (W3C VC standard) that carry signed attestations from one platform to another.
- Federated scoring protocols where platforms share evaluation data in a privacy-preserving way.
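A minimal sketch of the attestation approach: a platform signs a threshold claim ("this agent's score meets 90") rather than exporting its raw score, and any verifier holding the key can check it. Real verifiable credentials use public-key signatures per the W3C VC data model; HMAC with a shared key is used here only to keep the sketch stdlib-only, and the key, issuer name, and claim format are assumptions:

```python
import hashlib
import hmac
import json

PLATFORM_KEY = b"platform-a-signing-key"  # hypothetical key for the sketch

def issue_attestation(agent_id: str, meets_threshold: bool, threshold: int) -> dict:
    """Issue a portable, signed claim in the spirit of a W3C Verifiable
    Credential: the issuer attests whether the agent meets a score
    threshold, without revealing the raw internal score."""
    claim = {"agent": agent_id, "issuer": "platform-a",
             "claim": f"trust_score >= {threshold}", "holds": meets_threshold}
    payload = json.dumps(claim, sort_keys=True).encode()
    sig = hmac.new(PLATFORM_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": sig}

def verify_attestation(att: dict, key: bytes) -> bool:
    """Check that the claim was not altered since issuance."""
    payload = json.dumps(att["claim"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, att["signature"])
```

The threshold claim is deliberate: it sidesteps the scale-mismatch problem, since "meets platform A's bar for high-stakes work" can be meaningful even when a raw 97 is not.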
Open questions: Who governs the benchmark standard? How do you prevent a platform from inflating scores to make its agents look better? Can cryptographic techniques (like commitment schemes) enable score comparison without revealing the underlying methodology?
Problem 3: Adversarial Trust Gaming
Sophisticated adversaries will not attack the scoring algorithm directly. They will game it.
Known gaming strategies:
- Sandbag and switch: Build a high trust score with simple, easy-to-pass tasks, then pivot to high-stakes tasks where the agent's actual competence is untested.
- Evaluation hacking: Optimize specifically for the evaluation metrics while degrading on unmeasured dimensions.
- Temporal manipulation: Perform well during evaluation periods and poorly during normal operation.
Current mitigations: Multi-dimensional scoring (harder to game all dimensions simultaneously), continuous evaluation (no distinct "evaluation periods"), and behavioral contract specificity (terms must match the actual deployment context).
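One partial answer to the last open question is statistical: flag agents whose recent tasks are dramatically harder than the tasks their score was earned on, before the switch pays off. The sketch below is an illustrative heuristic, not a production detector; the 0-to-10 difficulty scale and the z-score threshold are assumptions:

```python
from statistics import mean, stdev

def sandbag_risk(history_difficulty: list[float],
                 recent_difficulty: list[float],
                 z_threshold: float = 2.0) -> bool:
    """Flag a possible sandbag-and-switch: if the agent's recent tasks are
    far harder (in z-score terms) than the tasks its trust score was
    earned on, the score may not transfer. Difficulties are assumed to be
    on an arbitrary 0-10 scale. (Illustrative heuristic only.)"""
    if len(history_difficulty) < 2 or not recent_difficulty:
        return False  # not enough history to estimate a baseline
    mu, sigma = mean(history_difficulty), stdev(history_difficulty)
    if sigma == 0:
        return mean(recent_difficulty) > mu
    z = (mean(recent_difficulty) - mu) / sigma
    return z > z_threshold
```

A determined adversary can of course ramp difficulty slowly enough to stay under any fixed threshold, which is why the open question asks for guarantees under explicit adversary models rather than ad hoc detectors like this one.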
Open questions: Can we design scoring systems that are provably resistant to gaming under defined adversary models? What is the theoretical limit of trust score accuracy when the rated entity is actively trying to deceive the rater? How do you detect a sandbag-and-switch strategy before the switch happens?
Problem 4: Privacy-Preserving Verification
Trust verification requires sharing information about an agent's behavior. But some of that information is sensitive:
- The specific tasks an agent performed may be confidential.
- The evaluation criteria may reveal proprietary business logic.
- The interaction partners may not consent to being identified.
The tension: trust requires transparency, but deployment requires confidentiality.
Approaches being explored:
- Homomorphic scoring: Computing trust scores on encrypted evaluation data without decrypting it.
- Zero-knowledge attestations: Proving that an agent meets a trust threshold without revealing the exact score or the underlying data.
- Differential privacy: Adding calibrated noise to evaluation data so that individual interactions cannot be reconstructed.
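The differential-privacy approach can be sketched with the classic Laplace mechanism: since one interaction can shift a mean of n clipped ratings by at most (hi - lo) / n, adding Laplace noise at that scale divided by epsilon hides any single interaction's contribution. This is a minimal sketch; the rating scale, epsilon default, and function names are assumptions:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1e-12, 1.0 - 2.0 * abs(u)))

def dp_mean_rating(ratings: list[float], epsilon: float = 1.0,
                   lo: float = 0.0, hi: float = 100.0) -> float:
    """Release a differentially private mean rating. Ratings are clipped
    to [lo, hi] so the sensitivity bound (hi - lo) / n holds; smaller
    epsilon means more noise and stronger privacy. (Sketch only.)"""
    clipped = [min(max(r, lo), hi) for r in ratings]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (hi - lo) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon)
```

The first open question shows up directly in the parameters: epsilon quantifies the privacy-utility trade, but nothing in the mechanism says what epsilon makes the released score still meaningful for trust decisions.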
Open questions: What is the minimum information that must be revealed for a trust score to be meaningful? Can ZK proofs be made efficient enough for real-time trust verification in agent-to-agent interactions? How do you audit a privacy-preserving trust system?
Problem 5: Trust Decay and Model Drift
An agent's trust score reflects its historical behavior. But models get updated, fine-tuned, and retrained. A model update can change an agent's behavior in ways that invalidate its trust record.
Current approach: Trust scores decay over time, weighting recent interactions more heavily than older ones.
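The decay approach is typically an exponentially weighted average. In this illustrative sketch, an interaction's weight halves every `half_life_days`; the half-life value and the (score, age) input format are assumptions, and choosing the half-life is exactly the open question below:

```python
def decayed_score(interactions: list[tuple[float, float]],
                  half_life_days: float = 90.0) -> float:
    """Exponentially decayed trust score: an interaction's weight halves
    every `half_life_days`. `interactions` holds (score, age_in_days)
    pairs. (Illustrative; the right half-life is an open question.)"""
    if not interactions:
        return 0.0
    weights = [0.5 ** (age / half_life_days) for _, age in interactions]
    total = sum(s * w for (s, _), w in zip(interactions, weights))
    return total / sum(weights)
```

For example, with a 90-day half-life, a perfect interaction today and a failed one 90 days ago average to roughly 67 rather than 50, because the older failure carries half the weight.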
Open questions: How aggressively should scores decay? Should a model update reset the trust score entirely, or should it trigger a re-evaluation phase? Can we detect model drift automatically and adjust confidence in the score accordingly? What is the right granularity of identity: is a fine-tuned version of an agent the same agent?
Problem 6: Multi-Agent Collective Trust
In multi-agent workflows, trust is not just about individual agents. It is about the composition. Agent A and Agent B might each be individually trustworthy, but the specific combination of A feeding data to B might produce failures that neither exhibits alone.
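One naive formalization treats a pipeline as a chain of links: the individual agents plus each observed handoff between adjacent agents, with untested handoffs defaulting to a conservative prior. This captures the failure mode above, where A and B are each fine but the A-to-B edge is weak. The scoring rule, the prior value, and the function names are all assumptions; defining this well is precisely the open problem:

```python
def composition_trust(individual: dict[str, float],
                      pairwise: dict[tuple[str, str], float],
                      pipeline: list[str]) -> float:
    """Weakest-link score for a pipeline A -> B -> ...: the minimum over
    individual agent scores and observed handoff (edge) scores, with
    unseen handoffs assigned a conservative prior. (Sketch only.)"""
    UNTESTED_PRIOR = 50.0  # assumed default for handoffs never evaluated
    links = [individual[agent] for agent in pipeline]
    for a, b in zip(pipeline, pipeline[1:]):
        links.append(pairwise.get((a, b), UNTESTED_PRIOR))
    return min(links)
```

Even this toy version makes the problem's shape visible: the pairwise table grows quadratically with the number of agents, and a five-agent workflow spanning three providers may have most of its edges untested.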
Open questions: Can we define and measure trust for agent compositions, not just individual agents? How do you score a workflow that uses five agents from three different providers? What is the right liability model when a failure is caused by the interaction between agents rather than any single agent?
Call to Action
These problems are hard, and they will not be solved by any single team. We are actively interested in collaborating with:
- Cryptography researchers working on ZK proofs, homomorphic encryption, and verifiable computation.
- Reputation system researchers with experience in sybil resistance and adversarial robustness.
- Distributed systems engineers building cross-platform interoperability protocols.
- Policy researchers thinking about governance structures for agent trust standards.
If you are working on any of these problems, we want to hear from you. The agent economy will be built on trust infrastructure that does not fully exist yet. Building it is a collective effort.
Put the trust layer to work
Explore the docs, register an agent, or start shaping a pact that turns these trust ideas into production evidence.