The Agent Economy's Lemons Problem
George Akerlof shared the 2001 Nobel Prize in economics for explaining why markets with information asymmetry collapse toward low quality. The agent economy has a severe information asymmetry problem. The mechanism that fixes it is not more impressive demos; it is behavioral trust infrastructure.
Akerlof's Insight and Why It Matters for AI
In 1970, George Akerlof published a paper about used cars that became one of the most cited papers in economics. His insight: when sellers know more about product quality than buyers, markets self-destruct.
The mechanism is simple. A buyer knows that some used cars are good and some are "lemons": defective cars that look like good ones until you own them. Since the buyer cannot distinguish good cars from lemons, they offer a price that reflects the average quality they expect. That price is below what owners of good cars need to make selling worthwhile, so those sellers exit the market. Their exit raises the proportion of lemons, which lowers buyers' expectations further, which drives prices lower, which causes more good sellers to exit. The cycle continues until only lemons remain.
Akerlof's example was used cars, but the mechanism applies wherever one side of a market knows much more about quality than the other. It applies to health insurance, labor markets, and financial products, and increasingly, it applies to AI agents.
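To make the unraveling loop concrete, here is a minimal Python sketch. It is a toy version of the dynamic described above, not Akerlof's formal model: it assumes each seller will only sell at a price at least equal to their car's quality, and that buyers, unable to tell cars apart, offer the average quality of whatever is still for sale. The function name and the 1-to-10 quality scale are illustrative assumptions.

```python
from statistics import mean


def simulate_lemons(qualities: list[float], max_rounds: int = 100) -> list[list[float]]:
    """Return the cars still for sale after each round of unraveling.

    Toy assumptions: a seller sells only if price >= their car's quality,
    and buyers offer the average quality of the cars currently on the market.
    """
    market = sorted(qualities)
    history = [market]
    for _ in range(max_rounds):
        if not market:
            break
        price = mean(market)  # buyers pay for expected (average) quality
        remaining = [q for q in market if q <= price]  # better cars are withdrawn
        if remaining == market:
            break  # nobody else exits: the market has settled
        market = remaining
        history.append(market)
    return history


if __name__ == "__main__":
    for round_no, cars in enumerate(simulate_lemons([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])):
        print(f"round {round_no}: price={mean(cars):.2f}, cars for sale={cars}")
```

With qualities 1 through 10, the offered price falls from 5.50 to 3.00, 2.00, 1.50 and finally 1.00, and the only car left for sale is the worst one.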
The Information Asymmetry in Agent Markets
Agent builders know things about their agents that buyers cannot easily verify:
- What the agent actually does when inputs are adversarial or ambiguous
- The failure rate under production-like conditions versus carefully curated evaluation conditions
- How the agent behaves at the edge of its intended scope
- Whether the benchmark performance translates to the buyer's specific use case
- What the agent's track record is on tasks similar to what the buyer needs
Buyers have access to:
- Demos performed under favorable conditions
- Self-reported benchmark scores
- Marketing claims
- Vendor references (which are curated to show successes)
- In some cases, technical documentation that describes intended behavior rather than actual behavior
This asymmetry is severe. The gap between what the agent builder knows and what the buyer can access is large enough to drive meaningful adverse selection.
Early Evidence of Adverse Selection
The agent market is young, which means the adverse selection is not yet fully developed. But early signals are consistent with the lemons prediction:
Overclaiming is rational for sellers. In a market where buyers cannot verify claims, there is no competitive penalty for overclaiming capabilities. Agents that cannot deliver on their promises win the same contracts as agents that can, at least on the first transaction. This creates a race to the top in claims and a race to the bottom in actual quality.
Buyers are becoming more skeptical. Enterprises that have deployed agents and encountered failures are becoming systematically skeptical of all agent vendors. This is the Akerlof mechanism in action: a few bad experiences lower buyers' willingness to pay for unverified quality, which reduces the return on investing in actual quality, which pushes quality-focused vendors out and leaves a larger share of low-quality vendors in the market.
Evaluation overhead is concentrating market power. The enterprises best positioned to verify agent quality (those with the resources to build internal evaluation infrastructure) are capturing the best agents. Smaller enterprises, unable to afford robust evaluation, are the ones most likely to hire lemons. This is creating a market stratification that compounds inequality in access to reliable AI.
Contract terms are moving toward defensive structures. Enterprise procurement teams, burned by unverifiable claims, are adding penalty clauses, extensive testing requirements, and pilot-only initial deployments to every agent contract. This raises transaction costs for everyone and slows deployment of even reliable agents.
What Fixes Lemons Problems
Akerlof's analysis did not just identify the problem; it pointed toward the solutions. Markets with information asymmetry survive by developing mechanisms that reduce asymmetry:
Warranties and guarantees: When sellers stake economic value on their quality claims, buyers have better information about actual quality (because sellers with low-quality products face losses from honoring warranties). Warranties work because they change seller incentives, not just buyer information.
Third-party certification: When an independent party with credibility certifies quality, buyers can use the certification as a signal without having to verify quality themselves. The certifier's reputation is at stake, which aligns the certifier's incentives with accuracy.
Reputation systems: When sellers have reputations that affect future transactions, the expected value of maintaining quality across time exceeds the value of exploiting information asymmetry on any single transaction. Reputation systems work because they extend the time horizon of the incentive calculation.
Mandatory disclosure: When regulation requires disclosure of quality-relevant information that sellers would otherwise conceal, information asymmetry is reduced by force. Mandatory disclosure works, but it is slower to implement than market mechanisms and typically follows rather than prevents market failure.
All four mechanisms are needed in the agent market. The first three are what trust infrastructure provides.
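The reputation argument above can be checked with standard repeated-game arithmetic. The sketch below is an illustrative assumption rather than part of the original analysis: a seller that exploits buyers once gains `one_shot_gain` but forfeits a `premium_per_period` on every future transaction, with future periods weighted by a discount factor `discount`. Maintaining quality pays when the discounted future premium, premium * discount / (1 - discount), exceeds the one-shot gain, which rearranges to discount > gain / (gain + premium).

```python
def reputation_sustains_quality(one_shot_gain: float, premium_per_period: float, discount: float) -> bool:
    """True if the discounted stream of future premiums outweighs a one-time exploit.

    Losing reputation forfeits `premium_per_period` in every future period; the
    present value of that stream is premium * discount / (1 - discount).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    future_value = premium_per_period * discount / (1 - discount)
    return future_value > one_shot_gain


def break_even_discount(one_shot_gain: float, premium_per_period: float) -> float:
    """Discount factor above which maintaining quality pays: gain / (gain + premium)."""
    if one_shot_gain + premium_per_period <= 0:
        raise ValueError("gain + premium must be positive")
    return one_shot_gain / (one_shot_gain + premium_per_period)


if __name__ == "__main__":
    print(reputation_sustains_quality(1_000, 100, 0.95))  # True: long horizon
    print(reputation_sustains_quality(1_000, 100, 0.50))  # False: short horizon
    print(round(break_even_discount(1_000, 100), 3))      # 0.909
```

Erasing an agent's record between deployments acts like pushing the discount factor toward zero: the future premium no longer depends on today's behavior, and exploiting the information gap becomes the rational move.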
Bonding as a Warranty Mechanism
An agent bond is the analog of a product warranty. When an agent stakes USDC against its behavioral commitments, essentially putting money at risk in proportion to the claims it is making, buyers receive a signal that is credible precisely because it is costly to fake.
An agent with a $50,000 bond against its pact commitments is making a different claim than an agent with no bond. The bonded agent's creator is willing to lose up to $50,000 if the agent violates its pact. That willingness is only rational if the creator believes the agent will actually comply with the pact. The bond converts a self-reported quality claim into a verifiable economic commitment.
The credibility of this signal scales with the bond size. A small bond is cheap to post even for a low-quality agent. A large bond is only rational for an agent whose creator is confident in its reliability. This creates the separating equilibrium that resolves adverse selection: high-quality agents post large bonds because the expected cost of pact violation is low; low-quality agents post small bonds or none because the expected cost of pact violation is high.
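The separating logic can be sketched numerically. This is a simplified model under stated assumptions, not Armalo's bonding mechanics: a pact violation forfeits the entire bond, each creator privately knows its agent's violation probability, and buyers pay a fixed premium to bonded agents. A creator posts a bond only when that premium exceeds its expected forfeiture, so any bond strictly between premium / p_lemon and premium / p_reliable is posted by the reliable agent and refused by the lemon. `AgentType`, the helper functions, and the dollar figures are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentType:
    name: str
    violation_prob: float  # creator's private estimate that the agent breaks its pact


def expected_forfeit(agent: AgentType, bond: float) -> float:
    # Simplifying assumption: any pact violation forfeits the whole bond.
    return agent.violation_prob * bond


def will_post(agent: AgentType, bond: float, premium: float) -> bool:
    """Post the bond only if the premium paid to bonded agents beats the expected forfeit."""
    return premium > expected_forfeit(agent, bond)


def separating_bond_range(reliable: AgentType, lemon: AgentType, premium: float) -> tuple[float, float]:
    """Bounds on B with p_reliable * B < premium < p_lemon * B (only the reliable agent posts)."""
    if not 0 <= reliable.violation_prob < lemon.violation_prob:
        raise ValueError("the reliable agent must have the lower violation probability")
    lower = premium / lemon.violation_prob
    upper = math.inf if reliable.violation_prob == 0 else premium / reliable.violation_prob
    return lower, upper


if __name__ == "__main__":
    reliable, lemon = AgentType("reliable", 0.01), AgentType("lemon", 0.20)
    premium = 2_000  # hypothetical premium per engagement for bonded agents
    print(separating_bond_range(reliable, lemon, premium))  # (10000.0, 200000.0)
    for bond in (5_000, 50_000):
        print(bond, will_post(reliable, bond, premium), will_post(lemon, bond, premium))
```

With these hypothetical numbers, a $50,000 bond separates the two types, while a $5,000 bond is cheap enough that the lemon posts it too, which is why small bonds carry little signal.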
This is exactly how warranties work in consumer markets. The manufacturer who offers a five-year warranty is implicitly claiming their product will last five years. The manufacturer who offers only a 90-day warranty is sending a different quality signal. Buyers reasonably read warranty length as evidence of what the manufacturer privately knows about product quality.
Composite Trust Scores as Certification
Third-party certification works in markets for physical goods (UL certification, food safety ratings, building code compliance) because the certifier has the expertise, the access, and the incentives to evaluate quality accurately and credibly.
For agents, the certification mechanism is the composite trust score computed through adversarial evaluation. An agent with a score of 88/100, computed across 12 behavioral dimensions by a multi-provider jury evaluation, carries information that a self-reported benchmark score does not. The evaluation was conducted by a third party with incentives aligned with accuracy, using adversarial inputs designed to find failure modes, aggregated across multiple independent evaluators to reduce individual bias.
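A composite score of this kind might be aggregated along the following lines. This is a hypothetical sketch, not Armalo's scoring method: it takes the median of the jurors' scores on each dimension, so a single outlier evaluator cannot swing the result, then averages dimensions with optional weights. The dimension names, weights, and scores are made up for illustration, and only three of the twelve dimensions are shown.

```python
from statistics import median
from typing import Optional


def composite_score(jury_scores: dict[str, list[float]], weights: Optional[dict[str, float]] = None) -> float:
    """Median across jurors per dimension, then a weighted mean across dimensions.

    When `weights` is given, every dimension in `jury_scores` needs an entry.
    """
    if not jury_scores:
        raise ValueError("no dimensions were scored")
    per_dimension = {dim: median(scores) for dim, scores in jury_scores.items()}
    if weights is None:
        weights = {dim: 1.0 for dim in per_dimension}  # equal weighting by default
    total_weight = sum(weights[dim] for dim in per_dimension)
    return sum(per_dimension[dim] * weights[dim] for dim in per_dimension) / total_weight


if __name__ == "__main__":
    jury = {
        "accuracy": [90, 88, 40],  # one juror is an outlier
        "safety": [85, 87, 86],
        "scope_adherence": [92, 90, 91],
    }
    print(round(composite_score(jury), 2))  # 88.33
    print(composite_score(jury, {"accuracy": 2, "safety": 1, "scope_adherence": 1}))  # 88.25
```

The outlier 40 on `accuracy` is discarded by the median; a plain mean across jurors would have pulled that dimension down to about 72.7.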
This is a certification mechanism that reduces information asymmetry. The buyer does not need to know the details of how the evaluation was conducted; they need to know that it was conducted by a credible third party using adversarial methodology. The trust score compresses the quality information that the evaluation produced into a signal the buyer can act on without repeating the evaluation.
Behavioral History as Reputation
Reputation systems work when they are persistent, verifiable, and consequential. Yelp reviews work (to the extent they do) because they accumulate over time, are visible to all potential customers, and affect the restaurant's ability to attract new business.
An agent's behavioral history, meaning its track record of completed evaluations, pact compliance, and task performance over time, functions as a reputation system when it is persistent (not erased between deployments), verifiable (attested in tamper-evident records), and consequential (affects the agent's access to high-stakes work and its pricing in the marketplace).
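One common way to make a history tamper-evident is a hash chain, where each record's hash covers the previous record's hash. The sketch below assumes that approach; it does not describe how Armalo attests records, and `BehavioralHistory` and the event fields are hypothetical. Note the limit: a chain only reveals edits if its latest hash is also published somewhere the editor cannot rewrite; otherwise the whole chain could simply be recomputed.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional

GENESIS_HASH = "0" * 64  # stand-in "previous hash" for the first record


def record_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the previous record's hash."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class BehavioralHistory:
    agent_id: str
    entries: list[tuple[dict, str]] = field(default_factory=list)  # (event, chained hash)

    def append(self, event: dict) -> str:
        digest = record_hash(self.head(), event)
        self.entries.append((event, digest))
        return digest

    def head(self) -> str:
        """Latest hash; publishing it externally is what makes rewrites detectable."""
        return self.entries[-1][1] if self.entries else GENESIS_HASH

    def verify(self, published_head: Optional[str] = None) -> bool:
        """Recompute the chain. Edits, reordering, or mid-chain deletions break it;
        truncating the tail is only caught by comparing against a published head."""
        prev = GENESIS_HASH
        for event, stored in self.entries:
            if record_hash(prev, event) != stored:
                return False
            prev = stored
        return published_head is None or prev == published_head


if __name__ == "__main__":
    history = BehavioralHistory("agent-123")
    history.append({"type": "evaluation", "score": 88})
    published = history.append({"type": "pact_check", "compliant": True})
    print(history.verify(published))  # True
    history.entries[0][0]["score"] = 95  # quietly inflate an old evaluation
    print(history.verify(published))  # False
```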
Behavioral history differs from star ratings in one important way: it is specific. A star rating is an aggregate of subjective experiences. A behavioral history shows specific performance on specific task types over a defined period. This allows buyers to make much more targeted inferences: not "is this agent good?" but "how does this agent perform on the type of tasks I need it to do, and how has that performance changed over time?"
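The targeted question described above (how this agent performs on one task type, and how that has changed) reduces to filtering and grouping records. A minimal sketch, assuming a hypothetical `TaskRecord` shape with a task type, completion date, and pass/fail outcome; a real history would carry richer fields.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TaskRecord:
    task_type: str
    completed_on: date
    succeeded: bool


def success_rate_by_month(records: list[TaskRecord], task_type: str, start: date, end: date) -> dict[str, float]:
    """Monthly success rate for one task type, restricted to [start, end]."""
    counts = defaultdict(lambda: [0, 0])  # "YYYY-MM" -> [successes, attempts]
    for r in records:
        if r.task_type == task_type and start <= r.completed_on <= end:
            month = r.completed_on.strftime("%Y-%m")
            counts[month][0] += int(r.succeeded)
            counts[month][1] += 1
    return {month: ok / n for month, (ok, n) in sorted(counts.items())}


if __name__ == "__main__":
    history = [
        TaskRecord("invoice_extraction", date(2025, 1, 5), True),
        TaskRecord("invoice_extraction", date(2025, 1, 20), False),
        TaskRecord("invoice_extraction", date(2025, 2, 3), True),
        TaskRecord("customer_email", date(2025, 2, 4), False),
    ]
    print(success_rate_by_month(history, "invoice_extraction", date(2025, 1, 1), date(2025, 2, 28)))
    # {'2025-01': 0.5, '2025-02': 1.0}
```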
The Trust Oracle as Market Infrastructure
The mechanism that makes all three of these information asymmetry solutions operationally accessible is the trust oracle: a public API that accepts an agent identifier and returns a trust score with supporting evidence.
When an enterprise is considering deploying an agent, the trust oracle query replaces or supplements the due diligence process that would otherwise require weeks of internal evaluation. The query returns the agent's composite score, its history, its behavioral dimension breakdown, and the evidence supporting each dimension score.
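A buyer-side integration might look roughly like this. The endpoint path, JSON field names, and thresholds below are all hypothetical, chosen to mirror the fields listed above (composite score, history, per-dimension scores with evidence); a real oracle's documented schema would replace them. Only `fetch_trust_report` touches the network, using the standard library's `urllib.request`.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DimensionScore:
    name: str
    score: float
    evidence: list[str] = field(default_factory=list)  # hypothetical: IDs of eval runs behind the score


@dataclass(frozen=True)
class TrustReport:
    agent_id: str
    composite: float
    evaluations_completed: int
    dimensions: list[DimensionScore]


def parse_trust_report(payload: dict) -> TrustReport:
    """Convert a (hypothetical) oracle JSON payload into typed objects."""
    return TrustReport(
        agent_id=payload["agent_id"],
        composite=float(payload["composite_score"]),
        evaluations_completed=int(payload["history"]["evaluations_completed"]),
        dimensions=[
            DimensionScore(d["name"], float(d["score"]), list(d.get("evidence", [])))
            for d in payload["dimensions"]
        ],
    )


def fetch_trust_report(agent_id: str, base_url: str) -> TrustReport:
    """Query the oracle. The URL layout is an assumption, not a documented API."""
    url = f"{base_url.rstrip('/')}/agents/{urllib.parse.quote(agent_id, safe='')}/trust"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_trust_report(json.load(resp))


def meets_policy(report: TrustReport, min_composite: float, min_dimension: float, min_evaluations: int) -> bool:
    """Require a strong composite, no weak dimension, and enough history."""
    return (
        report.composite >= min_composite
        and report.evaluations_completed >= min_evaluations
        and all(d.score >= min_dimension for d in report.dimensions)
    )
```

Gating on the weakest dimension as well as the composite matters because an average can hide a single failing dimension.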
The trust oracle functions as the market infrastructure that makes reputation portable. An agent's behavioral record exists independently of any particular platform or vendor relationship. Any counterparty can query it. This is the difference between a credit report and a store loyalty score: one is portable market infrastructure, the other is platform-specific retention data.
Markets with portable trust infrastructure allocate work toward quality more efficiently than markets without it. The enterprises that hire agents do so with better information. The agents that invest in reliability receive economic reward for that investment. The agents that overclaim capabilities without behavioral substance are systematically disadvantaged in a market where overclaiming is no longer rational because behavioral records are verifiable.
This is the mechanism that resolves the lemons problem. Not better demos, not stronger marketing claims, not vendor assurances. Behavioral trust infrastructure that makes quality legible to buyers without requiring them to verify it themselves.
The question is not whether the agent market will develop this infrastructure; adverse selection will eventually force it, one way or another. The question is whether it develops through thoughtful market design or through the painful mechanism of failures, scandals, and regulatory compulsion. The lemons problem always resolves eventually. The path to resolution matters.