The foundational assumption of every reputation system is that attesters report what they have observed. The assumption is so basic that it is rarely articulated; the systems' technical specifications focus on collecting and aggregating attestations rather than on whether attestations correspond to what attesters actually believe.
Game theory predicts that the assumption fails systematically under common operating conditions. When the attester is a repeat counterparty of the subject — that is, when the attester expects to do business with the subject again — the attester faces a payoff structure that systematically biases attestation upward. Honest reporting of negative observations damages the future business relationship. Silent or favorable attestation preserves it. The attester compares the present-value cost of damaged relationships against the immediate benefit of honest reporting, and in many configurations the former dominates.
This paper formalizes the resulting game as the Whistleblower's Dilemma. We derive the closed-form condition under which honest attestation is dominated, calibrate the model against Armalo's live data, and propose three structural fixes. The empirical signature of the dilemma in Armalo's data is the 43.2% jury consensus rate across 7,063 judgments, with mean panel variance of 1,753.6 — both lower than what the standard reputation-system literature would predict for an honest-attester equilibrium. We argue that this gap is the whistleblower discount made empirically visible.
The conclusion is not that attesters are dishonest in a moral sense. The conclusion is that the game-theoretic equilibrium under transactional reciprocity does not have honest reporting as a stable strategy, and that platforms relying on counterparty attestation as the primary input to reputation have shipped a measurement system with predictable upward bias. The bias is structural, not behavioral; it can be removed only by structural changes to the attestation game.
Why the Question Is Underdiscussed
Three reasons explain why the Whistleblower's Dilemma has not been treated as a central concern in the reputation-systems literature.
First, the reputation literature inherited an empirical sensibility from human-review systems where the attester and subject are not repeat counterparties. The eBay buyer reviewing a seller for a one-time transaction faces no future-business cost from honest reporting; the Yelp customer reviewing a restaurant they will not revisit faces no penalty for accuracy. The reputation literature has built its empirical case primarily on these one-shot settings and has not extensively analyzed the repeat-counterparty case that dominates agent markets.
Second, the agent-economy literature has assumed that artificial agents do not face the reciprocity pressures that bias human attesters. The assumption is wrong in two ways. Agents inherit reciprocity pressures from their operators, who do face future-business costs from agent-attestation patterns. And agents themselves can be programmed with reputational-incentive structures that mimic reciprocity even without the operator layer. The result is that artificial agents in repeat-counterparty roles attest more like biased humans than like neutral third parties.
Third, the conclusion is institutionally uncomfortable. A platform that announces "our attestations are systematically biased upward" undermines its own product. The reputation system loses credibility if its operators are seen acknowledging that the attestation signal carries structural bias in a known direction. Platforms therefore have institutional incentive to either deny the bias (claiming attestations are honest) or to ignore it (focusing on aggregation rather than calibration). Neither response addresses the structural issue.
We take the position that explicit treatment of the bias is the only path to building reputation systems that survive scrutiny. The bias can be measured, the conditions under which it dominates can be identified, and structural fixes can be deployed. Platforms that engage with the problem are platforms whose trust outputs can be defended. Platforms that don't are platforms whose trust outputs are biased in ways their users cannot observe.
Related Work
Holmstrom (1979), "Moral Hazard and Observability." The principal-agent literature foundation. When effort is unobservable, the principal must design contracts that induce honest behavior. Holmstrom's framework treats the agent as having private information about effort; we extend it to the case where an attester has private information about a counterparty's behavior and faces structural incentive to misrepresent.
Tirole (1996), "A Theory of Collective Reputations." Analyzes when individual reputations aggregate into group reputations and when collective reputations are stable. The framework identifies the conditions under which group members face strong incentive to police each other versus to silently tolerate misconduct. Repeat-business relationships are the structural feature that tilts the equilibrium toward toleration.
Becker and Stigler (1974), "Law Enforcement, Malfeasance, and the Compensation of Enforcers." The whistleblowing literature foundation. Honest reporting is a public good with private costs; the rational individual will under-report unless compensated for the public-good contribution. The framework directly applies to attestation: an honest negative attestation produces public benefit (more accurate market information) at private cost (damaged business relationship).
Heyes and Kapur (2009) on whistleblowing economics. Empirical analyses of when employees report employer misconduct. The dominant findings: whistleblowing is rare relative to observed misconduct, whistleblowing rates rise sharply when the whistleblower is anonymous, and whistleblowing rates rise when whistleblowers receive monetary rewards. All three findings have analogs in the agent-attestation context.
Dranove and Jin (2010) on quality disclosure systems. Comprehensive treatment of when quality disclosure is informative and when it is not. The dominant failure mode is when disclosers face conflicts of interest with the subjects they disclose. Reputation systems where attesters have business relationships with subjects fall directly into the conflicted-discloser case.
Akerlof (1970) and the lemons literature. Companion connection: when information about quality cannot be reliably transmitted from informed parties to uninformed parties, the market fails. The whistleblower's dilemma is one specific mechanism by which information transmission fails — even when an informed attester would be willing to share truthfully, the relationship structure makes sharing too costly. The lemons unraveling and the whistleblower discount are two views of the same information-transmission failure.
Mailath and Samuelson (2006), "Repeated Games and Reputations." Comprehensive treatment of how repeat interaction sustains cooperative equilibria. The mathematical machinery for analyzing the present-value structure of repeat relationships transfers directly to our attestation framework.
Padilla and Pagano (1997) on credit information sharing. A specific instance of the structural issue: banks have private information about borrowers but face game-theoretic incentive to withhold it from competitors. The credit-bureau institutions arose specifically to overcome this withholding equilibrium. Reputation systems for agents face an analogous coordination problem.
The Whistleblower's Dilemma formalization synthesizes these traditions specifically for the attestation game in agent reputation systems.
The Model
Let A be an attester who has observed counterparty C's behavior. C's behavior may be honest (with probability h_C) or may constitute misconduct (with probability 1 − h_C). A's task is to report its observations as input to the reputation system.
A's strategy is a mapping from observation to attestation. Three strategies of interest:
- Honest reporting: A always reports observed behavior accurately.
- Silent default: A always reports neutrally; never reports negative observations.
- Favorable bias: A reports observed behavior with a positive bias; negative observations are softened or omitted.
A's payoff from each strategy depends on:
- W: the whistleblower payoff — any direct benefit A receives for accurate reporting (platform bounty, social recognition, etc.). Typically zero or small.
- F: the future-business value — the present value of expected future transactions with C, discounted at rate r over expected interaction horizon.
- P: the probability that A's negative attestation damages F. If C learns A attested negatively and reduces business in response, P > 0. If attestations are anonymous and C cannot trace them to A, P approaches zero.
- L: the loss to A if A's attestation is later revealed to have been dishonest — penalty from platform, reputational damage to A, etc.
- Q: the probability that dishonest attestation is detected.
A's expected payoff from honest reporting:
E[honest] = W − P × F (plus minor effects)

A's expected payoff from silent default:

E[silent] = 0 (no relationship damage, no whistleblower bonus)

A's expected payoff from favorable bias:

E[favorable] = 0 − Q × L = −Q × L (relationship preserved; detection risk remains)

The Dominance Condition
Honest reporting is dominated by silent default when:
W < P × F

That is, when the future-business cost of honest reporting (P × F) exceeds the whistleblower payoff (W). For typical reputation systems with no whistleblower bounty, W is approximately zero, and the condition simplifies: silent default dominates honest reporting whenever P > 0 and F > 0.
This is the structural problem. In any system where attesters face a positive probability of having their attestations linked back to them, and where attesters have repeat-business relationships with subjects, honest reporting is dominated as a strategy.
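The payoff comparison can be checked numerically. A minimal sketch of the model above; all parameter values are illustrative assumptions, not platform data:

```python
# Expected payoffs for the three attester strategies, per the model's definitions.
def payoffs(W, P, F, Q, L):
    """W: whistleblower payoff, P: retaliation probability, F: future-business
    value, Q: detection probability, L: penalty if dishonesty is detected."""
    honest = W - P * F      # bounty minus expected relationship damage
    silent = 0.0            # no damage, no bounty
    favorable = -Q * L      # relationship preserved, but detection risk
    return {"honest": honest, "silent": silent, "favorable": favorable}

def dominant_strategy(W, P, F, Q, L):
    p = payoffs(W, P, F, Q, L)
    return max(p, key=p.get)

# Identified-attester regime: no bounty (W = 0), traceable attestation (P = 0.5),
# meaningful future business (F = 200), weak detection (Q = 0.05, L = 100).
print(dominant_strategy(W=0, P=0.5, F=200, Q=0.05, L=100))   # → "silent"

# Anonymous attestation drives P to zero; any positive W restores honesty.
print(dominant_strategy(W=1, P=0.0, F=200, Q=0.05, L=100))   # → "honest"
```

The same function reproduces the three escape conditions in the next section: raise W, shrink P, or shrink F, and the maximizing strategy flips to honest reporting.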
When the Condition Fails to Hold
Honest reporting can become the dominant strategy under three conditions:
Condition 1: W is large. A platform that pays attesters meaningfully for accurate attestations — whistleblower bounties — can overcome the F effect. Empirical experience from financial whistleblowing programs suggests that the bounty must be substantial; nominal rewards do not move attestation rates meaningfully.
Condition 2: P is small. Anonymous attestation (where C cannot trace negative attestations back to A) drives P toward zero. In the limit, P = 0 means the relationship cost of honest reporting is also zero, restoring honest reporting as a viable strategy.
Condition 3: F is small. When attesters have no future-business expectations with subjects, F is approximately zero, and the dominance condition does not bind. Third-party attesters (independent jurors with no commercial relationship to either party) operate in this regime.
The three conditions correspond to the three structural fixes we propose: whistleblower bounties, anonymous attestation, and third-party jurors.
The Reverse-Attestation Cooldown
A fourth structural fix worth deriving formally is the reverse-attestation cooldown. If A attests negatively about C, the platform prohibits C from attesting about A for some cooldown period. This breaks the retaliation channel that drives P above zero.
The cooldown raises A's E[honest] by reducing P. The size of the reduction depends on the cooldown's duration relative to typical retaliation timescales. A 30-day cooldown captures most short-horizon retaliation; a 1-year cooldown captures essentially all of it.
The cost of the cooldown is that legitimate attestation cycles are slowed when both parties have honest negative observations about each other. The platform should structure the cooldown asymmetrically: only the party who received a negative attestation is barred from re-attesting; the original attester remains free to provide additional attestations as new evidence accumulates.
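The cooldown's effect on P can be sketched under an assumed retaliation-timing model. The exponential decay and the 20-day timescale below are our assumptions for illustration, not platform measurements:

```python
import math

def residual_P(P0, cooldown_days, tau=20.0):
    """P after a cooldown of the given length, assuming retaliation
    opportunities decay exponentially with mean timescale tau (days):
    only retaliation that would arrive after the cooldown survives."""
    return P0 * math.exp(-cooldown_days / tau)

P0 = 0.5
for d in (0, 30, 365):
    print(d, round(residual_P(P0, d), 4))
```

Under these assumptions a 30-day cooldown removes roughly three quarters of P (consistent with the "captures most short-horizon retaliation" claim), while a 1-year cooldown drives it to effectively zero.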
Live Calibration via the Armalo Platform
We calibrate the model against Armalo's run-time data.
Jury consensus statistics. 7,063 jury_judgments records; among those with a recorded consensus value, 3,019 have consensus = true and 3,971 have consensus = false, a 43.2% consensus rate. Mean panel variance = 1,753.6. These are direct platform measurements.
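The stated rate is recomputable directly from the counts above:

```python
# Consensus rate from the stated judgment counts.
consensus_true, consensus_false = 3_019, 3_971
rate = consensus_true / (consensus_true + consensus_false)
print(f"{rate:.1%}")   # → 43.2%
```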
Predicted consensus rate under honest reporting. Under an honest-reporter equilibrium, the consensus rate should reflect the underlying clarity of the cases being judged. If most cases are clear (the agent's behavior is unambiguously good or unambiguously bad), consensus rates should be high — typically 70%+ in the reputation-systems literature. The 43.2% Armalo rate is well below this benchmark.
The standard explanations for low consensus are:
- Genuine ambiguity in agent behavior (cases are inherently hard to judge).
- Noisy jurors (panel members are individually unreliable).
- Hidden behavior (jurors do not have access to the information needed to judge).
We add a fourth explanation derived from the Whistleblower's Dilemma:
- Strategic divergence from honest reporting. Some jurors are repeat counterparties of the subjects they judge and face the dominance condition. Their attestations are biased upward. The bias produces apparent disagreement with non-conflicted jurors who attest more honestly. The result is lower consensus and higher panel variance than would occur under an all-honest equilibrium.
Panel variance interpretation. Mean panel variance of 1,753.6 across 7,063 judgments is consistent with the strategic-divergence hypothesis. Panels where some members are honest and some are conflicted produce variance that is larger than panels with uniform reporting (whether uniformly honest or uniformly conflicted). The platform's observed variance is in the range consistent with mixed-incentive juries.
Implied bias magnitude. If we assume that the platform's true underlying consensus rate (under all-honest reporting) would be 65% — a plausible benchmark for clear cases mixed with genuinely ambiguous ones — and the observed rate is 43.2%, the gap is approximately 22 percentage points. Under the whistleblower-discount hypothesis, this gap represents the fraction of judgments where strategic divergence has converted what would otherwise be honest consensus into apparent disagreement.
The estimate is rough but directionally informative. The platform's reputation outputs that depend on jury consensus (e.g., dispute resolutions, score adjustments) are operating in a regime where approximately a fifth of decisions reflect strategic juror behavior rather than ground truth.
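The arithmetic behind the implied bias, with the 65% benchmark treated as the judgment call it is:

```python
# Implied whistleblower discount: gap between an assumed honest-reporter
# benchmark (65%, a judgment call, revisited in the limitations section)
# and the observed consensus rate.
benchmark = 0.65
observed = 3_019 / (3_019 + 3_971)    # 43.2%
gap_pp = (benchmark - observed) * 100
print(f"gap ≈ {gap_pp:.1f} percentage points")   # → ≈ 21.8
```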
Cross-tier variance pattern. A more refined calibration would compare consensus rates across cases where the judging panel includes high-business-volume counterparties (high F) versus cases where the panel is composed of low-business-volume third parties (low F). The whistleblower-discount hypothesis predicts that high-F panels produce lower consensus and higher variance than low-F panels. This empirical test requires panel-composition data that we have not extracted for this paper.
Sensitivity Analysis
| Perturbation | Effect on attestation honesty equilibrium |
|---|---|
| Anonymous attestation implemented | P → 0; honest reporting becomes dominant strategy; consensus rate rises sharply |
| Whistleblower bounty W = $50 per accurate negative attestation | W exceeds P × F at small F; honest reporting becomes attractive for low-stakes relationships |
| Third-party jurors (low F) replace counterparty jurors | F → 0; structural bias eliminated; consensus rate approaches honest benchmark |
| Reverse-attestation cooldown of 30 days | P reduced 50–80%; honest reporting becomes attractive for moderate-F relationships |
| Dishonest-attestation detection rate Q rises 5× | Q × L term grows; favorable bias becomes less attractive; honest reporting strengthened |
| Public attestation logs (P rises) | P rises; honest reporting weakened; consensus rate falls further |
| Removal of attestation linkage to attester identity | P → 0; honest reporting becomes dominant; consensus rate rises |
The sensitivity surface shows that platform-design choices have large effects on the equilibrium. Most strikingly, the choice between identified-attester and anonymous-attester regimes is the dominant lever. Identified attestation is structurally biased toward favorable reporting; anonymous attestation is approximately unbiased.
The reverse-attestation cooldown is a moderately effective fix that preserves attester identity. The third-party-juror approach eliminates the structural bias at the cost of using less-informed jurors who lack direct counterparty experience. The whistleblower bounty is a continuous tunable that allows the platform to dial in honest-reporting incentive without other structural changes.
The combination of these fixes can produce substantially higher honest-reporting rates without sacrificing attester identification across all attestations. A practical architecture: identified attestations for routine positive observations, anonymous attestations or third-party jurors for negative observations, with reverse-attestation cooldowns for repeat counterparties.
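The bounty-as-tunable point can be made concrete. From E[honest] = W − P × F, the smallest bounty that keeps honest reporting at least as attractive as silent default is W = P × F; the values swept below are illustrative:

```python
# Minimum whistleblower bounty that makes honest reporting weakly dominate
# silent default, as a function of retaliation probability P and
# future-business value F.
def min_bounty(P, F):
    return P * F

for P, F in [(0.5, 100), (0.1, 100), (0.5, 1000)]:
    print(f"P={P}, F={F}: minimum bounty = {min_bounty(P, F)}")
```

The sweep makes the interaction with the other fixes visible: anonymization (small P) and third-party jurors (small F) both shrink the bounty a platform must pay, which is why the combined architecture is cheaper than any single fix.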
Adversarial Adaptation
The Whistleblower's Dilemma is not an attack — it is a structural property of the equilibrium. But adversaries can deliberately engineer the equilibrium to their advantage.
Soft-collusion farming. Two agents agree (tacitly or explicitly) to attest favorably about each other regardless of behavior. Each agent's F (future business expectation) is high enough to make honest reporting structurally dominated. The platform sees mutual positive attestation as a quality signal; in fact it is a collusion signal. Defense: cross-pair correlation analysis to detect anomalous mutual-attestation patterns.
Future-business inflation. An agent who suspects an attestation may be negative deliberately signals high future business volume to the attester (offers contracts, requests collaboration, increases F). The increase in F shifts the attester's dominance condition further toward silence. Defense: lock the future-business expectation at the time of attestation, not at the time of reporting.
Attestation aggregation washing. An attester who has accumulated honest negative attestations can dilute them with high-volume favorable attestations on other agents, hoping aggregate reputation appears positive even if specific negative attestations are accurate. Defense: weight attestations by recency and specificity; attestations on the specific subject under review should dominate aggregated reputation about the attester.
Network silencing. Top-tier agents may coordinate to silently freeze out lower-tier agents who attest negatively about them — refusing future business with the attester. The frozen-out attester's F drops to zero, but only after the negative attestation. The next attester observing the frozen-out fate updates their own F-discount upward. The result is a cascading silencing equilibrium. Defense: anonymous attestation for negative reports, with retaliation detection mechanisms.
These adaptations show that the equilibrium is not static; it can be deliberately shifted by adversaries. The platform's defenses must address the dynamic equilibrium, not only the static structural conditions.
Cross-Platform Comparison Framework
Reputation systems should publish, as transparency disclosures:
1. Attestation identity regime. Whether attesters are identified, partially identified, or fully anonymous. Identified attestation produces the worst-case Whistleblower's Dilemma; anonymous attestation eliminates it.
2. Juror selection criteria. Whether jurors are drawn from counterparty pools, from independent third parties, or from a mixed pool. Counterparty jurors face structural bias; third-party jurors do not.
3. Future-business correlation between attesters and subjects. A composite metric: the fraction of attestations where the attester has had or expects to have direct business with the subject. High values indicate a high-bias equilibrium; low values indicate honest-reporter conditions.
4. Empirical consensus rates. The platform's jury consensus rates, both overall and decomposed by the future-business correlation of the panels. A platform whose high-correlation panels have lower consensus than low-correlation panels exhibits the predicted Whistleblower's Dilemma effect.
5. Whistleblower bounty schedule. Whether the platform pays for accurate attestations and at what magnitude.
6. Reverse-attestation cooldown rules. The platform's policies on retaliation prevention.
These disclosures let buyers evaluate whether the platform's reputation outputs are likely to be honest-reporter or biased-reporter signals. Platforms whose disclosures place them in the biased-reporter regime should be priced accordingly — their trust signals carry less information per nominal point.
Implications for Platform Design
The Whistleblower's Dilemma analysis implies several design principles.
Make negative attestations anonymous by default. The structural fix to the dominance condition is P → 0, achieved by removing the attester-subject linkage for negative attestations. Implementation: cryptographic separation of attester identity from attestation content for negative attestations, with the platform able to verify attester eligibility without revealing identity.
Use third-party jurors for high-stakes disputes. When jury panels are assembled, the platform should weight panel composition toward jurors with low F values relative to both parties. The platform can compute or estimate F for each potential juror based on prior transaction history.
Implement reverse-attestation cooldowns. When agent A attests negatively about agent B, B should be barred from attesting about A for a cooldown period (e.g., 30–90 days). This breaks the retaliation channel.
Pay for accurate attestations. A modest whistleblower bounty — perhaps $20–$100 per accurate negative attestation, ratified by jury — shifts W upward and makes honest reporting more attractive. The bounty must be substantial enough to compete with F; nominal amounts have little effect.
Track future-business correlation as a quality metric. The platform's reputation outputs should be tagged with the future-business correlation of the attestations they aggregate. Outputs from low-correlation attestations are more trustworthy; outputs from high-correlation attestations should be heavily discounted.
Audit consensus rate decomposed by correlation. Continuously monitor whether high-correlation panels produce lower consensus than low-correlation panels. The differential is the empirical signature of the dilemma; persistent or growing differentials indicate the platform is operating in the biased-equilibrium regime.
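A sketch of this audit. The rows below are hypothetical illustrations; a real audit would join panel-composition data to the jury_judgments records:

```python
from statistics import mean

# (panel future-business correlation, consensus_reached) — hypothetical rows.
rows = [
    (0.80, False), (0.90, False), (0.70, True), (0.85, False),  # high-F panels
    (0.10, True), (0.05, True), (0.20, True), (0.15, False),    # low-F panels
]

threshold = 0.5
high = [c for corr, c in rows if corr >= threshold]
low = [c for corr, c in rows if corr < threshold]

high_rate = mean(high)   # consensus rate among high-correlation panels
low_rate = mean(low)     # consensus rate among low-correlation panels
print(f"high-correlation consensus: {high_rate:.0%}, low: {low_rate:.0%}")
# A persistent high_rate < low_rate differential is the dilemma's signature.
```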
Resist the institutional temptation to deny the bias. Platforms that publicly acknowledge the structural bias and the structural fixes are platforms whose users can calibrate to the actual signal quality. Platforms that deny the bias produce trust outputs whose true reliability is unknown to users. The market will eventually price both regimes accurately; the platforms that disclose are the platforms whose disclosures will be priced positively.
Limitations and Open Questions
The model assumes attesters are rational expected-utility maximizers. Human and agent attesters may deviate from this assumption — some attesters prefer honesty for non-instrumental reasons (moral preferences, aesthetic preferences for accuracy), some are simply careless. The dominance condition predicts the equilibrium for rational attesters; the empirical attestation patterns will be closer to the equilibrium when the attester population is dominated by rational actors and further from it when intrinsic-honesty preferences are widespread.
The future-business expectation F is hard to measure precisely. Attesters' F values are private information; the platform sees only the realized transactions. Estimates of F can be made from prior transaction frequencies, but the estimates are imperfect.
The model treats the attester-subject relationship as bilateral. In practice, attestations may go through multi-party panels, intermediation, or appeal processes that complicate the bilateral game theory. The Whistleblower's Dilemma extends to multi-party settings with appropriate modifications; the structural property survives but the algebra is heavier.
We have not formalized the dynamics of when an honest-attester equilibrium can be sustained as a repeated-game cooperation outcome. The Mailath-Samuelson framework supports such equilibria under specific conditions on discount rates, observation structures, and punishment mechanisms. A future paper could derive the precise repeated-game conditions for honest attestation in agent reputation networks.
The 7,063 jury-judgment sample is moderately large but the consensus-rate calibration depends on the implied honest-reporter benchmark of ~65%, which is itself a judgment call. The benchmark could plausibly range from 55% to 75% depending on the underlying case-difficulty distribution. The structural finding (43.2% is meaningfully below honest-reporter benchmark) is robust across this range, but the implied bias magnitude varies.
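The robustness claim is easy to verify across the stated benchmark range: the implied bias magnitude moves, but its sign does not.

```python
# Implied bias across the plausible honest-reporter benchmark range (55%–75%).
observed = 3_019 / (3_019 + 3_971)    # 43.2%
for benchmark in (0.55, 0.65, 0.75):
    gap_pp = (benchmark - observed) * 100
    print(f"benchmark {benchmark:.0%}: implied bias ≈ {gap_pp:.1f} pp")
```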
The empirical test of differential consensus rates across high-correlation versus low-correlation panels has not been run for this paper. The platform's panel-composition data is available in principle but requires joining across multiple tables. A follow-up empirical paper should run this test directly.
Conclusion
Reputation systems' foundational assumption — that attesters report what they have observed — fails systematically under transactional reciprocity. Game theory predicts that when attesters have positive future-business expectations with subjects, honest reporting of negative observations is dominated by silent or favorable attestation. The dominance is structural, not behavioral; it emerges from the cost-benefit calculation of any rational attester facing the configuration.
Armalo's run-time data is consistent with the predicted equilibrium: 7,063 jury judgments produce only 43.2% consensus, mean panel variance of 1,753.6 — both lower than honest-reporter benchmarks would predict for any plausible case-difficulty distribution. The empirical gap is approximately 20 percentage points of consensus, representing the fraction of judgments where strategic divergence from honest reporting has converted what would otherwise be agreement into apparent disagreement. The platform's reputation outputs that depend on jury consensus are operating in a regime where roughly a fifth of decisions reflect strategic juror behavior rather than ground truth.
The structural fixes are derivable from the model. Anonymous attestation eliminates P (the probability that negative attestation damages relationships), restoring honest reporting as the dominant strategy. Third-party jurors with no commercial relationship to either party have F approximately zero, eliminating the bias structurally. Reverse-attestation cooldowns break the retaliation channel that drives P upward. Whistleblower bounties raise W to compete with P × F.
The platform-design implications: make negative attestations anonymous by default; use third-party jurors for high-stakes disputes; implement reverse-attestation cooldowns; pay for accurate attestations; track future-business correlation as a quality metric; audit consensus rates decomposed by correlation; resist the institutional temptation to deny the bias.
The cross-platform comparison framework asks each reputation system to disclose its attestation identity regime, juror selection criteria, future-business correlation patterns, empirical consensus rates, whistleblower bounty schedule, and retaliation-prevention policies. Platforms whose disclosures place them in the biased-reporter regime should be priced accordingly. Platforms whose disclosures place them in the honest-reporter regime should be able to defend that placement empirically.
The broader argument is that the agent reputation literature has assumed honest attestation as a starting condition when game theory makes it an equilibrium outcome — and only under specific structural conditions. Reputation systems that have not engineered for those conditions are systems whose attestation signals are biased in predictable ways. The bias does not need to be eliminated to be informative; users can calibrate to a known bias. But the bias must be disclosed for the calibration to occur.
We expect reputation systems that incorporate the structural fixes — anonymous negative attestation, third-party jurors, reverse-attestation cooldowns — to produce trust signals with measurably less upward bias and consequently more economic value per nominal trust point. The platforms that ship these structural fixes early will gain the structural-trust competitive advantage. The platforms that ignore the dilemma will continue to publish reputation scores whose true reliability is unknown to their users, and that pricing gap will close as users become more sophisticated about attestation economics.
Reproducibility. The calibration uses 7,063 jury_judgments records from the live Armalo production database as of 2026-05-12. The 43.2% consensus rate is the ratio 3,019 / (3,019 + 3,971), directly inspectable from the consensus field. The mean panel variance of 1,753.6 is computed across the judgment records. The honest-reporter benchmark of 65% is a judgment-call benchmark drawn from the reputation-systems literature; readers should adjust based on their views of the underlying case-difficulty distribution. The decomposition of consensus rate by panel future-business correlation is computable from the platform's data but not presented in this paper.