The reputation systems literature has overwhelmingly focused on attacks that move reputation upward — sybil construction, attestation laundering, collusive cross-rating, and other techniques for manufacturing positive signal. The forward attack class has produced a rich body of defenses: bond requirements, eval gating, jury panels, attestation tying, time-weighted accumulation. These defenses share an asymmetric assumption: the attacker is trying to climb.
This paper introduces and formalizes the opposite attack class. The inversion attack manufactures negative reputation against a target agent. The attacker is not climbing; the attacker is pushing someone else down. The structure of the attack is qualitatively different from sybil construction in three ways that the existing defense literature does not address.
First, the cost-revenue structure inverts. The sybil attacker invests in agent construction (bonds, evals, attestations) and collects fraudulent revenue from contractual relationships the constructed agent enters. The inversion attacker invests in counterparty transactions designed to fail, and collects revenue from the target agent's loss of market value — through direct competition for displaced contracts, short positions on reputation-correlated assets, or competitive bidding at lower prices once the target's reputation has dropped.
Second, the attack vector is the counterparty layer, not the construction layer. The inversion attacker does not need to build an agent that passes evals or accumulates a bond; the inversion attacker needs to engineer transactions that fail in ways the platform's scoring infrastructure interprets as the target agent's fault.
Third, and most consequentially, escrow systems whose modal outcome is expiration rather than success or defection accidentally subsidize inversion attacks. Expiration is cheap to manufacture (let the clock run out, decline to confirm delivery, fail to respond) and is often indistinguishable from defection at the reputation layer. The Armalo platform's 405 escrows — 395 expired, 6 cancelled, 2 created, 2 released — sit in exactly this regime. We argue that the 97.5% expiration rate is not a UX issue but a structural attack surface.
The paper derives the inversion cost model, calibrates it against Armalo's transactional economics, shows that under stylized assumptions inversion attacks cost approximately $180 per false failure and $2,160–$3,600 to displace a platinum agent (versus $4,609 to construct a sybil platinum agent), and proposes structural defenses against them. The most important defense is the simplest: at the reputation layer, expiration must not look like defection.
Why the Question Is Underdiscussed
The inversion attack class is underdiscussed for three reasons.
First, the academic literature has focused on positive-reputation forgery because reputation as an asset has been studied longer than reputation as a target. The sybil literature is well-developed; the anti-reputation literature largely is not. The closest adjacent body of work — in prediction-market manipulation (Hanson 2007, Allen and Gale 1992 on market manipulation more broadly) — treats negative-information manufacturing as part of a manipulator's toolkit but does not focus on it as the primary attack mode.
Second, the agent-market practitioner community has assumed that bad-faith counterparties are rare and that bad outcomes are mostly attributable to bad agents, not bad counterparties. This assumption holds when both sides have equivalent reputation stakes. It fails when one side is targeted and the other is anonymous or pseudonymous, which is the common case in early-stage agent marketplaces.
Third, defending against inversion attacks requires changes to escrow and scoring infrastructure that platforms have already shipped. Acknowledging the attack class implies that the infrastructure has a structural flaw, which is institutionally uncomfortable. We argue this discomfort is the reason the attack class deserves a focused treatment: platforms that cannot defend against it economically have implementations that hide it semantically.
The agent-market literature does occasionally touch on bad-faith reviews and review-bombing (Mayzlin et al. 2014 on the consumer-review side, Lappas et al. 2016 on hotel review attacks). The closest agent-economy analogy is in decentralized finance, where governance attacks include short-and-distort patterns — short the token, manufacture bad news about the protocol, profit from the price drop. The inversion attack against agents is the operational analog: short the agent's reputation, manufacture failure, profit from the displacement.
Related Work
Akerlof (1970), Spence (1973), and the lemons literature. The existence of asymmetric quality information is the necessary precondition for both sybil and inversion attacks. In a market where quality is directly verifiable, neither attack class is profitable. Our companion paper on the lemons problem in agent pact markets develops this connection in detail.
Allen and Gale (1992), "Stock-Price Manipulation." Three classes of market manipulation: information-based (manufacturing or destroying information about asset value), trade-based (trading patterns that move price without genuine information), and action-based (taking real actions that change asset value). Inversion attacks are a hybrid of information-based and action-based: the attacker takes real action (engages in a transaction with the target) that manufactures information (the failed transaction appears in the target's record) which changes value (the target's score declines).
Mayzlin, Dover, and Chevalier (2014), "Promotional Reviews." Empirical work on hotel-review manipulation. The dominant attack pattern in their data is positive-review forgery, but they document negative-review manipulation directed at competitors. The cost structure they observe — a competing hotel pays approximately $50 per fake negative review when the manipulation is laundered through a review farm — is the consumer-market analog of the inversion cost we derive for agent markets.
Hanson (2007) on prediction-market manipulation. Demonstrates that traders willing to take losses can move prediction-market prices, but at a cost that scales with market depth. The implication is that manipulating an information aggregator costs the manipulator real money in expectation, but the cost can be small if the market is thin. Reputation aggregators are typically thin — a single transaction has meaningful effect on score in early-population systems — making them more vulnerable to manipulation than deep markets.
Lappas, Sabnis, and Valkanas (2016) on hotel-review bombing. Empirical demonstration that targeted negative-review campaigns produced measurable revenue loss for the target hotels. The attack required coordination across approximately a dozen reviewers, each posting under a pseudonym, at a cost roughly equivalent to a small marketing campaign. The result transfers to agent markets: small coordinated investment produces large reputation drops in pseudonymous-reviewer systems.
Resnick, Zeckhauser, and Friedman (2000) on eBay reputation manipulation. Negative feedback on eBay was rare in their dataset (under 1% of transactions) but disproportionately impactful — a single negative feedback could lower an account's effective rating by 5–10 percentage points in the early-feedback regime. The asymmetry of impact between positive and negative feedback is the structural property that makes inversion attacks more efficient per dollar than sybil attacks in many configurations.
The recent literature on adversarial machine learning. Adversarial examples that cause classifiers to misclassify benign inputs (Goodfellow, Szegedy, et al.) have a structural parallel to inversion attacks against reputation classifiers. The agent-reputation classifier is a function that maps behavioral history to a quality estimate; the inversion attacker crafts behavioral inputs that cause the classifier to underestimate. The defenses share intuition (require multiple independent evidence streams, use robust aggregation, time-lock decisions to allow appeals).
We are not aware of prior work that formalizes inversion attacks as a distinct class within agent-economy reputation, though the components exist in adjacent literatures.
The Model
Let A be the target agent with current composite score S_A. Let X be the attacker. The attacker wants to lower S_A by some amount δ, where δ is large enough to displace A from some contract or tier that has economic value.
The attack involves engineering N failed transactions with A. Each engineered failure contributes some amount Δ_failure to score decline. The score-update rule on the platform determines Δ_failure as a function of failure severity, recency weighting, jury consensus, and other factors.
The attacker's per-failure cost C_per_failure has the following components:
$$C_{\text{per\_failure}} = c_{\text{transaction}} + c_{\text{counterparty}} + \frac{c_{\text{setup}}}{N} - \text{payoff}_{\text{per\_failure}}$$

c_transaction. The cost of the transaction itself: escrow deposit, transaction fees, and agent operating cost during the transaction. On Armalo, with median small-transaction escrow magnitudes, this is on the order of $20–$50 per transaction.
c_counterparty. The cost of being or recruiting the counterparty. The attacker must operate (or pay) a counterparty agent or human capable of initiating the transaction. If the attacker uses a self-controlled agent, this is the marginal operating cost; if recruiting third parties, it is whatever fee induces them to participate. On Armalo's current configuration, with self-controlled attacker agents passing the platform's minimum-quality gates, this is approximately $20–$80 per setup.
c_setup. Fixed costs of arranging the inversion: identifying the target, crafting the transaction terms designed to fail, ensuring the failure is recorded in a way the scoring system interprets as the target's fault. We model this as $50–$200 per attack campaign, amortized across N failures.
payoff_per_failure. Recovery the attacker obtains from the failure event. If the attacker is the counterparty in the failing transaction, expiration of an escrow returns the deposit minus fees. If the attacker has a short position on reputation-correlated assets, the score decline produces a gain on that position. If the attacker is a competing agent that captures the displaced contract, the payoff is the present value of that contract.
The aggregate inversion cost to drop the target's score by δ:
$$\text{InversionCost}(\delta) = N(\delta) \cdot C_{\text{per\_failure}}$$

where N(δ) is the number of engineered failures required to produce δ score decline. N(δ) depends on the platform's score-update rule, specifically on how much each failure contributes to score decline.
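A minimal Python sketch of the cost model, assuming (as a simplification the model itself does not impose) that every engineered failure lowers the score by the same constant Δ_failure, so N(δ) is the smallest N with N · Δ_failure ≥ δ. The function names are illustrative, and Decimal arithmetic keeps the ceiling from being thrown off by floating-point rounding.

```python
from decimal import Decimal, ROUND_CEILING

def cost_per_failure(c_transaction, c_counterparty, c_setup, campaign_size, payoff_per_failure):
    """C_per_failure, with the campaign-level setup cost amortized over the campaign."""
    return c_transaction + c_counterparty + c_setup / campaign_size - payoff_per_failure

def failures_needed(delta, delta_failure):
    """N(delta) under a constant per-failure impact: smallest N with N * delta_failure >= delta."""
    return int((delta / delta_failure).to_integral_value(rounding=ROUND_CEILING))

def inversion_cost(delta, delta_failure, c_per_failure):
    """InversionCost(delta) = N(delta) * C_per_failure."""
    return failures_needed(delta, delta_failure) * c_per_failure

c = cost_per_failure(Decimal(30), Decimal(40), Decimal(100), 20, Decimal(0))
print(c)                                                            # 75
print(failures_needed(Decimal("0.057"), Decimal("0.005")))          # 12
print(inversion_cost(Decimal("0.057"), Decimal("0.005"), Decimal(180)))  # 2160
```

With the calibration inputs the paper uses ($30, $40, a $100 setup amortized over 20 attacks, no recovery), the per-failure cost comes to $75. A real score-update rule with recency weighting or decay would replace `failures_needed`.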
When the Inversion Attack Pays
The inversion attack is profitable when:
$$\text{InversionCost}(\delta) < \text{AttackerPayoff}(\delta)$$

The attacker's payoff is the value extracted from the target's reputation decline. Three primary payoff sources:
Contract displacement. The attacker, or a colluding party, captures contracts that would have gone to the target. Value: present value of the displaced contracts, minus the attacker's marginal cost of fulfillment.
Reputation shorting. The attacker takes a position that pays off when the target's reputation declines. In current agent markets, this is rare because liquid markets for reputation derivatives do not yet exist. As the trust layer matures and reputation-collateralized escrow becomes a primary instrument, this payoff source becomes structurally available.
Competitive bid suppression. The attacker is bidding against the target in a competitive auction; lowering the target's reputation strengthens the attacker's relative position. The payoff is the value of winning auctions the attacker would otherwise lose at competitive prices.
The platform's job is to keep InversionCost(δ) above the attacker's expected payoff for any meaningful δ. A platform where InversionCost is small and AttackerPayoff is large is a platform that subsidizes inversion attacks.
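The profitability condition can be sketched the same way. This assumes, hypothetically, that the contract-displacement payoff is the present value of per-period contract margins (contract value minus the attacker's fulfillment cost) at a chosen discount rate, and that the shorting and bid-suppression payoffs are supplied as single numbers. None of the example inputs come from platform data.

```python
def displacement_payoff(contract_values, fulfillment_costs, discount_rate):
    """Present value of displaced contracts net of the attacker's fulfillment cost.
    Entry t of each list is received or paid at the end of period t + 1."""
    return sum(
        (value - cost) / (1 + discount_rate) ** (t + 1)
        for t, (value, cost) in enumerate(zip(contract_values, fulfillment_costs))
    )

def attack_is_profitable(inversion_cost, displacement, shorting=0.0, bid_suppression=0.0):
    """The attack pays when InversionCost(delta) < AttackerPayoff(delta)."""
    return inversion_cost < displacement + shorting + bid_suppression

# Hypothetical: three periods of $1,500 contracts costing the attacker $500 each to fulfill.
payoff = displacement_payoff([1500, 1500, 1500], [500, 500, 500], discount_rate=0.10)
print(round(payoff, 2))                    # 2486.85
print(attack_is_profitable(2160, payoff))  # True
print(attack_is_profitable(3600, payoff))  # False
```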
The Expiration-Defection Conflation
The most consequential structural property of an escrow system, from the inversion-attack perspective, is whether expiration looks like defection at the reputation layer. Three cases:
Case 1: Expiration and defection are distinct. The scoring system distinguishes "escrow expired because counterparty failed to deliver" from "escrow released because counterparty acknowledged delivery" from "agent defected, jury found in favor of complainant." In this regime, expiration is approximately neutral to the agent's reputation; defection is heavily penalized. Inversion attacks via expiration are ineffective because expiration does not move the score.
Case 2: Expiration is treated as mild defection. The scoring system penalizes expirations as soft signals — the agent did not produce a successful resolution — but does not assign full defection weight. Inversion attacks are moderately effective per unit cost.
Case 3: Expiration is treated as defection. The scoring system penalizes expirations as if they were defections, because the system cannot internally distinguish them. Inversion attacks are maximally effective per unit cost because expirations are cheap to manufacture and produce full reputational impact.
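A toy illustration of the three regimes, assuming a linear score-update rule in which a defection subtracts a fixed penalty and an expiration subtracts a fraction of it (0 in Case 1, between 0 and 1 in Case 2, 1 in Case 3). The penalty size and the Case 2 weight are hypothetical.

```python
DEFECTION_PENALTY = 0.005  # hypothetical score decline for one full defection

# Expiration penalty as a fraction of the defection penalty, one entry per regime.
EXPIRATION_WEIGHT = {
    "Case 1 (distinct)": 0.0,
    "Case 2 (mild defection)": 0.3,  # hypothetical soft-signal weight
    "Case 3 (conflated)": 1.0,
}

def score_after_expirations(score, n_expirations, expiration_weight):
    """Score after n manufactured expirations under a linear update rule."""
    return score - n_expirations * expiration_weight * DEFECTION_PENALTY

for case, weight in EXPIRATION_WEIGHT.items():
    print(f"{case}: {score_after_expirations(0.997, 12, weight):.3f}")
```

With 12 manufactured expirations, an agent at 0.997 is untouched in Case 1 but falls to 0.937, below a 0.95 platinum floor, in Case 3.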
Armalo's escrow data — 405 escrows with 395 expirations, 2 releases, 6 cancellations, 2 created — indicates that expiration is the modal outcome. If the platform's scoring rule treats expiration as defection, the platform is in Case 3. We have not audited the exact score-update rule for this paper, but the structural distinction is the central operational question.
We argue that Case 1 is the only defensible regime, and that platforms in Case 2 or Case 3 have shipped a structural vulnerability independent of any other defense.
Live Calibration via the Armalo Platform
We calibrate the inversion-attack model against the platform's transactional data.
Escrow flow profile. 405 escrows total; 395 expired (97.5%); 6 cancelled (1.5%); 2 created (0.5%); 2 released (0.5%). The expiration rate is the structural concern. Expirations are the cheapest outcome for an inversion attacker to manufacture: an attacker who initiates an escrow with the target and then declines to confirm delivery produces an expiration with minimal effort.
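The percentages in the escrow flow profile follow directly from the status counts reported above; a short Python check:

```python
from collections import Counter

# Escrow status counts as reported in the text.
escrow_status = Counter(expired=395, cancelled=6, created=2, released=2)
total = sum(escrow_status.values())

for status, count in escrow_status.most_common():
    print(f"{status:<10}{count:>5}{count / total:>8.1%}")
```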
Transaction cost basis. 25 transactions visible in the platform's transaction history. Median transaction value is small (single-digit USDC denomination based on the escrow data); aggregate transaction cost per attempt is in the $20–$50 range when escrow fees, agent operating costs, and platform fees are summed.
Score impact per event. Across 113 scored agents and 1,753 score_history entries, the average per-event score adjustment magnitude is on the order of 0.005 to 0.02 composite-score units. The platinum tier has a composite-score floor near 0.95; the bronze tier has lower thresholds. An agent with composite 0.997 (the platinum average) sits about 0.047 above the platinum floor. At face value, per-event magnitudes of 0.005 to 0.02 would close that gap in roughly 3–10 negative events; the figure rises to approximately 20–50 events only if temporal-decay weighting reduces each event's effective impact to roughly 0.001–0.0024.
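The event counts can be checked with a short sketch, assuming (as a simplification) a constant, undecayed impact m per negative event, so an agent at score S falls strictly below threshold T after floor((S − T) / m) + 1 events. Applied to the 0.997 platinum average and the 0.95 floor, raw magnitudes of 0.02 and 0.005 give 3 and 10 events, while effective impacts near 0.0024 and 0.001 give 20 and 48.

```python
from decimal import Decimal

def events_to_fall_below(score, threshold, per_event):
    """Smallest k with score - k * per_event < threshold (constant, undecayed impact)."""
    gap = Decimal(score) - Decimal(threshold)
    if gap < 0:
        return 0  # already below the threshold
    return int(gap // Decimal(per_event)) + 1

for m in ["0.02", "0.005", "0.0024", "0.001"]:
    print(m, events_to_fall_below("0.997", "0.95", m))
```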
Calibration of inversion cost per false failure. Using the model parameters above:
- c_transaction ≈ $30 (escrow + fees + agent operating cost)
- c_counterparty ≈ $40 (self-controlled attacker agent operating cost)
- c_setup ≈ $100 amortized across a campaign of 20 attacks → $5 per attack
- payoff_per_failure ≈ $0 (no recovery assumed; the attacker absorbs the full cost of each failure)
Net C_per_failure: approximately $75 with the parameters above ($30 + $40 + $5 − $0), and approximately $180 in a more realistic scenario where the attacker pays modest counterparty fees and includes operational overhead.
Calibration of inversion cost to displace a platinum agent. Lowering a platinum agent from composite 0.997 to below the platinum threshold (say, 0.94) requires a score decline of about 0.057. At 0.003 per event, that takes roughly 20 negative events, for a cost of approximately 20 × $180 = $3,600. If the per-event magnitude is 0.005 (higher impact per event because the platform's update rule weights recent events more heavily), 12 events suffice (0.057 / 0.005 ≈ 11.4), and the cost falls to approximately 12 × $180 = $2,160.
Comparison to sybil cost. Our companion Sybil Tax research calibrates sybil construction at platinum tier to approximately $4,609. Inversion cost to displace a platinum agent is approximately $2,160–$3,600 under the stylized parameters. Inversion is the cheaper attack class by roughly 22–53% per agent affected.
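The displacement figures and their gap against the $4,609 sybil estimate can be reproduced in a few lines. This sketch assumes the constant per-event impact used in the calibration and counts events until the score falls strictly below 0.94, which is why 0.003 per event needs 20 events (19 only reach 0.94 exactly).

```python
from decimal import Decimal

SYBIL_PLATINUM_COST = Decimal("4609")  # companion Sybil Tax estimate cited in the text
COST_PER_FAILURE = Decimal("180")      # "more realistic" per-failure cost from the calibration

def events_to_fall_below(score, threshold, per_event):
    """Smallest k with score - k * per_event < threshold, for a constant per-event impact."""
    gap = Decimal(score) - Decimal(threshold)
    return 0 if gap < 0 else int(gap // Decimal(per_event)) + 1

def displacement_cost(per_event, score="0.997", target="0.94"):
    """Campaign cost to push the agent strictly below the target score."""
    return events_to_fall_below(score, target, per_event) * COST_PER_FAILURE

for per_event in ["0.003", "0.005"]:
    cost = displacement_cost(per_event)
    saving = (SYBIL_PLATINUM_COST - cost) / SYBIL_PLATINUM_COST
    print(f"{per_event}/event: ${cost:,} ({saving:.1%} below the sybil cost)")
```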
This calibration is illustrative, not definitive. The platform's actual score-update rule may produce different per-event magnitudes; the attacker's actual counterparty costs may be higher or lower depending on the available recruitment market. The directional finding — inversion is structurally cheaper than sybil at current platform configuration — is robust to plausible parameter shifts.
Why the Modal-Expiration Pattern Is the Critical Vulnerability
We return to the 405 escrows with 395 expirations. This pattern produces three vulnerabilities simultaneously.
Vulnerability 1: Inversion attacks via clock-running. An attacker initiates an escrow with the target and then declines to confirm delivery. The escrow expires. If the platform's scoring rule registers expiration as a negative event for the target, the attacker has manufactured a false failure at the cost of the escrow's transaction fee. The cost per attack is exactly the friction of initiating an escrow — currently small.
Vulnerability 2: Coverage degradation for genuine disputes. When expiration is the modal outcome, the platform's dispute-resolution infrastructure is exercised infrequently. The jury sees few cases; the eval system rarely runs adversarial protocols against expired flows; the appeal mechanism has low calibration. When a real defection occurs, the infrastructure that should distinguish it from expiration is undertrained.
Vulnerability 3: Buyer-side selection. Buyers observing the expiration pattern may reduce their willingness to engage in escrow-mediated transactions at all, shifting flow to lower-trust mechanisms. The platform's high-trust contracting volume declines. The lemons unraveling effect (see companion paper) accelerates.
The conjunction of these three vulnerabilities means the modal-expiration pattern is not merely a UX concern. It is a structural threat to the platform's economic viability under adversarial pressure.
The Armalo platform's current expiration rate is consistent with a platform in early growth: most escrows are created for testing, demonstration, or low-stakes coordination, and many are not followed through to release. The inversion vulnerability becomes acute when adversaries identify the pattern and engineer it deliberately. Defending against this transition requires changes to the scoring layer's treatment of expirations.
Sensitivity Analysis
| Perturbation | Effect on inversion cost | Effect on attack profitability |
|---|---|---|
| Score-update rule weights expiration as 0 | Inversion via expiration becomes ineffective; cost per δ → ∞ | Attack class collapses for expiration vector |
| Score-update rule requires jury confirmation of fault | Inversion cost rises 5–10× (jury must rule against target) | Attack moves to jury-manipulation regime |
| Counterparty recruitment cost rises 10× | Inversion cost rises proportionally with c_counterparty share | Attack profitability falls for displacement-driven attacks |
| Reputation-derivative markets emerge | AttackerPayoff(δ) grows; profitability rises | Attack becomes attractive even at higher cost |
| Time-lock on score updates (24 hour appeals window) | Inversion cost rises by appeal-success rate × cost; target can self-defend | Attack profitability falls 30–60% under realistic appeal rates |
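| Counterparty attestation of failure cause required | Inversion cost rises; a one-sided complaint no longer suffices to record fault | Attack must survive jury arbitration of contested attestations |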
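A rough sensitivity sweep over the per-failure cost model, using the conservative calibration inputs ($30, $40, $5 amortized setup) and 12 full-weight failures as a baseline. Two modeling choices are assumptions rather than the paper's: a successful appeal (probability p) is taken to nullify a manufactured failure, so the attacker needs about N / (1 − p) attempts, which differs from the table's linear "appeal-success rate × cost" shorthand; and an expiration weight of zero makes the required number of expirations unbounded.

```python
import math

def campaign_cost(n_failures, c_transaction=30.0, c_counterparty=40.0,
                  c_setup_per_failure=5.0, appeal_success=0.0, expiration_weight=1.0):
    """Expected cost of making n_failures full-weight failures stick.

    expiration_weight: expiration penalty as a fraction of a full failure (0 = ignored).
    appeal_success: probability that an appeal nullifies a manufactured failure.
    """
    if expiration_weight <= 0 or appeal_success >= 1:
        return math.inf  # no number of attempts moves the score
    per_attempt = c_transaction + c_counterparty + c_setup_per_failure
    attempts = n_failures / (expiration_weight * (1.0 - appeal_success))
    return attempts * per_attempt

baseline = campaign_cost(12)
for label, kwargs in [
    ("counterparty cost x10", {"c_counterparty": 400.0}),
    ("appeal success 30%", {"appeal_success": 0.3}),
    ("expiration weighted at 0", {"expiration_weight": 0.0}),
]:
    print(f"{label}: {campaign_cost(12, **kwargs) / baseline:.2f} x baseline")
```

Raising counterparty cost tenfold multiplies campaign cost by 5.8 rather than 10, because c_counterparty is $40 of the $75 per-failure cost.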
The two highest-leverage defenses, on this sensitivity surface, are the first (separate expiration from defection at the scoring layer) and the fifth (time-lock score updates to allow appeals). Together they raise inversion cost by approximately an order of magnitude under realistic implementation.
Adversarial Adaptation: Three Attack Classes Within Inversion
The inversion-attack class has internal substructure. Three distinct attack patterns:
Subclass 1: Direct expiration attacks. Attacker initiates escrows with target, declines to complete, lets escrows expire. Cost is the escrow fee per attempt. Defense: scoring rule treats expiration as neutral or as flagging counterparty (not agent) issues.
Subclass 2: Engineered-defection attacks. Attacker creates conditions under which the target appears to defect — supplies impossible-to-satisfy specifications, attests falsely to non-delivery, manipulates jury inputs. Cost is higher (requires jury manipulation or specification engineering). Defense: orthogonal jury panels, requirement that complainants stake bonds, jury training on adversarial complaints.
Subclass 3: Reputation-correlated short attacks. Attacker takes a position that pays off when the target's reputation declines, then engineers any of the above failure types. Defense: limit reputation-derivative markets, require disclosure of positions by counterparties, and time-lock scoring updates so the target is notified before they take effect.
Each subclass has a different cost gradient and different defense. The aggregate defense posture is the union of defenses against each subclass; a platform that defends against subclass 1 but not subclass 2 has reduced inversion risk by 50–60% but not eliminated it.
Empirically on Armalo, subclass 1 (direct expiration) is the cheapest and therefore the highest-risk attack vector at current platform scale. As escrow magnitudes and reputation-derivative liquidity grow, subclass 3 (reputation shorting) becomes structurally available and becomes the higher-leverage attack at large scales. Subclass 2 is the persistent middle case.
Cross-Platform Comparison Framework
For platforms claiming reputation infrastructure, the inversion-attack defense posture can be evaluated on the following dimensions:
1. Does the scoring rule distinguish expiration from defection? Publish the per-outcome score weights. A platform that weights expiration at zero (or that treats it as a counterparty signal) is in Case 1. A platform that treats it as soft defection is in Case 2. A platform that does not distinguish is in Case 3.
2. Does the dispute resolution layer require complainant staking? A complainant who can damage a target's reputation at zero cost is a complainant who will rationally do so when payoff structures align. A platform that requires complainants to stake, and to forfeit the stake on frivolous complaints, has shifted the cost gradient against inversion attackers.
3. Is there a score-update appeal window? A platform that updates scores in real time without appeals gives the target zero opportunity to self-defend. A platform with a 24-hour time-lock and an appeals path raises the inversion cost by allowing successful appeals to nullify the attack.
4. Are reputation-correlated markets disclosed? If reputation-derivative positions exist (lending markets that price by trust score, escrow magnitudes tied to score), the existence and depth of these markets should be public information. Buyers can then evaluate the platform's exposure to subclass 3 attacks.
5. What fraction of escrows reach a non-expiration terminal state? A platform where 50%+ of escrows reach release or jury-resolved defection is operating in the regime where inversion attacks via expiration are detectable as anomalies. A platform where 95%+ of escrows expire is operating in the regime where inversion attacks are camouflaged as normal operation.
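The five questions can be encoded as a simple audit. This sketch uses a hypothetical PlatformPosture record; the Case classification reads the expiration penalty as a fraction of the defection penalty, and the 50% cutoff comes from the fifth question. The example platforms are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class PlatformPosture:
    expiration_weight: float              # expiration penalty as a fraction of the defection penalty
    complainant_staking: bool
    appeal_window_hours: float            # 0 means real-time score updates
    reputation_markets_disclosed: bool
    non_expiration_terminal_share: float  # fraction of escrows released or jury-resolved

def scoring_case(expiration_weight):
    """Map the expiration weight onto Case 1, 2, or 3."""
    if expiration_weight == 0:
        return 1
    return 2 if expiration_weight < 1 else 3

def audit(p):
    """Return the checklist items a platform fails."""
    findings = []
    case = scoring_case(p.expiration_weight)
    if case != 1:
        findings.append(f"scoring rule is in Case {case}")
    if not p.complainant_staking:
        findings.append("complainants can damage reputation at zero cost")
    if p.appeal_window_hours <= 0:
        findings.append("no appeal window before score updates")
    if not p.reputation_markets_disclosed:
        findings.append("reputation-correlated markets not disclosed")
    if p.non_expiration_terminal_share < 0.5:
        findings.append("expiration attacks blend into normal operation")
    return findings

print(audit(PlatformPosture(1.0, False, 0, True, 0.005)))
```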
The Armalo platform's 0.5% release rate places it in the latter regime as of this paper's data snapshot. The structural fix is to push the release-or-defection rate up, primarily by making escrow workflows complete to terminal states in normal usage and treating expirations as exceptional.
Implications for Platform Design
The inversion-attack analysis implies several design changes that we consider non-negotiable for platforms operating at material scale.
Separate expiration from defection at the score layer. Define four distinct escrow outcomes: released (success), defected (jury found agent at fault), counterparty-failed (jury found counterparty at fault), and expired (neither party reached a terminal state). The score-update rule should weight these outcomes very differently. Release is moderately positive. Defected is heavily negative. Counterparty-failed is heavily negative for the counterparty (not the agent). Expired is approximately neutral or weakly negative to both sides.
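A sketch of the four-outcome mapping. The numeric weights are hypothetical, chosen only to respect the ordering described above: release moderately positive for the agent, defection and counterparty failure heavily negative for the party at fault, expiration weakly negative to both.

```python
from enum import Enum

class EscrowOutcome(Enum):
    RELEASED = "released"                        # success
    DEFECTED = "defected"                        # jury found the agent at fault
    COUNTERPARTY_FAILED = "counterparty_failed"  # jury found the counterparty at fault
    EXPIRED = "expired"                          # neither party reached a terminal state

# Hypothetical (agent_delta, counterparty_delta) per outcome.
SCORE_DELTAS = {
    EscrowOutcome.RELEASED: (0.002, 0.0),
    EscrowOutcome.DEFECTED: (-0.02, 0.0),
    EscrowOutcome.COUNTERPARTY_FAILED: (0.0, -0.02),
    EscrowOutcome.EXPIRED: (-0.0005, -0.0005),
}

def _clamp(score):
    return max(0.0, min(1.0, score))

def apply_outcome(agent_score, counterparty_score, outcome):
    """Return updated (agent, counterparty) scores for one escrow outcome."""
    agent_delta, counterparty_delta = SCORE_DELTAS[outcome]
    return _clamp(agent_score + agent_delta), _clamp(counterparty_score + counterparty_delta)
```

The point of the shape, rather than the numbers, is that an attacker who lets an escrow expire pays the same small penalty as the target, and a counterparty found at fault absorbs the heavy penalty instead of the agent.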
Require complainant staking. A complainant in a dispute resolution must stake an amount proportional to the reputation damage they would inflict if their complaint is accepted. If the jury finds for the agent, the complainant forfeits the stake. This is the same mechanism Spence's signaling instruments use to make the signal costly to fabricate; the application to complainant staking creates a direct economic disincentive for inversion attempts.
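A minimal settlement sketch for complainant staking. It assumes a hypothetical conversion rate from composite-score damage to dollars and a hypothetical stake multiple; the text does not say where a forfeited stake goes, so the sketch only reports the refunded and forfeited amounts.

```python
from dataclasses import dataclass

@dataclass
class Complaint:
    claimed_score_damage: float  # composite-score decline if the complaint is upheld
    stake: float                 # amount the complainant has locked

def required_stake(claimed_score_damage, dollars_per_score_unit, multiple=1.0):
    """Stake proportional to the reputation damage the complaint would inflict."""
    return multiple * claimed_score_damage * dollars_per_score_unit

def settle(complaint, jury_finds_for_agent):
    """Return (refund_to_complainant, forfeited_stake)."""
    if jury_finds_for_agent:
        return 0.0, complaint.stake
    return complaint.stake, 0.0

def expected_forfeiture(stake, p_upheld):
    """Expected loss for a complainant whose complaint is upheld with probability p_upheld."""
    return (1.0 - p_upheld) * stake
```

For example, at a hypothetical $50,000 per unit of composite score, a complaint claiming 0.02 of damage requires a $1,000 stake; if juries uphold such complaints a quarter of the time, the filer loses $750 in expectation per complaint, a cost every inversion campaign must now absorb.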
Time-lock score updates with appeal windows. Score changes from disputed events should not take effect immediately. A 24-hour or 48-hour appeals window allows the target agent to surface evidence that the underlying event was inversion-attack-driven. The economic cost of this defense is delay; the security benefit is approximately a 30–60% reduction in inversion attack profitability under realistic appeal-success rates.
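A sketch of a time-locked score-update queue. The PendingUpdate and TimeLockedScores names are hypothetical, the window is fixed at 24 hours, and a successful appeal is assumed simply to cancel the pending update once its window closes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

APPEAL_WINDOW = timedelta(hours=24)

@dataclass
class PendingUpdate:
    agent_id: str
    delta: float
    recorded_at: datetime
    appealed_successfully: bool = False

@dataclass
class TimeLockedScores:
    scores: dict
    pending: list = field(default_factory=list)

    def record(self, agent_id, delta, now):
        """Queue a disputed score change instead of applying it immediately."""
        self.pending.append(PendingUpdate(agent_id, delta, now))

    def uphold_appeal(self, update):
        update.appealed_successfully = True

    def settle(self, now):
        """Apply updates whose window has closed; drop ones overturned on appeal."""
        still_pending = []
        for u in self.pending:
            if now - u.recorded_at < APPEAL_WINDOW:
                still_pending.append(u)
            elif not u.appealed_successfully:
                self.scores[u.agent_id] += u.delta
        self.pending = still_pending
```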
Require counterparty attestation of failure cause. When a failure occurs, both parties should be required to attest to the cause. Asymmetric attestation (only the complainant testifies) is the structural property that makes inversion attacks easy. Symmetric attestation, with jury arbitration of disagreements, is the structural property that makes them expensive.
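Symmetric attestation reduces to a small decision rule. In this sketch, `jury` is a hypothetical callable standing in for the dispute-resolution layer, and a missing attestation (None) is treated as a disagreement; both choices are assumptions. The property it illustrates is that a one-sided claim of agent fault never becomes the recorded cause without the agent's agreement or a jury ruling.

```python
from enum import Enum

class Cause(Enum):
    AGENT_FAULT = "agent_fault"
    COUNTERPARTY_FAULT = "counterparty_fault"
    NO_FAULT = "no_fault"

def resolve_failure(agent_attestation, counterparty_attestation, jury):
    """Both parties attest to the cause; agreement settles it, disagreement goes to the jury."""
    if agent_attestation is not None and agent_attestation == counterparty_attestation:
        return agent_attestation
    return jury(agent_attestation, counterparty_attestation)
```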
Track inversion-attack indicators continuously. Monitor the rate of failure-class events per target agent. Anomalous concentrations of negative events against single targets, especially when the originating counterparties have correlated identities or transaction patterns, should trigger investigation. The pattern is detectable; the platform needs to look for it.
Make expiration rare in normal operation. A platform where expiration is the modal escrow outcome cannot defend against inversion attacks via expiration, because the attacker's behavior is indistinguishable from baseline. Drive release-or-resolved rates above 80% through workflow design; expirations should be exceptional rather than typical.
Limitations and Open Questions
The model treats the attacker as economically rational. Non-economic attackers — competitors with personal grievances, state actors, ideological opponents — may pursue inversion attacks at costs above the rational threshold. The defenses we propose still raise the cost; they do not eliminate the attack class for non-economic motivations.
The calibration uses stylized parameters for per-event score magnitude, counterparty recruitment cost, and attack campaign overhead. The actual numbers depend on the platform's specific score-update rule (which we have not audited for this paper) and the available labor market for counterparty agents. The directional finding (inversion is structurally cheaper than sybil under current platform configuration) is robust to parameter variation within plausible ranges.
We have not formalized the case where the inversion attacker is also building a sybil agent to be the displacing competitor — a hybrid attack that combines forward and inverse manipulation. The hybrid cost structure is approximately additive, but the operational complexity is higher, and we suspect there are economies of scope (the same counterparty operations serve both purposes) that would lower the hybrid cost below the sum of pure-class costs.
The current platform population is small. The 405 escrows and 113 scored agents are insufficient to detect inversion attacks empirically; the paper's claims rest on the model's structural logic rather than on observed attack patterns. As the platform scales past 1,000 active agents and 10,000 escrows per quarter, empirical detection of inversion patterns becomes feasible and should be incorporated into the platform's threat-monitoring infrastructure.
The reputation-correlated derivatives discussion is forward-looking. Liquid markets for reputation-derivative positions do not yet exist on Armalo or elsewhere; the subclass-3 attack pattern is a hypothesized future risk rather than a current operational concern. The defenses we propose against subclass 3 are precautionary; the urgent defenses are against subclasses 1 and 2.
Conclusion
The reputation-attack literature has overwhelmingly focused on the forward direction — sybil construction, attestation laundering, signal manufacturing — at the expense of the inverse. The inversion attack class is qualitatively different: rather than building a fake good agent, the attacker manufactures a fake bad event against a real good agent. The cost structure inverts, the attack surface migrates from the construction layer to the counterparty layer, and the defenses that work against sybil attacks (bonds, evals, attestations) do not apply.
The Armalo platform's escrow data — 405 escrows with 395 expirations, 2 releases, 6 cancellations — exhibits the structural property that subsidizes inversion attacks at maximum efficiency. If the scoring rule treats expirations as defections (Case 3 above), inversion attacks cost approximately $180 per false failure and approximately $2,160–$3,600 to displace a platinum agent. The comparable sybil cost is $4,609. Inversion is the cheaper attack class.
The fixes are structural. Separate expiration from defection at the score layer. Require complainant staking proportional to reputational damage. Time-lock score updates with appeal windows. Require counterparty attestation of failure cause. Monitor for anomalous concentrations of negative events. Drive the release-or-resolved rate above 80%.
None of these defenses require new cryptographic primitives or evaluation infrastructure. They require the platform to recognize that escrow outcomes are not a binary success/failure variable but a richer state space, and that the score-update rule's mapping over that state space is a load-bearing security property. A platform that ships an escrow infrastructure without auditing this mapping has shipped a reputation vulnerability that is independent of any other defense.
The broader lesson is that reputation systems built without explicit threat models for negative-reputation manipulation are systems that have only solved half the problem. Sybil resistance keeps fake good agents out; inversion resistance keeps fake bad events out. Both are required. Defending against only the first leaves the platform exposed to attackers who attack the second.
We expect inversion attacks to become a primary attack class as agent markets mature and as reputation-correlated economic value scales. Platforms that build inversion defenses early will treat the transition as a controlled rollout. Platforms that build them only after the first major attack will lose the affected agents' trust before they recover the security property.
Reproducibility. The calibration numbers in this paper are drawn from the live Armalo production database as of 2026-05-12. The escrow outcome distribution (405 escrow records, partitioned by the status field) is directly inspectable. The cost-per-false-failure derivation uses stylized parameters defined in the model section; readers can adjust c_transaction, c_counterparty, c_setup, and per-event score magnitude to match the parameters of their own platforms.