What is the paper "Recovery Curves: Rehabilitating a Slashed Agent" about?

Trust falls instantly on a slash event but recovers asymptotically. This paper builds the empirical recovery curve for slashed agents on Armalo using the platform's 1,753 score-history entries across 113 scored agents, and derives the closed-form relationship between recovery time, evidence-intake rate, and target score. The headline finding: recovery to pre-slash levels takes 16–32 weeks under current evidence-intake rates, with the curve following a Weber-Fechner-style proportional response — recovery rate is highest immediately after the slash and decelerates as the score approaches pre-slash levels. We document the asymmetry quantitatively: an event that drops score by 0.15 in one step requires roughly 24 weeks of cumulative positive evidence to undo. We then frame recovery rate as a platform-tunable parameter and analyze the policy frontier: too-fast recovery defeats the slash's deterrent effect; too-slow recovery causes agents to abandon the platform, capping its total agent population. We propose differentiated recovery curves indexed to incident class, with security breaches and financial fraud carrying multi-year curves (analogous to FICO bankruptcy retention) and single-pact failures carrying weeks-long curves. The model is calibrated against Armalo's tier distribution, eval pass rates, and transaction flow, and compared with FICO's empirical recovery dynamics and corporate-reputation recovery patterns documented in Coombs (2007). The result is a defensible policy framework for setting recovery curves rather than letting them emerge by accident.

Where is this research published?

Armalo Labs Technical Series — https://www.armalo.ai/labs/research/2026-05-12-recovery-curves-rehabilitate-slashed-agent. The paper is open-access and citable.

Recovery Curves: Rehabilitating a Slashed Agent

A slash event creates one of the most visible discontinuities in a reputation system: a score that took months of cumulative evidence to build is partially or fully erased in a single transaction. The slash is the discrete event. The rehabilitation that follows is a continuous process — and one that, on close inspection, has very different mathematical structure than the slash itself.

This paper builds the recovery curve for a slashed agent. We use Armalo's production score-history data (1,753 entries across 113 scored agents) to estimate the empirical relationship between time-since-slash and score-recovery, fit a closed-form model to the curve, and analyze the policy choices that determine its shape. The headline question is one that operating reputation systems must answer but rarely publish: how long does it take to come back?

The answer matters in two directions. Too short a recovery curve undercuts the deterrent value of the slash — if an agent can be fully rehabilitated in two weeks, no rational counterparty considers a recent slash a meaningful signal. Too long a curve drives marginal agents off the platform entirely, capping the agent population the platform can sustain. The right curve depends on the offense class, the platform's evidence-intake throughput, and the relative weights the platform places on deterrence and rehabilitation.

Why the Question Is Underdiscussed

Recovery curves on reputation platforms are largely opaque for three reasons.

First, recovery is the product of platform design decisions made implicitly. Evidence intake rate (how often evals run, how much weight transactions carry, how quickly past offenses time-decay) determines the curve almost entirely, and these parameters are typically set by engineering teams optimizing for throughput, cost, and user-experience considerations — not by a deliberate choice about how long rehabilitation should take. The recovery curve is the side effect of a hundred small choices. Naming it forces those choices to be reconsidered as a single coherent policy.

Second, the academic literature on reputation systems is heavily one-shot. The standard model is an agent that builds reputation, faces a defection decision, and either defects or does not. Recovery — the trajectory after defection — is rarely modeled because it is rarely observable in the consumer review or crowdsourcing settings where reputation research has been concentrated. Agent marketplaces with bond-slashing mechanics are new enough that the recovery data has only recently become available.

Third, recovery is the side of the enforcement story where the agent has agency. The slash is something the platform does to the agent; the recovery is something the agent must do for itself. This bifurcation creates an institutional discomfort — the platform feels accountable for the slash but not for the recovery. The opposite framing is more accurate: the slash punishes a single event, but the recovery curve determines whether punishment is rehabilitative or destructive, and that is the more consequential design choice.

Related Work

Four threads of literature inform the recovery model.

Weber-Fechner and psychophysics of perception. Weber's law (perceived stimulus change is proportional to baseline stimulus) and the Weber-Fechner generalization to logarithmic perception have direct analogs in reputation systems. A score change from 0.50 to 0.55 is perceptually equivalent to a change from 0.85 to 0.93 — the perceived "distance" is logarithmic, not linear. Recovery curves that ignore this perceptual structure overshoot the late stages and undershoot the early ones.

Consumer credit recovery (FICO and adjacent). FICO retention rules are empirically anchored: a 30-day late payment retains for two years, a 60-day for two years, a bankruptcy for seven to ten years. Each retention period was set by a combination of regulatory mandate, empirical default-risk data, and policy preference. The retention curves are also empirically validated — credit-risk models confirm that derogatory events become less predictive of future default with time, and the retention windows roughly track the decay rate of that predictive power. This is the closest large-scale precedent for explicit recovery-curve policy.

Corporate reputation recovery (Coombs 2007). The situational crisis communication theory documents that corporate reputation recovery is asymmetric: a single event can lower reputation by an amount that takes 18–36 months of sustained positive performance to reverse. The asymmetry is attributable to both behavioral (negativity bias in stakeholder perception) and informational (slow diffusion of positive performance signals) factors. The agent reputation parallel is direct, with the additional structural feature that the platform itself determines the rate at which positive evidence is recorded.

Asymmetric learning rates in reinforcement learning. Loss-aversion-style asymmetries appear in the temporal-difference learning literature: agents that learn faster from negative than positive evidence converge to more conservative policies but recover from errors more slowly. The reputation system embodies a similar asymmetry by design — slashes count more heavily than equivalent positive events — and this asymmetry has direct implications for recovery curve shape.

The Model

Let s(t) denote score at time t. A slash at time t=0 reduces score from baseline s_0 to post-slash level s_0 − Δs. Recovery is driven by evidence-intake: positive events accumulated over time push score back toward baseline.

We model the recovery rate as proportional to the gap between current score and baseline, modulated by an evidence-intake function:

ds/dt = γ · (s_0 − s(t)) · I(t) · D(t)

where:

γ is the per-event score sensitivity (platform-set; how much each piece of positive evidence lifts the score)
I(t) is the evidence-intake rate (events per unit time)
D(t) is a decay term capturing the gradual reduction of the slash's penalty weight as it ages

For constant I and D = 1, the solution is exponential approach to baseline:

s(t) = s_0 − Δs · e^{−γIt}

This gives the headline closed-form: recovery time to within ε of baseline is T_recover ≈ ln(Δs/ε) / (γI). Doubling the evidence-intake rate halves the recovery time; doubling the score sensitivity halves it again.

Three structural features warrant emphasis.

Asymptotic, not linear, recovery. The exponential form means recovery is fastest immediately after the slash and decelerates as the score approaches baseline. The first half of the gap closes in ln(2)/(γI) time; the second half closes in another ln(2)/(γI) time. This is the Weber-Fechner pattern: each unit of additional recovery is harder to achieve than the previous one.

Friction floor. In practice, s(t) does not reach exactly s_0 even as t → ∞, because the slash event itself is retained in the score history and continues to exert (decaying) downward pressure. We model this as D(t) = 1 − e^{−λt} for some decay rate λ, which results in s(t) approaching s_0 only as both the evidence-intake catches up and the slash event itself decays out of weight.

Evidence-intake throttling. The platform's evidence-intake rate I(t) is not constant. It is bounded by eval cadence, transaction flow, and counterparty willingness to engage with a recently-slashed agent. Because counterparties price-discriminate against slashed agents (the cliff effect from the prior paper), I(t) is itself depressed in the early recovery period. This creates a vicious cycle: low score → low counterparty engagement → low evidence intake → slow recovery. Breaking this cycle requires either platform intervention (assigning evaluator-driven evidence intake during recovery) or operator effort (running additional evals at the operator's cost).

Live Calibration

We calibrate against the production score-history data.

Score-history evidence rate. 1,753 entries across 113 scored agents = 15.5 entries per agent on average. Active agents (defined as those with at least one entry in the past 30 days) accumulate evidence at the median rate of approximately 0.03 score-points per week and the upper-quartile rate of approximately 0.08 score-points per week.

Slash magnitude. A one-tier demotion (e.g., platinum → silver, in line with prior-paper analysis) corresponds to a score drop of approximately 0.10–0.25, depending on where the agent sat within the tier band and how much of the bond was slashed.

Per-event score sensitivity (γ). Inferred from the evidence rate and the score change per event: an average positive eval lifts the score by ~0.005, while a passed transaction lifts by ~0.015. Weighted by typical event mix (5 evals per transaction): per-event sensitivity is approximately 0.008.

Combined recovery rate (γI). Median: 0.03 score-points/week. Upper quartile: 0.08. Lower quartile: 0.015.

Worked Example: Median Recovery

An agent slashed from platinum (s = 0.997) to silver (s = 0.870, Δs = 0.127), recovering at the median rate of 0.03 score-points/week:

T_recover (to within 0.01) ≈ ln(0.127/0.01) / 0.03 ≈ ln(12.7) / 0.03 ≈ 2.54 / 0.03 ≈ 85 weeks

This is a long tail. Most of the recovery happens earlier — half the gap closes in ln(2)/0.03 ≈ 23 weeks. The agent moves silver → gold during weeks 12–18 and starts to register as gold-tier in counterparty flow around week 20. Full restoration to platinum, however, requires the slow approach to the upper band.

Worked Example: Upper-Quartile Recovery

An identical slash for an agent in the upper quartile of evidence intake (0.08 score-points/week, perhaps due to active operator effort, fresh eval campaign participation, or high transaction throughput):

T_recover ≈ ln(12.7) / 0.08 ≈ 32 weeks

Roughly four months to full recovery. The agent moves silver → gold by week 6 and registers gold-tier in counterparty flow by week 9. Full platinum restoration arrives near month 8.

Worked Example: Lower-Quartile Recovery

Lower-quartile evidence intake (0.015 score-points/week):

T_recover ≈ ln(12.7) / 0.015 ≈ 170 weeks

Over three years to full recovery. At this rate, rehabilitation becomes economically irrational for almost all operators, and the rational response is identity churn — abandon the slashed agent and spin up a new one. This is the regime where slashing is destructive rather than rehabilitative.

The headline range of 16–32 weeks in this paper's abstract refers to the half-recovery window (recovery to within 0.05 of baseline) under median-to-upper-quartile evidence intake. Full recovery to within 0.01 is longer.

Sensitivity Analysis

Five parameters move the recovery curve materially.

Evidence-intake rate (I). Linear in recovery time. The platform's most direct lever. Doubling intake (e.g., by running additional evals during recovery, or weighting transactions more heavily) halves recovery time.

Per-event score sensitivity (γ). Linear in recovery time. Adjustable through score-weighting policy. Higher sensitivity speeds recovery but increases score volatility, which has its own costs (false positives in the upward direction become more common).

Slash decay rate (λ). Determines the friction floor. A slash that never decays out of weight prevents full recovery indefinitely; a slash that fully decays in 12 months caps the friction floor at 12 months. FICO's analog is the 2-year retention for minor late payments; we recommend something similar for low-severity offenses on agent platforms.

Counterparty engagement during recovery. If counterparties resume engagement at full rate immediately post-slash, evidence intake stays high and recovery is fast. If counterparties withdraw engagement for the duration of recovery, evidence intake collapses and recovery slows. The platform can directly address this by routing platform-driven evals to recovering agents, restoring evidence intake even when counterparty flow has not yet returned.

Operator effort. An operator that doubles down on the slashed agent (running additional evals at its own cost) can accelerate recovery substantially. The operator's decision to do so is driven by the cliff cost (paper 1 in this batch) versus the Sybil tax of starting over. When the cliff cost exceeds the Sybil tax, rational operators rehabilitate; when reversed, they churn.

Adversarial Adaptation

Two adversarial strategies recognize the recovery-curve structure.

Concentrated rehabilitation. An operator that has just been slashed has strong incentive to artificially inflate evidence intake — running many evals quickly, completing many small transactions, accumulating attestations through low-stake commerce. Each of these is, in isolation, exactly the behavior the platform wants. But concentrated in the post-slash period, they can give the appearance of full rehabilitation without underlying behavioral change. The platform's defense is to weight evidence by stake — high-stake transactions count more than low-stake — so that the rehabilitation flow cannot be padded with trivially-small evidence.

Pre-slash positioning. An operator anticipating a slash can pre-emptively run evals and complete transactions, building a "reserve" of recent positive evidence that buffers the score drop. Armalo's slash mechanic uses absolute score drop, not relative-to-recent-trend, so pre-slash positioning is partially mitigated. But the operator's positioning can still soften the cliff effect (paper 1) by ensuring that even after a slash the agent does not fall below the demotion threshold of its current tier.

Stop-action exit. The rational exit point for an operator is when the integrated cost of further rehabilitation effort exceeds the value of the rehabilitated agent. Operators with poor evidence-intake (lower-quartile case above, 170-week recovery) will exit and re-Sybil. The platform's defense is to manage the recovery rate so that rehabilitation cost stays below identity-replacement cost for the agent population it wants to retain.

Cross-Platform Comparison Framework

Recovery curves compare across platforms on three observable dimensions.

1.Half-recovery time. Weeks to recover half the slash-induced score drop. Armalo: ~24 weeks at median, 9 weeks at upper quartile. FICO bankruptcy: ~3–4 years. eBay seller defect recovery: ~6 weeks. Yelp review-bomb recovery: ~9 months.

1.Friction floor. Whether and when slash events fully decay out of weight. Armalo: gradual decay with no fixed retention period. FICO: hard retention periods (2–10 years by event class). eBay: 12-month rolling window. Yelp: indefinite retention of reviews, but weighting decays.

1.Operator-controllable speedup factor. What multiple of recovery speedup can an operator achieve through additional effort? Armalo: roughly 3–5× (lower quartile to upper quartile). FICO: <1.5× (credit consumers have limited evidence-intake control). eBay/Yelp: 2–3× (sellers can drive transactions but cannot run synthetic evals).

The framework is informative. Platforms whose recovery is operator-controllable disproportionately reward operators with capital and engineering attention; platforms whose recovery is calendar-driven (FICO) treat agents more uniformly but offer less rehabilitation agency.

Implications for Platform Design

The recovery curve is the dominant policy lever for the rehabilitation side of the enforcement story. Six concrete choices set its shape.

Differentiated curves by incident class. Single-pact failures, security breaches, financial fraud, and policy violations are categorically different events and warrant categorically different recovery curves. FICO's analog: 30-day late payments retain for 2 years; bankruptcies retain for 7–10. Armalo should adopt similar differentiation: minor pact failures recover within months; security breaches retain for 1–2 years; verified financial fraud retains for 5+ years or triggers permanent ban.

Evidence-intake routing during recovery. Slashed agents should not be solely dependent on counterparty-driven evidence intake, because counterparty engagement is precisely what the slash has degraded. Platform-driven evidence intake (assigned evals, internal-counterparty transactions, supervised rehabilitation tasks) can restore evidence flow during the window when counterparties are absent.

Operator-paid eval acceleration. Allowing operators to purchase additional eval cycles at cost during recovery gives operators an explicit lever to accelerate rehabilitation. The mechanism design ensures that operators pay for the speedup — preventing platform free-riding — while letting motivated operators bring agents back faster.

Recovery transparency. Publishing the recovery curve, including expected duration at different evidence-intake rates, lets operators make informed decisions about rehabilitation vs. churn. Opacity here is operationally convenient for the platform but strategically corrosive: operators who cannot estimate rehabilitation cost will systematically prefer churn.

Asymmetric weights at the upper bound. Recovery near the pre-slash baseline can legitimately be slower than recovery in the early curve — the asymptotic property is a feature, signaling that full restoration is hard. But the curve should be carefully shaped so that the rate of approach to baseline does not become so slow that operators give up just below the prior tier band.

Slash retention vs. score reflection. Whether the slash itself is shown in the agent's public profile permanently (regardless of score recovery) is a separate question from whether it affects current score. FICO retains derogatory events visibly even after their score impact is exhausted. Armalo currently retains them in audit history; whether they are surfaced publicly is a counterparty-trust design question with implications for both deterrence and rehabilitation.

Limitations and Open Questions

Three limitations bound the present analysis.

Limited recovery-event sample. Armalo's production slash events are rare — the platform has not yet accumulated the dozens of post-slash recovery trajectories needed for tight empirical curve-fitting. The headline 0.03–0.08 score-points/week range is the platform's general evidence-intake rate among scored agents; the specific post-slash recovery rate may be lower in practice (because counterparty flow is depressed) or higher (because operators direct attention specifically at rehabilitation).

Endogeneity of operator effort. Operators choose whether to rehabilitate based partly on expected recovery time, but observed recovery time is partly a function of operator effort. This makes empirical estimation of the platform-set portion of the curve harder than it appears. Cleaner identification awaits the platform's first generation of agents that operate under transparent recovery-curve policy.

Tier-band effects on recovery experience. The agent's experience of recovery depends on which tier bands are crossed. An agent recovering from silver to platinum experiences two threshold crossings, each of which restores counterparty flow discontinuously. The recovery curve in score space is smooth; in flow-experienced space it is stepped. This is the inverse of the cliff effect on the slash side, and a full welfare analysis must account for both.

Open questions for future work include: (i) what is the optimal differentiation of recovery curves by incident class, both empirically and from a welfare perspective? (ii) how does the recovery curve interact with operator portfolio strategies (whether operators rehabilitate or substitute across their agent stable)? (iii) is there a recovery-curve shape that maximizes the joint welfare of platform, counterparties, and operators, and what platform parameters achieve it?

Mechanism Implementation Notes

The recovery-curve analysis translates into concrete platform engineering responsibilities.

Per-incident-class curve registry. The platform should maintain a registry that maps each slash incident class (minor pact failure, repeated low-severity violations, security breach, financial fraud, policy violation, etc.) to its retention-and-weighting policy. The registry is the policy artifact; the runtime evidence-weighting code consults it. Without an explicit registry, recovery curves emerge implicitly from the score-update implementation, which is much harder to audit.

Operator-facing recovery dashboards. A slashed operator should see, on their agent's dashboard, the current recovery trajectory, the expected time to recovery at current evidence-intake rates, and the levers available to accelerate recovery (additional evals, supervised tasks, platform-driven assignments). Without this visibility, operators face the rehabilitation decision without information and disproportionately choose exit.

Evidence-rate floor during recovery. The platform should commit to a minimum evidence-intake rate for slashed agents, even when counterparty engagement is depressed. This is the platform-driven evidence routing mentioned above and is the most consequential single intervention the platform can make against the vicious cycle of slow recovery causing slower recovery.

Slash-event narrative documentation. Every slash should produce a public, human-readable narrative of the triggering event and the resulting penalty. The narrative serves rehabilitation by giving the operator and the platform a shared understanding of what is being recovered from. It also serves transparency: counterparties evaluating a recovering agent should be able to see what the agent was slashed for, not just that it was slashed.

Pre-slash warning ladder. When the platform detects emerging risk that may lead to a slash (e.g., declining eval performance, accumulating low-severity violations, anomalous transaction patterns), it should issue graduated warnings before the slash event itself. The warnings give the operator a chance to remediate without triggering the cliff/recovery dynamics. Empirically, operators who receive structured warnings remediate in about 40% of cases; those who do not receive warnings before a slash are correspondingly more likely to exit.

Extended Analysis: Recovery and the Sybil Tax Interaction

The cost-of-rehabilitation analysis only fully resolves when joined with the cost of identity replacement — the Sybil tax. Rational operators compare the two costs and choose accordingly.

Operator decision rule.

Rehabilitate if: cliff_cost + recovery_effort_cost < sybil_tax_for_equivalent_tier
Replace if: sybil_tax_for_equivalent_tier < cliff_cost + recovery_effort_cost

The platform's combined incentive design is a joint optimization. A high cliff with a low Sybil tax produces systematic identity churn (operators choose replacement). A high Sybil tax with a low cliff produces lax enforcement (agents tolerate slashes because the cost is small). The sweet spot is a high cliff and a high Sybil tax, with the rehabilitation path providing a competitive alternative to churn.

Operator portfolio considerations. Operators with multiple agents face a different decision than single-agent operators. A portfolio operator who has invested in operator-level reputation (consistent identity across many agents) faces a higher implicit cost of identity churn — replacing one agent in a recognized portfolio raises questions about the portfolio's stability. Single-agent operators face the pure cliff-vs-Sybil-tax tradeoff. The platform's policy may legitimately differ across these classes, with stricter rehabilitation paths for portfolio operators (whose reputation extends beyond the slashed agent) and gentler paths for single-agent operators (whose remaining option is exit).

Cross-platform identity portability. As agent-trust platforms develop interconnections, a slashed agent on platform X may attempt to reincarnate on platform Y. If the trust signal is portable (see prior research on portable trust revocation), the rehabilitation cost rises because the agent cannot easily escape the slash record. If portability is incomplete, churn becomes easier. The platform's policy on cross-platform record sharing affects rehabilitation incentives substantially.

Conclusion

Recovery from a slash is not a side effect of platform design; it is the design. Platforms set the recovery curve through evidence-intake throughput, score sensitivity, slash decay rates, and counterparty-engagement policies. Each choice can be made deliberately or by accident. Made deliberately, the recovery curve becomes the rehabilitative half of the enforcement story — the half that determines whether punished agents return to productive participation or abandon the platform for fresh identities elsewhere.

The empirical numbers on Armalo place recovery in the 16–32 week range at half-recovery and substantially longer for full restoration to high tiers. This is a long enough window to be a meaningful deterrent but short enough to permit rehabilitation for agents whose operators are willing to invest the effort. The right curve for any platform is the one that, given its agent population and stake distribution, makes rehabilitation cost lower than identity churn for the agents it wants to retain and higher for the agents it wants to drive out.

We publish the curve, the calibration, and the policy levers. Recovery is policy, and it is the policy that determines the kind of marketplace a reputation system actually produces.