Where is this research published?

Armalo Labs Technical Series — https://www.armalo.ai/labs/research/2026-05-12-agent-insurance-actuarial-reliability. The paper is open-access and citable.

Agent Insurance: Actuarial Models for Reliability Underwriting in the Agent Economy

Q: What is the paper "Agent Insurance: Actuarial Models for Reliability Underwriting in the Agent Economy" about?

Insurance is the natural complement to bonded trust: where bonds make defection costly to the agent, insurance makes defection bearable to the counterparty. This paper builds the actuarial model for agent reliability insurance from first principles. The premium decomposes as premium = E[loss] + risk_loading + admin_cost, with E[loss] derived directly from the agent's empirical reliability profile. We calibrate against the Armalo production platform: 81.3% eval pass rate on 8,060 eval_checks, 405 escrows of which 97.5% expired without execution, 25 successful transactions. We show that an agent with 95% per-task reliability over a 30-day window requires a premium in the range of 5%–8% of transaction value, while a 99%-reliable agent's premium is 1.5%–2.5%. The non-linearity is severe: a one-percentage-point reliability improvement at the top compresses the premium by a factor of 2–3×. We then analyze the adverse-selection structure: which agents would self-select into insurance? The least reliable. So the insurance market needs underwriting, and the underwriting requirements are exactly what reputation scores already provide. The conclusion: trust scores are pre-underwriting for agent insurance, and the two markets — reputation and insurance — are not competitors but complementary tiers of the same risk infrastructure. We position the framework against cyber-insurance (Romanosky et al. 2019), fidelity bonds (Davis 2010), and professional indemnity insurance (Brown and Liebenberg 2018), showing that agent reliability insurance fits naturally into the existing insurance design space.

Bonds and insurance are two halves of the same risk infrastructure. Bonds make defection costly to the agent who would commit it; insurance makes the consequences of defection bearable to the counterparty who would suffer it. A market that has bonds but no insurance leaves counterparties exposed to losses that the bond cannot fully cover; a market that has insurance but no bonds creates adverse-selection incentives for the riskiest agents to dominate the insured pool. The robust design combines both, and the design of the insurance product is constrained by the same trust-data infrastructure that the reputation system already provides.

This paper formalizes agent reliability insurance: the contract structure, the actuarial premium model, the underwriting requirements, and the market dynamics under adverse selection. We derive the premium from first principles, calibrate against the Armalo production platform, and show that the premium is severely non-linear in reliability — a property that has design implications for both the insurance product and the underlying reputation system.

The framework is meant to be implementable. We are not arguing that agent insurance is a future possibility; we are showing that the actuarial machinery already exists, the data is already collected by reputation systems, and the missing ingredient is a market-clearing premium curve that can be derived from observable platform data. We derive that curve here.

Why the Question Is Underdiscussed

The agent economy has so far inherited its risk-management thinking from two adjacent literatures: cryptocurrency staking (which uses bonds and slashing) and human-centric professional indemnity (which uses insurance but assumes capable institutional underwriters). The two frames combine poorly: the cryptocurrency frame assumes the agent's stake is sufficient to compensate the counterparty for any plausible loss, which is rarely true at agent-economy stakes; the indemnity frame assumes institutional underwriters who do not yet exist for agent reliability. The result is a market that has bonds but lacks the insurance complement, and counterparties bear residual exposure that neither the bond nor the agent's balance sheet can cover.

The underdiscussion has two specific roots. First, insurance pricing requires loss data that agent-economy platforms typically do not publish. The actuarial calculation depends on observed failure rates, observed loss magnitudes, and observed recovery rates — all of which are first-class platform data but rarely surfaced publicly. Armalo's 405 escrows with detailed lifecycle tracking are the kind of substrate the actuarial calculation requires, and publishing the calibration is a contribution in itself.

Second, the adverse-selection problem in agent insurance has not been mapped. The standard insurance result (Akerlof 1970, Rothschild and Stiglitz 1976) is that markets with unobservable risk types collapse to pooled equilibria in which the riskiest types dominate, driving out the lower-risk types and making the insurance unviable. The agent economy appears to face this problem acutely: agents know their own reliability better than the underwriter, and the agents who would most demand insurance are the least reliable. We argue, contra this intuition, that the adverse-selection problem is largely solved by reputation scores, which provide the underwriter with a high-quality signal of agent risk type. The key is that reputation infrastructure is pre-underwriting infrastructure, and recognizing this collapses the two markets into a single integrated risk product.

A third reason for the underdiscussion is regulatory uncertainty. Insurance is heavily regulated in most jurisdictions, and the regulatory frame for AI-agent reliability insurance is unsettled. Early-mover platforms can either operate in the gray area as informal reimbursement programs (Lloyd's-style "calls" rather than formal insurance) or pursue regulated insurance partnerships. We do not resolve this regulatory question here, but we note that the underlying actuarial mathematics is independent of the regulatory form.

Related Work

Four research traditions inform the agent-insurance framework:

Cyber-insurance pricing (Romanosky et al. 2019, Mukhopadhyay et al. 2013). The cyber-insurance literature established premium models for digital-asset failures: data breaches, ransomware, business-interruption events. Premiums are priced as a function of observed loss frequency, observed loss severity, and the insured's security posture. The structural similarity to agent-insurance is direct: agent reliability failures produce digital-economic losses to the counterparty, and the premium can be priced on the same basis. We borrow the loss-frequency × loss-severity framework explicitly.

Fidelity bonds and surety insurance (Davis 2010, Russell 2004). The fidelity-bond industry insures employer losses due to dishonest employee acts. The premium is priced as a function of the employee's role, the bond face value, and the employer's controls. The analog to agent-insurance is clean: the agent is the bonded party, the counterparty is the insured employer-equivalent, and the platform's controls (reputation, evaluation, monitoring) substitute for the employer's controls.

Professional indemnity / errors-and-omissions insurance (Brown and Liebenberg 2018). Professional indemnity covers losses to clients from professional misconduct or negligence. Premiums are priced from the professional's qualifications, claims history, and practice area. The agent economy maps cleanly: the agent is the professional, the counterparty is the client, and the platform's data on past performance is the claims history.

Insurance theory under asymmetric information (Akerlof 1970, Rothschild and Stiglitz 1976, Hirshleifer 1971). The foundational microeconomic theory of insurance under unobserved risk types established the adverse-selection result and the separating-equilibrium remedy. Insurance markets with no risk-typing collapse; insurance markets with effective risk-typing clear at type-specific premiums. The relevance to agent insurance is direct: without underwriting, the market collapses; with reputation-score-based underwriting, the market clears.

The agent-insurance framework synthesizes these traditions into a single actuarial model, with the specific property that the underwriting is provided by the platform's existing reputation infrastructure.

The Model

We define the per-transaction insurance premium as:

P(τ, V) = E[loss | τ, V] + RL(τ, V) + AC

where τ is the agent's reliability type (observed via reputation score), V is the transaction value, E[loss] is the expected loss, RL is a risk loading for variance and tail-risk, and AC is the admin cost.

The expected loss decomposes as:

E[loss | τ, V] = p_fail(τ) · L(τ, V) · V · (1 - r(τ))

where:

p_fail(τ) is the per-transaction failure probability for an agent of reliability type τ. For Armalo agents this is empirically observable from the eval pass rate, transaction success rate, and escrow execution rate.
L(τ, V) is the loss given failure, expressed as a fraction of transaction value V. For escrow-protected transactions this fraction is typically less than 1 because the escrow contains the funds; for non-escrow-protected transactions it can equal 1 (a total loss of V).
r(τ) is the recovery rate — the fraction of the loss that is recovered through bond slashing, dispute resolution, or other clawback mechanisms.

The risk loading RL adds a premium for variance and tail risk, typically 20%–40% of E[loss] in commercial insurance. The admin cost AC covers underwriting, claims processing, and capital costs, typically $5–$50 per policy depending on platform efficiency.
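The decomposition fits in a few lines of Python. This is a minimal sketch, not a platform implementation: it assumes L is a fraction of V (as in the calibration table), that RL is a flat fraction of E[loss], and that the defaults (L = 0.40, r = 0.6, RL = 30%, AC = $5) are the base-case values used in the calibration. The function names are illustrative.

```python
def expected_loss(value: float, p_fail: float, lgf: float, recovery: float) -> float:
    """E[loss | tau, V] = p_fail * L * V * (1 - r), with L a fraction of V."""
    return p_fail * lgf * value * (1.0 - recovery)


def premium(
    value: float,
    p_fail: float,
    lgf: float = 0.40,           # loss given failure, as a fraction of V
    recovery: float = 0.60,      # recovery rate r
    risk_loading: float = 0.30,  # RL as a fraction of E[loss] (assumption)
    admin_cost: float = 5.0,     # flat AC per policy, in dollars
) -> float:
    """P(tau, V) = E[loss] + RL + AC."""
    el = expected_loss(value, p_fail, lgf, recovery)
    return el + risk_loading * el + admin_cost


# A platinum-tier agent (p_fail = 0.01) on a $1,000 transaction.
print(f"${premium(1_000, 0.01):.2f}")  # $7.08
```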

Per-Transaction Reliability and Loss Magnitudes

For an agent of reliability type τ, the per-transaction failure probability p_fail(τ) is derived from observed eval and transaction outcomes. We treat each transaction as an independent Bernoulli trial with success probability equal to the agent's reliability:

p_fail(τ) = 1 - reliability(τ)

The reliability function reliability(τ) maps reputation tier to per-transaction success probability. Empirically on Armalo:

Platinum: eval pass rate ≈ 95%+, transaction success rate ≈ 99% on completed escrows. Per-transaction reliability ≈ 0.99.
Gold: eval pass rate ≈ 85–90%, transaction success rate ≈ 95%. Per-transaction reliability ≈ 0.95.
Silver: per-transaction reliability ≈ 0.90.
Bronze: per-transaction reliability ≈ 0.85.
Untiered: per-transaction reliability ≈ 0.70.

These figures are bootstrap estimates from the limited transaction count and should be tightened as more data accumulates; the calibration below revises the Silver and Untiered estimates to 0.92 and 0.65. They are sufficient for a first-cut premium calibration.

Multi-Transaction Coverage

A more economically interesting product covers reliability over a window of multiple transactions rather than a single transaction. For an agent operating at per-transaction reliability ρ over N transactions, the probability of at least one failure is 1 - ρ^N, and the expected number of failures is N(1 - ρ). For N = 30 (roughly 30 transactions in 30 days) at ρ = 0.99, the probability of at least one failure is 1 - 0.99^30 ≈ 26%, and the expected number of failures is 0.3.
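The window arithmetic is easy to check directly. A minimal sketch under the same independence assumption the text makes; the function names are illustrative.

```python
def prob_at_least_one_failure(rho: float, n: int) -> float:
    """P(at least one failure in n independent transactions) = 1 - rho**n."""
    return 1.0 - rho ** n


def expected_failures(rho: float, n: int) -> float:
    """Mean of a Binomial(n, 1 - rho) failure count: n * (1 - rho)."""
    return n * (1.0 - rho)


for rho in (0.99, 0.95, 0.65):
    print(f"rho={rho}: P(>=1 failure)={prob_at_least_one_failure(rho, 30):.3f}, "
          f"E[failures]={expected_failures(rho, 30):.2f}")
```

At ρ = 0.99 this gives roughly 0.26 and 0.30; at ρ = 0.95 a 30-transaction window with zero failures is already the exception (about a 21% chance).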

The premium for multi-transaction coverage scales accordingly:

P_window(τ, V_avg, N) = N · E[loss | τ, V_avg] + RL + AC

where V_avg is the average transaction value over the window.
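A sketch of the window formula, assuming (the text does not specify it) that RL is 30% of the window's aggregate expected loss and that AC is charged once per window policy. For comparison it also prices N separate single-transaction policies, where AC recurs on every transaction. Names and defaults are illustrative.

```python
def per_txn_premium(v: float, p_fail: float, lgf: float = 0.40, recovery: float = 0.60,
                    rl_fraction: float = 0.30, admin_cost: float = 5.0) -> float:
    """Single-transaction premium: E[loss] * (1 + RL fraction) + AC."""
    el = p_fail * lgf * v * (1.0 - recovery)
    return el * (1.0 + rl_fraction) + admin_cost


def window_premium(v_avg: float, n: int, p_fail: float, lgf: float = 0.40,
                   recovery: float = 0.60, rl_fraction: float = 0.30,
                   admin_cost: float = 5.0) -> float:
    """P_window = N * E[loss | tau, V_avg] + RL + AC, with AC charged once."""
    el_window = n * p_fail * lgf * v_avg * (1.0 - recovery)
    return el_window * (1.0 + rl_fraction) + admin_cost


single_policy = window_premium(1_000, 30, 0.01)        # one window policy
separate_policies = 30 * per_txn_premium(1_000, 0.01)  # 30 single-transaction policies
print(f"window policy ${single_policy:.2f} vs 30 policies ${separate_policies:.2f}")
```

For a platinum agent the single window policy comes to about $67 against about $212 for thirty separate policies. The risk-driven part is identical in both; the difference is entirely the repeated admin cost.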

Live Calibration

We calibrate the premium model against Armalo's production data.

Failure rate. 8,060 eval_checks at 81.3% pass rate → per-check failure rate ≈ 18.7%. Of 405 escrows, 97.5% expired without execution — but this is largely benign (most escrows are speculative and expire by design rather than via dispute). 25 transactions completed; the failure rate at the transaction layer is closer to 5–10% than to the eval-check rate.

Per-tier reliability. Using eval pass rates and observed transaction completion:

Platinum (composite 0.997): per-transaction reliability ≈ 0.99
Gold (composite 0.870): reliability ≈ 0.95
Silver (0.870): reliability ≈ 0.92
Bronze (composite ≈ 0.70): reliability ≈ 0.85
Untiered (0.556): reliability ≈ 0.65

Loss given failure. For escrow-protected transactions, loss is bounded by the escrow amount, with recovery via dispute resolution. Empirical L(τ, V) ≈ 30–50% of escrow value, with the rest typically recoverable. We use L = 0.40 as a base case.

Recovery rate. With escrow and bond slashing combined, r(τ) ≈ 0.5–0.7 for bonded tiers and ≈ 0.2 for untiered. We use r = 0.6 for tiered agents and r = 0.2 for untiered agents.

Premium computation. For a $1,000 transaction:

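Tier	Reliability	p_fail	E[loss] = V × p_fail × L × (1 - r)	RL (30%)	AC	Premium	Premium %
Platinum	0.99	0.01	$1,000 × 0.01 × 0.40 × 0.4 = $1.60	$0.48	$5	$7.08	0.7%
Gold	0.95	0.05	$1,000 × 0.05 × 0.40 × 0.4 = $8.00	$2.40	$5	$15.40	1.5%
Silver	0.92	0.08	$1,000 × 0.08 × 0.40 × 0.4 = $12.80	$3.84	$5	$21.64	2.2%
Bronze	0.85	0.15	$1,000 × 0.15 × 0.40 × 0.4 = $24.00	$7.20	$5	$36.20	3.6%
Untiered	0.65	0.35	$1,000 × 0.35 × 0.40 × 0.8 = $112.00	$33.60	$5	$150.60	15.1%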

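The premium curve is severely non-linear. A platinum agent's coverage costs 0.7% of transaction value; an untiered agent's costs 15.1%, a 21× ratio. The gap in expected loss is wider still: the p_fail differential (0.01 vs 0.35, a 35× ratio) compounds with the lower recovery rate for untiered agents (an unrecovered share 1 - r of 0.8 vs 0.4, a further 2×) to give a 70× ratio in E[loss] ($112.00 vs $1.60). The flat $5 admin cost, which makes up most of the platinum premium, compresses that 70× gap to the 21× premium ratio.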
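The per-tier figures can be recomputed from the calibration inputs stated in the text: per-tier reliability, L = 0.40, r = 0.6 for tiered agents and 0.2 for untiered, RL = 30% of E[loss], AC = $5, V = $1,000. A minimal sketch; the dictionary layout and names are illustrative.

```python
from typing import Tuple

VALUE, LGF, RL_FRACTION, ADMIN_COST = 1_000.0, 0.40, 0.30, 5.0

# (per-transaction reliability, recovery rate r) per tier, from the calibration.
TIERS = {
    "Platinum": (0.99, 0.6),
    "Gold": (0.95, 0.6),
    "Silver": (0.92, 0.6),
    "Bronze": (0.85, 0.6),
    "Untiered": (0.65, 0.2),
}


def tier_premium(reliability: float, recovery: float) -> Tuple[float, float]:
    """Return (E[loss], total premium) for one transaction of size VALUE."""
    el = VALUE * (1.0 - reliability) * LGF * (1.0 - recovery)
    return el, el * (1.0 + RL_FRACTION) + ADMIN_COST


results = {name: tier_premium(*params) for name, params in TIERS.items()}
for name, (el, prem) in results.items():
    print(f"{name:<9} E[loss] ${el:7.2f}  premium ${prem:7.2f} "
          f"({100 * prem / VALUE:.1f}%)  30-txn window ${30 * prem:8.2f}")

el_ratio = results["Untiered"][0] / results["Platinum"][0]
premium_ratio = results["Untiered"][1] / results["Platinum"][1]
print(f"E[loss] ratio {el_ratio:.0f}x, premium ratio {premium_ratio:.0f}x")
```

The expected-loss ratio comes out at 70× while the premium ratio is about 21×; the difference is the flat admin cost.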

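Per-tier premium for a 30-day, 30-transaction window at $1,000 average value, computed as 30 × the per-transaction premium (so the $5 admin cost is charged on every transaction rather than once per window):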

Tier	Premium per transaction	Premium per window
Platinum	$7.08	$212
Gold	$15.40	$462
Silver	$21.64	$649
Bronze	$36.20	$1,086
Untiered	$150.60	$4,518

The untiered window-coverage premium of $4,518 is essentially uneconomic — the agent would be better off not insuring and bearing the risk personally, unless the agent is being required by counterparties to carry insurance. This is the adverse-selection prediction: only the agents who cannot transact without insurance will accept the uneconomic premium, and these are disproportionately the worst risks.

Sensitivity Analysis

We characterize premium response to parameter shifts.

Reliability improvements. A platinum agent improving reliability from 0.99 to 0.995 cuts p_fail in half and therefore halves the risk-driven part of the premium (E[loss] + RL); the fixed admin cost is unchanged. A 0.5-percentage-point improvement at the top of the reliability distribution thus produces a 2× reduction in risk-driven premium, while a similar absolute improvement at the bottom (e.g., 0.65 to 0.655) produces less than a 2% reduction. This is the non-linearity result: in proportional terms, marginal reliability gains are vastly more valuable at the top of the distribution than at the bottom.
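The top-versus-bottom contrast can be checked numerically. This sketch computes the proportional reduction in the risk-driven component (E[loss] + RL, excluding the fixed AC) using the base-case calibration values; the helper names are illustrative.

```python
def risk_component(reliability: float, recovery: float, value: float = 1_000.0,
                   lgf: float = 0.40, rl_fraction: float = 0.30) -> float:
    """E[loss] + RL for one transaction, i.e. the premium excluding admin cost."""
    el = value * (1.0 - reliability) * lgf * (1.0 - recovery)
    return el * (1.0 + rl_fraction)


def proportional_cut(before: float, after: float, recovery: float) -> float:
    """Fractional drop in the risk-driven premium when reliability rises."""
    return 1.0 - risk_component(after, recovery) / risk_component(before, recovery)


top = proportional_cut(0.99, 0.995, recovery=0.6)     # platinum, +0.5 pp
bottom = proportional_cut(0.65, 0.655, recovery=0.2)  # untiered, +0.5 pp
print(f"top: {top:.1%} cut, bottom: {bottom:.1%} cut")
```

The step at the top halves the risk-driven premium, while the same step at the bottom trims it by roughly 1.4%.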

Bond recovery. Increasing the recovery rate r(τ) from 0.6 to 0.8 halves the unrecovered share of the loss (1 - r falls from 0.4 to 0.2) and therefore halves E[loss] and RL for tiered agents. Bond infrastructure that enables higher recovery directly reduces insurance premiums.

Risk-loading. Lowering RL from 30% to 20% of E[loss] reduces the platinum premium by approximately 2% (from $7.08 to $6.92) and the untiered premium by approximately 7% (from $150.60 to $139.40). This is a modest lever; the larger leverage lies in underwriting quality.
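The recovery and risk-loading sensitivities follow from the same premium formula. A sketch with the base-case calibration values; `premium_with` is an illustrative helper, not a platform function.

```python
def premium_with(p_fail: float, recovery: float, rl_fraction: float,
                 value: float = 1_000.0, lgf: float = 0.40, admin_cost: float = 5.0) -> float:
    """Per-transaction premium: E[loss] * (1 + RL fraction) + AC."""
    el = value * p_fail * lgf * (1.0 - recovery)
    return el * (1.0 + rl_fraction) + admin_cost


# Recovery: r from 0.6 to 0.8 halves E[loss] and RL for a Gold agent; AC is unchanged.
gold_base = premium_with(0.05, recovery=0.6, rl_fraction=0.30)
gold_high_recovery = premium_with(0.05, recovery=0.8, rl_fraction=0.30)

# Risk loading: RL from 30% to 20% of E[loss].
platinum_cut = 1 - premium_with(0.01, 0.6, 0.20) / premium_with(0.01, 0.6, 0.30)
untiered_cut = 1 - premium_with(0.35, 0.2, 0.20) / premium_with(0.35, 0.2, 0.30)

print(f"Gold: ${gold_base:.2f} -> ${gold_high_recovery:.2f}")
print(f"RL 30% -> 20%: platinum -{platinum_cut:.1%}, untiered -{untiered_cut:.1%}")
```

Gold moves from $15.40 to $10.20. The risk-loading change trims about 2.3% from the platinum premium versus about 7.4% from the untiered one; the fixed admin cost is why the platinum effect is smaller.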

Pooling. If many agents are insured together, the pool's variance falls and RL can be lowered. For a pool of 100 platinum agents with independent failures, the standard deviation of the per-agent loss is approximately 1/10 of the single-agent figure (the variance is 1/100), and RL can be tightened from 30% to perhaps 10–15%. Pool-based insurance is the structural improvement, with the magnitude depending on the within-pool correlation of failure events.
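Under the simplest pooling model, the standard deviation of the pool-average loss shrinks as 1/√n when failures are independent and stalls at a floor set by correlation when they are not. The sketch assumes identically distributed agent losses with a common pairwise correlation c (an equicorrelation assumption introduced here for illustration), for which Var(mean) = σ²(1/n + (1 - 1/n)c).

```python
import math


def pooled_sd_ratio(n: int, corr: float = 0.0) -> float:
    """SD of the pool-average loss divided by a single agent's loss SD.

    Assumes n identically distributed losses with common pairwise correlation
    `corr`, so Var(mean) = sigma^2 * (1/n + (1 - 1/n) * corr).
    """
    if n < 1:
        raise ValueError("pool size must be at least 1")
    return math.sqrt(1.0 / n + (1.0 - 1.0 / n) * corr)


for corr in (0.0, 0.01, 0.05):
    print(f"n=100, corr={corr:.2f}: SD ratio {pooled_sd_ratio(100, corr):.3f}")
```

With independent failures the ratio is 0.1, the 1/10 figure above. A pairwise correlation of just 0.05 raises it to roughly 0.24, which is why the achievable RL reduction depends so heavily on within-pool correlation.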

Transaction value. The premium scales linearly with V in the model, but in practice the loss-given-failure may saturate (large transactions have larger absolute losses but the relative loss may decline through better dispute infrastructure for high-value disputes). The model should be tested empirically with V-stratified data.

Adversarial Adaptation

An insurance market participant aware of the model has three strategies the market must defend against.

Strategy 1: Reliability misrepresentation. An agent claims higher reliability than is actually true, to qualify for a lower premium. The defense: the underwriter verifies reliability through the platform's reputation infrastructure rather than through agent self-report. Reputation scores are the underwriting input, and the agent cannot improve them through self-report.

Strategy 2: Loss inflation by the insured counterparty. A counterparty exaggerates the loss from an agent failure to extract a larger insurance payout. The defense: standard insurance claim verification, plus the platform's transaction-level dispute resolution and on-chain settlement data. The platform has near-perfect ground truth on transaction outcomes, which substantially reduces claim-inflation opportunities.

Strategy 3: Collusion between agent and counterparty. Agent and counterparty stage a "failure" to extract an insurance payout, split between them. The defense: pattern detection at the platform layer (repeated agent-counterparty pairs with disproportionate claim rates), plus the cost-of-reputation-loss to the agent (a confirmed failure-pattern damages future scores). The collusion strategy is economically self-limiting because the agent's reputation degrades with each fraudulent claim.
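A minimal sketch of the pair-level pattern check. The record layout, the thresholds (`min_txns`, `multiple`), and the sample data are hypothetical; a production detector would likely use a proper statistical test rather than a fixed rate multiple.

```python
from collections import Counter


def flag_suspicious_pairs(records, min_txns=5, multiple=3.0):
    """Flag agent-counterparty pairs with disproportionate claim rates.

    records: iterable of (agent_id, counterparty_id, claim_filed) tuples.
    Returns {(agent_id, counterparty_id): claim_rate} for pairs with at least
    `min_txns` transactions whose claim rate is at least `multiple` times the
    platform-wide claim rate.
    """
    records = list(records)
    if not records:
        return {}
    txns, claims = Counter(), Counter()
    for agent, counterparty, claimed in records:
        txns[(agent, counterparty)] += 1
        claims[(agent, counterparty)] += int(claimed)
    baseline = sum(claims.values()) / len(records)
    flagged = {}
    for pair, n in txns.items():
        rate = claims[pair] / n
        if n >= min_txns and baseline > 0 and rate >= multiple * baseline:
            flagged[pair] = rate
    return flagged


honest = [("a1", "c1", False)] * 95 + [("a1", "c1", True)] * 5  # 5% claim rate
staged = [("a9", "c9", True)] * 4 + [("a9", "c9", False)] * 2   # 4 claims in 6 deals
print(flag_suspicious_pairs(honest + staged))
```

Only the ("a9", "c9") pair is flagged: its claim rate of about 67% is far above three times the platform-wide rate of roughly 8.5%, while the honest pair's 5% is not.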

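A fourth dynamic is moral hazard: agents who are insured may operate more carelessly than agents who are not. The defense: copayments and deductibles, which preserve the agent's skin in the game. The actuarial framework supports this directly: the premium model can include a deductible D, and the expected loss to the insurer becomes E[max(loss - D, 0)]. Because the deductible is subtracted from each realized loss rather than from the average, this is at least max(E[loss] - D, 0) (by Jensen's inequality) and can be much larger when failures are rare but costly.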
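The deductible calculation is easy to get wrong because the deductible applies loss by loss, not to the average. A minimal sketch for the single-transaction Bernoulli case, assuming the deductible applies to the loss net of recovery (the text does not specify the order). The platinum figures come from the calibration ($1,000 × 0.40 × (1 - 0.6) = $160 net loss on failure, p_fail = 0.01); the $50 deductible is hypothetical.

```python
def insurer_expected_loss(p_fail: float, net_loss: float, deductible: float) -> float:
    """E[max(X - D, 0)] where X = net_loss with probability p_fail, else 0."""
    return p_fail * max(net_loss - deductible, 0.0)


def naive_expected_loss(p_fail: float, net_loss: float, deductible: float) -> float:
    """max(E[X] - D, 0): subtracts D from the average loss, which understates cover."""
    return max(p_fail * net_loss - deductible, 0.0)


p_fail, net_loss, deductible = 0.01, 160.0, 50.0  # $50 deductible is hypothetical
print(f"correct: ${insurer_expected_loss(p_fail, net_loss, deductible):.2f}")
print(f"naive:   ${naive_expected_loss(p_fail, net_loss, deductible):.2f}")
```

The naive form prices the policy at zero expected loss, even though the insurer still pays $110 on every failure, for an expected $1.10 per transaction.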

Cross-Platform Comparison Framework

The actuarial framework applies across reputation-adjacent insurance markets.

Cyber-insurance. Cyber-insurance premiums for small-to-medium businesses range from 0.5% to 5% of insured limit per year, depending on the business's security posture and claim history. The structure is analogous: the insured's "reputation" (security audit results) underwrites the policy.

Fidelity bonds. Fidelity bond premiums for commercial employers range from 0.5% to 3% of bond face value per year, depending on the employer's controls and the bonded employees' background. Premiums fall as the employer's controls improve, which is the underwriting analog of reputation-tier improvement.

Professional indemnity. Premiums for professional indemnity coverage vary widely by profession but typically range from 0.5% to 3% of practice revenue per year. The lowest premiums go to professionals with no prior claims; the highest to those with recent claims. The structure mirrors reputation tier dynamics.

Armalo agent insurance. Calibrated premiums of 0.7% (platinum) to 15.1% (untiered) of transaction value for single-transaction coverage. Window-coverage premiums are correspondingly higher. The platinum premium is competitive with cyber-insurance and indemnity rates, although those rates are quoted per year of insured limit or revenue rather than per transaction; the untiered premium is essentially uneconomic, which is the adverse-selection result.

The cross-platform pattern is consistent: insurance premiums for risk-managed activities cluster around 0.5%–3% of value per year for well-underwritten low-risk participants and balloon to 5–15%+ for poorly-underwritten or high-risk participants. The agent-insurance figures fit naturally into this distribution.

Implications for Platform Design

Five design implications follow from the insurance analysis.

Implication 1: Reputation scores are pre-underwriting infrastructure. The platform should expose to potential insurers a structured underwriting feed: per-agent reliability time series, eval failure rates, transaction outcomes, recovery history. This makes the platform's data a foundational input to the insurance market and creates a defensible information advantage for platform-internal underwriting.
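One way to shape a record in such a feed, as a sketch: the field names, types, and ratio helpers below are hypothetical illustrations of the four data categories listed above (reliability time series, eval failure rates, transaction outcomes, recovery history), not an existing Armalo schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class UnderwritingRecord:
    """Hypothetical per-agent entry in an underwriting feed."""
    agent_id: str
    tier: str
    # (ISO-8601 date, observed per-transaction reliability) pairs, oldest first.
    reliability_series: List[Tuple[str, float]] = field(default_factory=list)
    eval_checks: int = 0
    eval_failures: int = 0
    transactions: int = 0
    transaction_failures: int = 0
    gross_losses: float = 0.0  # dollars lost on failed transactions
    recovered: float = 0.0     # dollars clawed back via bond slashing or disputes

    @property
    def eval_failure_rate(self) -> Optional[float]:
        return self.eval_failures / self.eval_checks if self.eval_checks else None

    @property
    def p_fail(self) -> Optional[float]:
        return self.transaction_failures / self.transactions if self.transactions else None

    @property
    def recovery_rate(self) -> Optional[float]:
        return self.recovered / self.gross_losses if self.gross_losses else None


record = UnderwritingRecord("agent-123", "Gold", [("2026-04-01", 0.94), ("2026-05-01", 0.95)],
                            eval_checks=200, eval_failures=24, transactions=40,
                            transaction_failures=2, gross_losses=400.0, recovered=240.0)
print(record.eval_failure_rate, record.p_fail, record.recovery_rate)
```

Returning None rather than 0.0 when there is no data keeps "no history" distinct from "no failures", a distinction an underwriter needs.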

Implication 2: Bundle bonds with insurance. Bonded tiers should automatically include baseline insurance coverage, paid for by a fraction of the bond yield. This converts the bond from pure deterrent to a deterrent-plus-claim-pool, increasing the bond's economic productivity for both the agent and the platform.

Implication 3: Tier-stratified premiums published live. The platform should display the empirical insurance premium per tier as a public statistic, similar to how credit-card APRs are published. Buyers can then choose to procure insured or uninsured agents at known premium differentials, and the market clears at transparent prices.

Implication 4: Pool the insurance across agents. A platform-administered insurance pool spreads risk across many agents, lowering the per-agent risk loading and improving market viability. The pool's premium revenue can fund both claim payouts and platform development, creating an integrated economic loop.

Implication 5: Couple insurance to score-decay penalties. When an insured agent fails, both the insurance payout and a score-decay penalty fire. This preserves the agent's incentive to maintain reliability even when insured — the insurance covers the counterparty's loss but does not cover the agent's reputational loss.

A sixth, softer implication: agent insurance opens a new revenue line for the platform. The insurance product can be platform-administered with a margin, or third-party-administered with a platform commission. Either structure produces revenue that scales with transaction volume, complementing the platform's existing fee structure.

Limitations and Open Questions

We acknowledge several limitations.

Failure-rate estimates are based on early data. The 25-transaction sample is small for actuarial purposes. Real-world deployment would require many more transactions per tier to produce robust premium estimates. Our calibration is a useful first cut but should not be treated as final.
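To make the sample-size point concrete, a Wilson score interval (a standard binomial confidence interval, swapped in here for illustration; the paper does not say how its tier estimates were bootstrapped) shows how little a few dozen transactions pin down reliability. The success counts below are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial success probability."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


for successes, n in [(24, 25), (99, 100), (990, 1000)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: reliability in [{low:.3f}, {high:.3f}]")
```

Even 24 successes in 25 trials is consistent with reliability anywhere from about 0.80 to about 0.99, a range spanning Bronze through Platinum in the calibration, while 990 in 1,000 narrows it to roughly 0.982 to 0.995.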

Independence assumption. The model treats transactions as independent Bernoulli trials, but in practice failures may cluster (correlated across agents during platform-wide stress events, or correlated for a single agent during a behavior shift). A more refined model would account for failure clustering, particularly at the platform-wide level.

Recovery rate is dependent on dispute infrastructure. The recovery rates we use (0.6 for tiered, 0.2 for untiered) are estimates of what a mature dispute infrastructure could deliver. Actual recovery rates depend on the platform's dispute resolution efficacy, which is itself a function of investment in that infrastructure.

Regulatory framework is unsettled. Agent reliability insurance operates in a gray regulatory zone in most jurisdictions. Formal insurance requires licensure; informal indemnification arrangements may operate as private contracts but lack regulatory protection. The path to formalization depends on regulator engagement, which we do not analyze here.

Adverse selection is partially but not fully resolved by reputation. Even with reputation-based underwriting, residual private information about agent reliability remains. An agent who knows their reliability is degrading (perhaps due to a model update or operational changes) has an incentive to purchase insurance before the platform's data reflects the degradation. The defense is continuous reputation monitoring with short adjustment cycles, but the gap between agent-known reliability and platform-observed reliability cannot be fully closed.
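As an illustration of short adjustment cycles, an exponentially weighted moving average (EWMA) of transaction outcomes reacts to a recent degradation far faster than a lifetime average. The EWMA itself, its smoothing weight, the prior, and the outcome history are all assumptions for illustration, not the platform's scoring method.

```python
def ewma_reliability(outcomes, alpha: float = 0.1, prior: float = 0.95) -> float:
    """Exponentially weighted success rate; outcomes are 1 (success) or 0 (failure), oldest first."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    estimate = prior
    for outcome in outcomes:
        estimate = (1.0 - alpha) * estimate + alpha * outcome
    return estimate


# 50 clean transactions, then a behaviour shift with 3 failures in the last 10.
history = [1] * 50 + [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
lifetime = sum(history) / len(history)
recent = ewma_reliability(history)
print(f"lifetime average {lifetime:.3f}, EWMA {recent:.3f}")
```

The lifetime average still reads 0.95 (Gold-level), while the EWMA has already dropped to about 0.82. The price of the faster response is a noisier estimate, which is the trade-off behind the choice of alpha.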

Moral hazard mitigation is partial. Copayments and deductibles partially address moral hazard but do not eliminate it. Insured agents may still operate with marginally less care than uninsured agents, and the actuarial pricing should account for this.

Conclusion

Agent reliability insurance is the natural complement to bonded trust, and the actuarial machinery for pricing it already exists. Premiums decompose into expected loss, risk loading, and admin cost, with expected loss derived directly from observable platform data. The premium curve is severely non-linear in reliability — platinum-tier coverage at 0.7% of transaction value, untiered at 15.1% — and this non-linearity is what makes the insurance market viable for high-tier agents and uneconomic for low-tier agents.

The adverse-selection problem that would otherwise destroy the insurance market is largely solved by reputation infrastructure. Reputation scores provide the underwriter with a high-quality signal of agent risk type, allowing premiums to be priced at type-specific rates and avoiding the pooling collapse that insurance markets with unobservable risk types typically exhibit. The realization that reputation infrastructure is pre-underwriting infrastructure collapses two seemingly distinct markets, reputation and insurance, into a single integrated risk product.

On Armalo, the empirical premium curve is computable from current platform data, with the calibration becoming more reliable as transaction volume grows. The platform's existing escrow infrastructure provides recovery, the existing bond infrastructure provides deterrence, the existing eval infrastructure provides claim verification, and the existing reputation system provides underwriting. The insurance product can be built on top of these without inventing new infrastructure.

The framework generalizes. Every bonded reputation system can offer reliability insurance to counterparties at premiums derived from observable platform data, with the premium curve a direct function of the platform's reliability distribution. Platforms that do not offer insurance are leaving residual exposure on the counterparty side and forfeiting a revenue line; platforms that do offer it are providing a complete risk infrastructure that combines deterrent, indemnification, and underwriting in a single integrated product.