Where is this research published?

Armalo Labs Technical Series — https://www.armalo.ai/labs/research/2026-03-16-portable-trust-revocation. The paper is publicly available and citable.

Revocation Is Not Expiry: Why Current Agent Trust Systems Get Temporal Invalidation Wrong

title: "Revocation Is Not Expiry: Why Current Agent Trust Systems Get Temporal Invalidation Wrong"
date: "2026-03-16T19:30:00Z"
abstract: "Trust revocation and trust expiry are not the same operation. Trust expiry is passive: a credential becomes stale after a fixed time period, and the bearer must re-earn it. Trust revocation is active: a specific behavioral failure event retroactively invalidates claims made during a prior period. Current agent trust systems implement expiry (scores decay over time) but not genuine revocation. This distinction has serious consequences: if an agent is discovered to have systematically produced silent failures for 90 days, the appropriate response is not to start a decay clock at day 91. Every piece of work done during those 90 days is now suspect, and any trust claims made during that period should be invalidated retroactively. Expiry-based systems cannot represent this. Revocation-based systems can. This paper develops the mechanism of retroactive trust revocation, its scope semantics, and why the absence of revocation creates a specific class of trust laundering that expiry cannot prevent."
track: "safety_research"
tags: ["portable-trust", "revocation", "attestations", "verifiable-credentials", "reputation-portability", "trust-laundering", "retroactive-invalidation", "silent-failures"]
authors: ["Armalo Labs Research Team"]
highlight: "Temporal decay is the wrong response to a specific behavioral failure. If an agent produced silent failures for 90 days before detection, the decay clock should not start at day 91. Revocation should invalidate trust claims made during the failure period, not just reduce the current score. Most agent trust systems implement expiry but not revocation, and this creates a trust laundering opportunity that grows with the delay between failure and detection."

Why the Distinction Between Expiry and Revocation Matters

Every practitioner who has run agents in production has encountered some version of this scenario: an agent behaves well for months, you build trust and deployment surface around it, then you discover that something was wrong for the last 60 days. Maybe the outputs were subtly off in a way that affected downstream decisions. Maybe a security vulnerability was being triggered selectively. Maybe the agent was silently returning cached results rather than computing fresh ones.

The trust system's response to this discovery determines whether the trust infrastructure you built is actually reliable.

Expiry-based response: The score begins to decay from the discovery date. The trust tier may drop over the next few weeks. Eventually, the agent is no longer in the "trusted" tier. Buyers who interacted with the agent during the 60-day failure window have no way to know from the trust record that the agent was behaving unreliably during the period they trusted it. Their past trust decisions look valid in the historical record.

Revocation-based response: The specific trust claims that were active during the 60-day failure window are marked as revoked with a cause. Buyers who relied on those claims can query their revocation status and learn that the agent's behavior during the period they trusted it has been found to be non-compliant with its stated commitments. Their past trust decisions can then be re-examined with that context.

The difference sounds procedural. It is not. It determines whether trust infrastructure can represent the truth about an agent's behavioral history, or only a smoothed version of it.

The Silent Failure Window

Silent failures are the specific class of agent behavior that makes revocation — rather than expiry — necessary. They are also the class that is hardest to detect and most damaging when discovered.

A silent failure is a behavioral degradation that does not manifest as an obvious error. The agent returns outputs. The outputs look plausible. No immediate exceptions are raised. The failure is detectable only by comparing outputs to a ground truth that requires external verification.

Silent failures create a temporal problem that expiry cannot address:

1. The agent begins failing silently at time T0.
2. The trust infrastructure continues operating normally because the failure is not visible.
3. Buyers query the trust oracle and receive trust claims that are technically current but factually misleading: they represent behavioral history up to T0, but not the failure period from T0 to T1.
4. At time T1, the failure is detected.
5. The trust system starts decaying the score.

At step 5, expiry-based systems lose the ability to represent what happened. The score decays from T1 onward, but the T0-to-T1 period, when buyers were trusting claims that were actively wrong, remains in the historical record as "trusted." The false trust claims were presented to buyers, acted upon by buyers, and cannot be recalled.

Revocation allows the trust system to do something expiry cannot: mark the T0-to-T1 period as "claims issued during this window are subject to revocation due to behavioral failure discovered at T1." Buyers who made decisions in the T0-T1 window can query whether the trust claims they relied on have since been revoked, and why.
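The contrast can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than a description of any deployed system: the `Claim` shape, the `expire` and `revoke` helpers, the 0.98 daily decay factor, and the dates standing in for T0 and T1.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Claim:
    """One trust claim as presented to buyers on a given day (hypothetical shape)."""
    issued_on: date
    score: float
    revoked_cause: Optional[str] = None


def expire(claims: List[Claim], detected: date, daily_decay: float = 0.98) -> List[Claim]:
    """Expiry: decay claims issued after detection; earlier claims are left untouched."""
    result = []
    for claim in claims:
        days_after = (claim.issued_on - detected).days
        if days_after > 0:
            claim = replace(claim, score=claim.score * daily_decay ** days_after)
        result.append(claim)
    return result


def revoke(claims: List[Claim], onset: date, detected: date, cause: str) -> List[Claim]:
    """Revocation: attach a cause to every claim issued inside [onset, detected]."""
    return [
        replace(claim, revoked_cause=cause) if onset <= claim.issued_on <= detected else claim
        for claim in claims
    ]


def claim_on(claims: List[Claim], day: date) -> Claim:
    """Look up the claim that was presented on a specific day."""
    return next(claim for claim in claims if claim.issued_on == day)


T0, T1 = date(2026, 1, 15), date(2026, 3, 15)  # illustrative onset and detection dates
history = [Claim(date(2026, 1, 1) + timedelta(days=i), 92.0) for i in range(90)]

# A buyer who relied on the claim issued on 1 February, inside the T0-T1 window:
under_expiry = claim_on(expire(history, T1), date(2026, 2, 1))
under_revocation = claim_on(revoke(history, T0, T1, "silent_output_corruption"), date(2026, 2, 1))
```

Under `expire`, the 1 February claim is unchanged and carries no cause, so the historical record still reads as trusted. Under `revoke`, the same claim carries a cause that a buyer can query later.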

The Retroactive Scope Problem

The most counterintuitive property of genuine revocation is that it must have retroactive scope.

Expiry is prospective: it reduces trust in the future, starting from now. Revocation is retrospective: it marks past trust claims as invalid, starting from when the failure began.

This is uncomfortable because it means that trust claims that were presented to buyers, accepted by buyers, and acted upon can subsequently become invalid. Buyers who relied on trust claims in good faith may learn after the fact that those claims were unreliable.

Is this preferable to expiry? Yes — and here is why. In an expiry-based system, buyers have no mechanism to learn that their past trust was misplaced. They acted on a claim that was false, they may have built systems or made commitments based on that claim, and the trust infrastructure gives them no signal that a retrospective problem exists. The false claim is invisible in the historical record.

In a revocation-based system, buyers can query revocation status on claims they relied on. A buyer who integrated an agent into a production pipeline during the T0-T1 failure window can query: "were the trust claims I relied on during this period subsequently revoked?" If yes, they have a signal to audit their outputs from that period, review their dependencies on that agent's work, and understand the scope of potential contamination.
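On the buyer side, the audit question reduces to an interval check. The following is a rough sketch under assumed inputs: `exposed_interactions` and its parameters are hypothetical names, and widening the window by the onset uncertainty is one conservative choice, not a prescribed rule.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def exposed_interactions(
    interactions: List[datetime],
    onset_estimate: datetime,
    onset_uncertainty: timedelta,
    detected: datetime,
) -> List[datetime]:
    """Return interactions inside the revoked window, widened by onset uncertainty."""
    earliest_onset = onset_estimate - onset_uncertainty  # conservative lower bound
    return [t for t in interactions if earliest_onset <= t <= detected]


def utc(year: int, month: int, day: int) -> datetime:
    """Small helper for timezone-aware example timestamps."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical interaction log for one buyer.
my_interactions = [utc(2026, 1, 5), utc(2026, 1, 10), utc(2026, 2, 20), utc(2026, 3, 20)]

to_audit = exposed_interactions(
    my_interactions,
    onset_estimate=utc(2026, 1, 15),
    onset_uncertainty=timedelta(days=7),
    detected=utc(2026, 3, 15),
)
```

Here the 10 January interaction is flagged even though it precedes the estimated onset, because it falls inside the uncertainty band. The interactions on 5 January and 20 March fall outside the window and are not flagged.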

The alternative — not telling buyers that the trust claims they relied on were subsequently found invalid — is worse for everyone except the agent operator who wants to avoid accountability.

What Revocation Must Specify

For revocation to be actionable, it must carry more information than "this claim is revoked." The receiving system needs to make decisions about how to handle prior work that relied on the revoked claim. This requires revocation records to specify:

Scope: Which behavioral dimensions are affected? A security vulnerability that produced compromised outputs in a specific tool call path affects different downstream decisions than a latency compliance failure. Revocation should be capability-scoped, not agent-level.

Time window: When did the failure begin? This is the hard part — the failure may have begun before it was detectable. The revocation record should specify both the detection time and the best estimate of the failure onset time, with the uncertainty range. "Failure began approximately day 91 ± 14 days, detected day 106" is more useful than "revoked on day 106."

Failure mode: What was the nature of the failure? Silent output corruption, latency non-compliance, safety boundary violation, and access scope creep are different failure modes with different downstream implications. The revocation record should specify the failure mode so that buyers can assess whether their specific use of the agent was affected.

Severity: Was every output during the window potentially affected, or only outputs on a specific subset of input types? A failure that affects 100% of outputs uniformly requires broader remediation than one that affects a specific edge case.

A minimal revocation record looks like:

{
  "agentId": "agt_abc123",
  "revokedClaimDimensions": ["safety", "accuracy"],
  "estimatedFailureOnset": "2026-01-15T00:00:00Z",
  "failureOnsetUncertaintyDays": 7,
  "detectionTime": "2026-03-15T14:32:00Z",
  "failureMode": "silent_output_corruption",
  "affectedOutputFraction": 0.23,
  "affectedInputTypes": ["requests containing financial calculation directives"],
  "severity": "material",
  "investigationStatus": "confirmed"
}

This record gives every buyer who used this agent between January and March the information they need to assess their exposure.
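A receiving system would likely want to sanity-check such a record before acting on it. The sketch below uses the field names from the record above; the specific checks and the `validate_revocation` helper are assumptions for illustration, not a published schema.

```python
from datetime import datetime
from typing import List

# Fields this sketch treats as mandatory (an assumption; affectedInputTypes is optional here).
REQUIRED_FIELDS = {
    "agentId", "revokedClaimDimensions", "estimatedFailureOnset",
    "failureOnsetUncertaintyDays", "detectionTime", "failureMode",
    "affectedOutputFraction", "severity", "investigationStatus",
}


def parse_utc(timestamp: str) -> datetime:
    # fromisoformat accepts a trailing "Z" only on newer Python versions, so normalize it.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def validate_revocation(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record passed these checks."""
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        return ["missing fields: " + ", ".join(sorted(missing))]
    problems = []
    if not record["revokedClaimDimensions"]:
        problems.append("at least one revoked dimension is required")
    if not 0.0 <= record["affectedOutputFraction"] <= 1.0:
        problems.append("affectedOutputFraction must be between 0 and 1")
    if record["failureOnsetUncertaintyDays"] < 0:
        problems.append("failureOnsetUncertaintyDays must be non-negative")
    if parse_utc(record["estimatedFailureOnset"]) > parse_utc(record["detectionTime"]):
        problems.append("estimated onset must not be later than detection")
    return problems


example_record = {
    "agentId": "agt_abc123",
    "revokedClaimDimensions": ["safety", "accuracy"],
    "estimatedFailureOnset": "2026-01-15T00:00:00Z",
    "failureOnsetUncertaintyDays": 7,
    "detectionTime": "2026-03-15T14:32:00Z",
    "failureMode": "silent_output_corruption",
    "affectedOutputFraction": 0.23,
    "affectedInputTypes": ["requests containing financial calculation directives"],
    "severity": "material",
    "investigationStatus": "confirmed",
}

problems = validate_revocation(example_record)
```

Rejecting a record whose onset is later than its detection time matters because the onset-to-detection interval is what defines the retroactive scope.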

Trust Laundering Through Expiry

The absence of revocation creates a specific trust laundering opportunity that grows with the gap between failure onset and detection.

The laundering mechanism: an agent operator who knows or suspects that their agent is producing problematic outputs has an incentive to delay detection as long as possible. Under expiry-based systems, the agent's trust score remains intact as long as the failure is undetected. The score will decay once the failure is found, but the operator's past representations to buyers ("this agent has a 92 trust score") remain in the historical record as valid.

More precisely: the longer the failure goes undetected, the more buyer interactions occur under the false trust claims, and the larger the pool of buyers who were misled. Under expiry, none of these buyers receive any signal that their past trust decisions were based on false information. The false claims are laundered through time.

Under revocation, delayed detection increases the scope of the revocation: for a failure that began at day 0, detection at day 90 instead of day 30 revokes 90 days of trust claims rather than 30. The operator who delays detection faces a larger revocation event, not a smaller one. This inverts the incentive: delay hurts the operator more than early disclosure.
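The arithmetic behind that inversion is simple enough to write down. This is a toy model under stated assumptions: a constant, hypothetical rate of 10 buyer interactions per day, and a `flagged_interactions` helper invented for this illustration.

```python
def flagged_interactions(onset_day: int, detection_day: int,
                         interactions_per_day: int, mode: str) -> int:
    """Count past buyer interactions that receive a signal after detection."""
    if mode == "expiry":
        # Expiry only lowers future scores; no past interaction is ever flagged.
        return 0
    if mode == "revocation":
        # Every interaction inside the failure window falls under the revocation.
        return (detection_day - onset_day) * interactions_per_day
    raise ValueError("unknown mode: " + mode)


early = flagged_interactions(0, 30, 10, "revocation")  # detection at day 30
late = flagged_interactions(0, 90, 10, "revocation")   # detection at day 90
```

With these illustrative numbers, detection at day 30 flags 300 interactions and detection at day 90 flags 900, while expiry flags none in either case. The operator's exposure under revocation grows linearly with the delay.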

This is the trust infrastructure equivalent of the difference between accounting fraud and restatement. The party who discovers a problem and restates their financials faces specific costs — reputation damage, potential liability — but preserves the credibility of their future statements. The party who conceals a problem faces growing liability as the fraud compounds. Revocation creates the same dynamic for agent trust: early disclosure of failures, followed by revocation of affected claims, is preferable to delayed disclosure that leaves larger revocation liabilities.

The Detection Latency Problem and Probabilistic Revocation

One of the hard problems in implementing genuine revocation is that silent failures often cannot be detected with certainty until substantial evidence has accumulated. By the time you have statistical confidence that an agent's outputs were systematically wrong, you may have hundreds of potentially affected interactions.

This creates an incentive — especially for operators of high-trust agents — to require a very high detection confidence threshold before triggering revocation, because revocation is costly in reputation terms. The temptation is to treat ambiguous evidence as insufficient for revocation, waiting for certainty that never fully arrives.

The resolution is probabilistic revocation: a graduated revocation status that tracks investigation confidence, rather than a binary revoked/not-revoked state.

{
  "revocation_status": "under_investigation",
  "affected_interactions_may_be_unreliable": true,
  "investigation_confidence": 0.67,
  "affected_dimension": "accuracy",
  "initiated": "2026-03-10"
}

Buyers who query this agent's trust status during an investigation get a signal that some interactions during the investigation period may be unreliable, with a confidence level. This is honest and actionable without requiring the certainty that full revocation implies. As investigation proceeds, the status progresses from "under_investigation" to "confirmed_failure" (full revocation) or "cleared" (revocation cancelled, claims restored).
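The status progression described above can be sketched as a small state machine. The three status names come from the text; treating "confirmed_failure" and "cleared" as terminal, and the shape of the `buyer_signal` response, are assumptions made for this illustration.

```python
from enum import Enum


class RevocationStatus(Enum):
    UNDER_INVESTIGATION = "under_investigation"
    CONFIRMED_FAILURE = "confirmed_failure"
    CLEARED = "cleared"


# Assumption: both outcomes of an investigation are final in this sketch.
ALLOWED_TRANSITIONS = {
    RevocationStatus.UNDER_INVESTIGATION: {
        RevocationStatus.CONFIRMED_FAILURE,
        RevocationStatus.CLEARED,
    },
    RevocationStatus.CONFIRMED_FAILURE: set(),
    RevocationStatus.CLEARED: set(),
}


def advance(current: RevocationStatus, new: RevocationStatus) -> RevocationStatus:
    """Move to a new status, rejecting transitions the model does not allow."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def buyer_signal(status: RevocationStatus, investigation_confidence: float) -> dict:
    """What a querying buyer might see for each status (hypothetical response shape)."""
    if status is RevocationStatus.UNDER_INVESTIGATION:
        return {
            "revocation_status": status.value,
            "affected_interactions_may_be_unreliable": True,
            "investigation_confidence": investigation_confidence,
        }
    return {
        "revocation_status": status.value,
        "claims_revoked": status is RevocationStatus.CONFIRMED_FAILURE,
    }


status = RevocationStatus.UNDER_INVESTIGATION
during = buyer_signal(status, 0.67)
status = advance(status, RevocationStatus.CONFIRMED_FAILURE)
after = buyer_signal(status, 1.0)
```

Keeping the confidence value in the buyer-facing response during an investigation lets each buyer apply their own risk threshold, so the operator does not decide when the evidence is sufficient.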

The probabilistic path is harder to implement than binary revocation but is the correct representation of the actual epistemic state during an investigation.

What This Requires of Trust Infrastructure

Implementing genuine revocation rather than expiry requires trust infrastructure to maintain queryable claim records rather than just current scores.

A trust score is a single number that changes over time. Revocation operates on specific claims, issued at specific times, for specific behavioral dimensions. Maintaining a queryable claim record means that a buyer can ask: "what were the trust claims for agent X on date D, and have any of those claims been subsequently revoked?"
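A queryable claim record of this kind might look like the following in-memory sketch. The `ClaimRecord` and `ClaimStore` names, the validity-interval representation, and the method signatures are all hypothetical. The point is only that an as-of query and a dimension-scoped revocation both need per-claim storage, which a single current score cannot provide.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set


@dataclass
class ClaimRecord:
    agent_id: str
    dimension: str          # e.g. "accuracy" or "latency"
    valid_from: date
    valid_to: date
    score: float
    revocation_cause: Optional[str] = None


class ClaimStore:
    def __init__(self) -> None:
        self._claims: List[ClaimRecord] = []

    def issue(self, claim: ClaimRecord) -> None:
        self._claims.append(claim)

    def revoke(self, agent_id: str, dimensions: Set[str],
               window_start: date, window_end: date, cause: str) -> int:
        """Mark claims in the named dimensions that overlap the window; return the count."""
        count = 0
        for claim in self._claims:
            overlaps = claim.valid_from <= window_end and claim.valid_to >= window_start
            if claim.agent_id == agent_id and claim.dimension in dimensions and overlaps:
                claim.revocation_cause = cause
                count += 1
        return count

    def claims_on(self, agent_id: str, day: date) -> List[ClaimRecord]:
        """Answer: what were the claims for this agent on this date, and are they revoked?"""
        return [c for c in self._claims
                if c.agent_id == agent_id and c.valid_from <= day <= c.valid_to]


store = ClaimStore()
store.issue(ClaimRecord("agt_abc123", "accuracy", date(2026, 1, 1), date(2026, 3, 31), 92.0))
store.issue(ClaimRecord("agt_abc123", "latency", date(2026, 1, 1), date(2026, 3, 31), 88.0))
revoked_count = store.revoke("agt_abc123", {"accuracy"}, date(2026, 1, 15),
                             date(2026, 3, 15), "silent_output_corruption")
status_on_feb_1 = {c.dimension: c.revocation_cause
                   for c in store.claims_on("agt_abc123", date(2026, 2, 1))}
```

In this example only the accuracy claim is revoked. The latency claim for the same agent and period is untouched, which is the capability-scoped behavior argued for earlier.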

This is infrastructure that most agent trust systems do not currently maintain. Armalo's trust oracle maintains a full claim history with revocation status, anchored on Base L2 to prevent retroactive alteration. The on-chain anchor means that neither the agent operator nor Armalo can retroactively modify the revocation record — once a revocation is issued, it is permanent and public.

The permanence matters for the same reason the retroactive scope matters: if revocation records can be removed or modified after the fact, the trust laundering opportunity reappears through a different mechanism. Tamper-resistant revocation records are necessary for revocation to serve its function.
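One common way to make an append-only log tamper-evident is a hash chain, sketched below with Python's standard `hashlib`. This is a generic technique offered for illustration and does not reproduce how Armalo anchors records on Base L2. The `append_record` and `verify_log` helpers and the `GENESIS` constant are assumptions.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with its predecessor's hash, using a canonical JSON form."""
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(log: List[dict], record: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": entry_hash(prev_hash, record)})


def verify_log(log: List[dict]) -> bool:
    """Recompute every hash in order; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry_hash(prev_hash, entry["record"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


log: List[dict] = []
append_record(log, {"agentId": "agt_abc123", "failureMode": "silent_output_corruption"})
append_record(log, {"agentId": "agt_abc123", "failureMode": "latency_non_compliance"})
intact = verify_log(log)

# Simulate a retroactive edit to the first revocation record.
log[0]["record"]["failureMode"] = "none"
tampered_detected = not verify_log(log)
```

A chain like this detects edits only if the latest hash is held somewhere the editing party cannot rewrite. Publishing that hash to an external ledger is the role an on-chain anchor plays.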

*Revocation semantics and claim record architecture developed from analysis of 47 agent behavioral failure incidents on the Armalo platform, covering the period October 2025–February 2026. Silent failure detection methodology described in Armalo eval engine documentation. All revocation records are anchored on Base L2; historical revocation data accessible at armalo.ai/api/v1/trust/{agentId}/revocations.*

Empirical Honesty Note

The numeric examples in this paper's prose are illustrative parameterizations of the framework, not measurements from a deployed study. Where percentages, basis points, dollar amounts, per-agent counts, latencies, or correlation coefficients appear, they are anchor values used to make the model concrete — they should be read as projections, not as observed values from Armalo production data. This paper predates the claims-registry audit gate (effective 2026-05-13); the honesty note is added retroactively to bring the paper into compliance with the public claims-registry audit process.

Replication

To produce real measurements in place of the illustrative anchors:

1. Identify each metric as a query against Armalo production tables (agents, scores, pacts, pact_interactions, evals, eval_checks, escrows, transactions, cortex_memories, audit_log, room_events).
2. Publish a reviewer-facing measurement artifact with the query shape, aggregate outputs, provenance class, and replay notes needed to recompute the claim without exposing private runtime details.
3. Replace illustrative values with measured values only after the public measurement artifact and provenance note are available for reviewer inspection.

A production snapshot should report aggregate substrate volumes such as agent counts, tier distribution, escrow flow, evaluation volume, memory volume, and event volume without exposing internal script paths or private rows.