Receipt-Pact-Recourse Stress Test: A Lab Method for Agent Economy Trust | Armalo Labs | Armalo AI
Economic ModelsMay 26, 20265 min read
Receipt-Pact-Recourse Stress Test: A Lab Method for Agent Economy Trust
Armalo Labs
Key Finding
The agent economy needs enforceable trust objects, not just smarter agents.
Abstract
A stress test for whether agent actions can be joined to promises, evidence, and recourse when real counterparties rely on them.
agent-economypactsrecoursereceiptseconomic-trust
Abstract
The receipt-pact-recourse stress test asks whether an agent action can survive a dispute. The agent made a promise or accepted a scope. It performed work. A counterparty now needs to know what happened, whether the evidence is fresh, who approved the authority, and what recourse exists if the work is wrong. This paper defines the public shape of that test.
Method
Each action is evaluated as a chain rather than a single output. The chain begins with the pact or commitment, continues through tool and evidence receipts, and ends with recourse: review, downgrade, rollback, payment hold, bond consequence, or dispute. The stress test intentionally avoids exposing private escrow controls or contract terms. It checks whether the public article names the chain clearly enough for a buyer or builder to understand the primitive.
Chain link
Evidence required
Economic meaning
Pact
stated scope and acceptance criteria
defines what the agent owes
Receipt
action, tool, source, timestamp
makes work inspectable
Score
current trust state and freshness
changes reliance decision
Recourse
Cite this work
Armalo Labs (2026). Receipt-Pact-Recourse Stress Test: A Lab Method for Agent Economy Trust. Armalo Labs Technical Series, Armalo AI. https://www.armalo.ai/labs/research/research-lab-receipt-pact-recourse-stress-test
Armalo Labs Technical Series · ISSN pending
Explore the trust stack behind the research
These papers are built from the same trust questions Armalo is turning into product surfaces: pacts, trust oracles, attestations, and runtime evidence.
The wave passes only if every article connects trust to a consequence. That standard matters because agent-economy content often stops at reputation language. Reputation is not enough. A counterparty needs to know whether a low-confidence receipt narrows authority, whether a breached pact changes payment, whether a stale score triggers recertification, and whether a disputed memory can be quarantined.
Evidence And Falsification
The stress test borrows from two external pressures: agent protocol delegation, where identity and scope can be stripped across handoffs, and software-agent evaluation, where task success can hide missing provenance. The protocol side is visible in work such as [Security Threat Modeling for Emerging AI-Agent Protocols](https://arxiv.org/abs/2602.11327). The task-evaluation side is visible in benchmarks such as [SWE-bench](https://www.swebench.com/), which are useful precisely because they force observable outcomes. Armalo's stress test asks for the missing middle: can the observable outcome be joined to a promise and a remedy?
The reusable stress matrix is:
Pressure
Test question
Passing evidence
Failing evidence
Scope dispute
What did the agent owe?
pact with acceptance criteria
vague task request
Evidence dispute
What happened?
replayable receipt chain
self-reported success
Payment dispute
Should value move?
score and recourse rule
no hold or challenge path
Learning dispute
Will it repeat?
post-incident policy update
lesson trapped in narrative
The claim is falsified if the system can publish a trust object without recourse, or if recourse is described only as customer support rather than a runtime consequence. Serious agent commerce needs disagreement paths before the disagreement happens.
Operating Depth Addendum
The stress test should be run before a workflow touches money, customer commitments, external messaging, or delegated tool authority. The operator should choose one representative action and ask four questions in order: what promise existed, what receipt proves performance, what score or policy changes reliance, and what recourse becomes available if the counterparty disputes the result. Skipping the order is the common failure. Teams jump from evidence to reputation without asking whether reputation changes any real right.
A passing artifact does not need to publish private contracts or escrow code. It does need to show that the public trust object has a path to consequence. If the only available answer is "a human will review it later," the system may still be useful, but it should not claim agent-economy trust maturity.
The practical output of the stress test is a disposition, not a score for its own sake. The workflow is ready, scoped, blocked, or research-only. That disposition should be visible to the next operator before the next agent is allowed to inherit the prior action as trusted context.
Replication
This is a framework paper: its quantitative content is the structure of the five-link chain and the four-pressure stress matrix, not a measured dataset. To replicate, pick one agent action that a real counterparty relies on and walk the four operator questions in order — promise, receipt, reliance change, recourse — recording where the chain breaks. Every numeric claim in this paper is registered in Armalo's research claims registry with an explicit provenance type.
Proof Debt Is the New Technical Debt: A Ledger for Agent Research Claims