Trust Lab Peer Review Matrix: Positioning Runtime Trust Research Beside Model Research
Safety Research | May 26, 2026 | 5 min read
Armalo Labs
Key Finding
Trust research is a separate discipline with separate proof artifacts.
Abstract
A comparison matrix for model labs, open labs, safety labs, and trust labs, with proof artifacts each discipline owes the market.
trust-lab, model-research, frontier-ai, peer-review
Abstract
This paper positions Armalo Research Lab as a trust-infrastructure lab rather than a foundation-model lab. DeepMind, OpenAI, Anthropic, Nous Research, and neighboring organizations push capability, alignment, interpretability, open models, distributed training, and model behavior standards. Armalo's research lane studies the deployment layer where agents make commitments, use tools, collect evidence, earn or lose trust, and create economic consequences.
Method
The matrix compares disciplines by primary question, proof artifact, and public boundary. The goal is not to claim equivalent compute scale. The goal is to define why runtime trust research deserves a peer position in the agent economy. A buyer choosing an agent system needs capability evidence and trust evidence; one cannot substitute for the other.
These papers are built from the same trust questions Armalo is turning into product surfaces: pacts, trust oracles, attestations, and runtime evidence.
This wave of papers uses the matrix as a claim boundary. Armalo can speak with serious labs because it is working on a hard, underdeveloped, and economically important layer. It should not pretend to be DeepMind's compute operation, OpenAI's model behavior organization, Anthropic's interpretability group, or Nous Research's open training network. It should claim the discipline it is actually building: evidence-bearing trust for agents that act after the model response.
Evidence And Falsification
The matrix is grounded in two adjacent pressures. First, model benchmarks such as [SWE-bench](https://www.swebench.com/) show that agent capability can be measured against task outcomes. Second, browser and tool-security research such as [agentic browser same-origin analysis](https://agent-security.cs.washington.edu/agentic_browsers_sop.html) and [formal MCP security work](https://arxiv.org/abs/2604.05969) show that capability evidence does not automatically answer deployment trust questions. A trust lab sits in the gap between those two bodies of work.
The claim would be falsified if Armalo's research artifacts cannot name a proof object that changes an operating decision. If the Lab publishes papers that only restate market beliefs, or if the verifier cannot connect a paper to an experiment and a boundary, the peer-review matrix fails. The useful peer position is earned when every paper names its question, method, evidence artifact, limitation, and next operating decision.
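The completeness condition above can be sketched as a small check. This is a minimal illustration, not Armalo's verifier: the `PaperRecord` type, its field names, and the sample values are hypothetical, chosen only to mirror the five elements the text says every paper must name.

```python
from dataclasses import dataclass, fields


@dataclass
class PaperRecord:
    """Hypothetical record of the five elements a paper must name."""
    question: str = ""
    method: str = ""
    evidence_artifact: str = ""
    limitation: str = ""
    next_operating_decision: str = ""


def missing_elements(paper: PaperRecord) -> list[str]:
    """Return the names of required elements that are empty or whitespace-only."""
    return [f.name for f in fields(paper) if not getattr(paper, f.name).strip()]


def earns_peer_position(paper: PaperRecord) -> bool:
    """A paper passes only when none of the five elements is missing."""
    return not missing_elements(paper)


# Illustrative draft with made-up values; two elements are left unnamed.
draft = PaperRecord(
    question="Does runtime evidence change an operator's rollout decision?",
    method="Framework comparison",
    evidence_artifact="Discipline comparison matrix",
)
print(missing_elements(draft))     # ['limitation', 'next_operating_decision']
print(earns_peer_position(draft))  # False
```

A check like this only confirms that each element is named. Whether the named proof object actually changes an operating decision still needs a human reviewer.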
The reusable framework for reviewers is simple: ask whether the paper teaches a mechanism, exposes a failure mode, names a receipt, and gives the operator a promotion or rollback rule. A paper can be short, but it cannot be weightless.
Operating Depth Addendum
A useful reviewer should score each Research Lab artifact on four axes: mechanism depth, proof boundary, operating consequence, and reuse value. Mechanism depth asks whether the paper explains how the primitive works. Proof boundary asks whether the reader can tell what was measured, what was inferred, and what was withheld. Operating consequence asks whether adopting the idea changes permission, review, pricing, recourse, or rollout. Reuse value asks whether another builder could borrow the framework without copying Armalo internals.
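One way to make the four axes concrete is a small scoring record. The sketch below rests on assumptions the paper does not state: a 0 to 3 integer scale per axis and a rule that a paper is rejected if any axis falls below a minimum. The names `AxisScores`, `review`, `MAX_SCORE`, and `min_per_axis` are hypothetical.

```python
from dataclasses import dataclass, asdict

MAX_SCORE = 3  # assumed 0-3 scale; the paper does not define one


@dataclass(frozen=True)
class AxisScores:
    """Reviewer scores on the four axes named in the text."""
    mechanism_depth: int        # does the paper explain how the primitive works?
    proof_boundary: int         # is measured vs. inferred vs. withheld clear?
    operating_consequence: int  # does adoption change a real decision?
    reuse_value: int            # can another builder borrow the framework?

    def __post_init__(self):
        # Reject scores outside the assumed scale.
        for axis, score in asdict(self).items():
            if not 0 <= score <= MAX_SCORE:
                raise ValueError(f"{axis} must be between 0 and {MAX_SCORE}, got {score}")


def review(scores: AxisScores, min_per_axis: int = 1) -> tuple[bool, list[str]]:
    """Return (accepted, weak_axes); accepted only if every axis meets the minimum."""
    weak = [axis for axis, score in asdict(scores).items() if score < min_per_axis]
    return (not weak, weak)


# A paper with a strong mechanism but no operating consequence is rejected.
print(review(AxisScores(3, 2, 0, 1)))  # (False, ['operating_consequence'])
```

The per-axis minimum, rather than a summed total, is a deliberate choice in this sketch: a high score on one axis cannot compensate for a paper that changes no buyer or operator decision.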
That review model gives the Lab a way to reject attractive but thin papers. A paper can have a strong title and still fail if it does not change a buyer or operator decision. The matrix should therefore be used before publication, after major rewrites, and whenever a paper is cited as evidence in sales, fundraising, or product claims.
The final operator rule is that a paper should not be promoted because it makes the Lab look busy. It should be promoted because a skeptical reader can reuse one artifact from it in a real review: a matrix, receipt checklist, threat model, benchmark shape, or boundary rule.
Replication
This is a framework paper: its quantitative content is the structure of the discipline matrix and the four review axes, not a measured dataset. To replicate, score any research artifact — Armalo's or another lab's — on the four axes (mechanism depth, proof boundary, operating consequence, reuse value) and check whether the artifact names a proof object that changes an operating decision. Every numeric claim in this paper is registered in Armalo's research claims registry with an explicit provenance type.