The capability-consequence gap score measures the distance between what an agent can do and what the surrounding system can responsibly allow. A model can write code, negotiate, search, summarize, or plan. The deployment question is whether the agent has permission, evidence, accountability, and recovery paths for the specific action in context. This paper defines a public method for scoring that gap without revealing private Armalo scoring weights.
Method
Each candidate claim is decomposed into five fields: capability, authority, evidence, economic or operational consequence, and recovery path. The claim receives public-safe status only when all five fields are present. Missing authority means the system has a demo rather than a deployable right. Missing evidence means the action cannot be reviewed. Missing recovery means failure becomes reputational fog instead of an accountable event.
| Field | Question | Public scoring signal |
| --- | --- | --- |
| Capability | Can the agent perform the task? | benchmark, eval, or task trace |
| Authority | Who allowed this action? | pact, role, scope, or approval |
| Evidence | Why should a counterparty trust it? | receipt, source, jury, or attestation |
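The presence check described in the Method can be sketched in a few lines of Python. This is a minimal illustration, not Armalo's scoring code: the names `Claim`, `missing_fields`, `is_public_safe`, and `explain_gaps` are hypothetical, the example values are made up for demonstration, and no weights are involved. It only encodes the public rule that a claim is public-safe when all five fields are present.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Wording for the three gaps the paper spells out. The paper does not
# characterize a missing capability or consequence, so those fall back
# to a generic message below.
GAP_MEANING = {
    "authority": "a demo rather than a deployable right",
    "evidence": "the action cannot be reviewed",
    "recovery": "failure becomes reputational fog instead of an accountable event",
}


@dataclass
class Claim:
    """Hypothetical container for the five fields of a candidate claim."""
    capability: Optional[str] = None
    authority: Optional[str] = None
    evidence: Optional[str] = None
    consequence: Optional[str] = None  # economic or operational consequence
    recovery: Optional[str] = None     # recovery path


def missing_fields(claim: Claim) -> List[str]:
    """Return the names of fields that are absent or blank, in declaration order."""
    return [
        f.name
        for f in fields(claim)
        if not (getattr(claim, f.name) or "").strip()
    ]


def is_public_safe(claim: Claim) -> bool:
    """A claim is public-safe only when all five fields are present."""
    return not missing_fields(claim)


def explain_gaps(claim: Claim) -> List[str]:
    """Describe each missing field using the paper's wording where it exists."""
    return [
        f"{name}: {GAP_MEANING.get(name, 'field is missing')}"
        for name in missing_fields(claim)
    ]


# Illustrative values only; "refund exposure" is an invented consequence.
demo = Claim(capability="task trace", evidence="receipt", consequence="refund exposure")

if __name__ == "__main__":
    print(is_public_safe(demo))
    for line in explain_gaps(demo):
        print(line)
```

Treating a blank string the same as a missing value is a design choice in this sketch; a stricter gate might also validate that each signal is of an accepted type, such as a pact or an attestation.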
Cite this work
Armalo Labs (2026). Capability-Consequence Gap Score: Measuring the Distance Between Can and Should. Armalo Labs Technical Series, Armalo AI. https://www.armalo.ai/labs/research/research-lab-capability-consequence-gap-score
Armalo Labs Technical Series · ISSN pending
For this wave of artifacts, the score also serves as a writing and research gate. Posts about Armalo's Research Lab cannot merely celebrate intelligence. They must explain what changes operationally when evidence is strong, weak, stale, disputed, or absent. That discipline makes the Lab's claims more credible because it avoids a common startup failure: confusing an impressive agent with a trustworthy counterparty.
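One way to picture the gate is as a coverage check over the five evidence states named above. The sketch below is an assumption about how such a check could look, not a description of Armalo's editorial tooling: `EvidenceState`, `uncovered_states`, and `passes_gate` are hypothetical names, and the draft notes are placeholders.

```python
from enum import Enum
from typing import Dict, List


class EvidenceState(Enum):
    """The five evidence conditions a post is expected to address."""
    STRONG = "strong"
    WEAK = "weak"
    STALE = "stale"
    DISPUTED = "disputed"
    ABSENT = "absent"


def uncovered_states(notes: Dict[EvidenceState, str]) -> List[EvidenceState]:
    """Return the states for which the draft gives no operational explanation."""
    return [state for state in EvidenceState if not notes.get(state, "").strip()]


def passes_gate(notes: Dict[EvidenceState, str]) -> bool:
    """A draft passes only when every evidence state has an explanation."""
    return not uncovered_states(notes)


# Placeholder notes for a hypothetical draft; the text is illustrative only.
draft_notes = {
    EvidenceState.STRONG: "placeholder: what the operator does when evidence is strong",
    EvidenceState.ABSENT: "placeholder: what the operator does when evidence is absent",
}

if __name__ == "__main__":
    print(passes_gate(draft_notes))
    print([state.value for state in uncovered_states(draft_notes)])
```

The check deliberately says nothing about what the operational change should be for each state; it only flags the states a draft leaves unexplained.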
Secret-Sauce Boundary
This paper does not disclose internal weights, customer incidents, adjudication procedures, escrow thresholds, model-specific provider data, or proprietary ranking formulas. It publishes the public frame: capability becomes serious only after consequence is governed.