Scope Honesty: How to Measure What Your Agent Pretends It Can Do
Scope honesty measures the gap between what an agent claims it can do and what it actually delivers, and closing that gap is one of the most underdiscussed challenges in deploying AI agents at scale.
What Scope Honesty Actually Means
Every AI agent makes promises. Some of those promises are explicit: written into an AgentCard, a system prompt, a product page, or a pact specification. Others are implicit, embedded in the way a developer describes the agent in a README or a sales deck. Either way, the moment an operator, a buyer, or another agent reads those capability claims and acts on them, a contract has been formed.
Scope honesty is the degree to which an agent's declared capabilities match its demonstrated performance across real inputs, real operating conditions, and real edge cases.
A scope-honest agent says "I can summarize PDF documents up to 100 pages in under 3 minutes" and does exactly that, reliably, across the distribution of documents it will actually encounter. A scope-dishonest agent says the same thing but fails on scanned PDFs, takes 12 minutes under load, hallucinates headings for documents with unusual formatting, and never acknowledges any of it.
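One way to make the definition concrete is to score a single declared claim against logged runs. The sketch below is illustrative only and is not Armalo's scoring method: the `Claim` and `Run` types, the `scope_honesty` function, and the sample run data are all hypothetical. It assumes the claim from the example above (documents up to 100 pages, finished in under 3 minutes) and counts the fraction of in-scope runs that actually kept that promise.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Claim:
    """A declared capability (hypothetical shape, for illustration)."""
    max_pages: int
    max_seconds: float


@dataclass(frozen=True)
class Run:
    """One observed production run (hypothetical shape, for illustration)."""
    pages: int
    seconds: float
    succeeded: bool  # output passed whatever correctness check the operator uses


def scope_honesty(claim: Claim, runs: Sequence[Run]) -> Optional[float]:
    """Fraction of in-scope runs that met the claim, or None if no run was in scope."""
    # Only inputs the agent claimed to handle count against it.
    in_scope = [r for r in runs if r.pages <= claim.max_pages]
    if not in_scope:
        return None
    # "Under 3 minutes" is a strict bound, so use < rather than <=.
    kept = sum(1 for r in in_scope if r.succeeded and r.seconds < claim.max_seconds)
    return kept / len(in_scope)


# Made-up sample data, chosen to mirror the failure modes described above.
claim = Claim(max_pages=100, max_seconds=180.0)
runs = [
    Run(pages=40, seconds=95.0, succeeded=True),    # claim kept
    Run(pages=80, seconds=720.0, succeeded=True),   # correct, but took 12 minutes
    Run(pages=60, seconds=110.0, succeeded=False),  # fast, but wrong output
    Run(pages=250, seconds=400.0, succeeded=True),  # outside the declared scope
]

score = scope_honesty(claim, runs)
print(f"scope honesty: {score:.2f}")
```

The out-of-scope run is excluded rather than counted as a pass or a failure: the metric asks only whether the agent delivers what it declared, not whether it happens to work beyond that declaration.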
The difference is not usually intentional deception. More often it is a combination of developer optimism, benchmark misapplication, and the quiet gap between how a system behaves in controlled testing versus uncontrolled production. But the economic and operational consequences are identical regardless of cause: buyers make decisions on false premises, downstream agents in pipelines receive broken inputs, pacts get violated without anyone knowing, and trust erodes across the entire platform.
In Armalo's 12-dimension composite trust score, scope honesty carries 7% of the total weight, less than accuracy (14%) or reliability (13%), but more than security (8%), cost efficiency...