Building an Agent That Can Prove It Didn't Cheat
The hardest problem in AI agent accountability is not detecting when an agent cheats; it is building an agent that can prove it did not. Verifiable behavioral records require cryptographic attestation, not just logging.
The Problem With Logs
Every AI agent deployment produces logs. The logs tell you what the agent did: the inputs it received, the outputs it produced, the actions it took. They are the primary evidence record for agent behavior.
But logs have a fundamental problem as evidence: they can be modified after the fact by anyone with write access to the logging infrastructure. A log that says the agent completed a task correctly is only as trustworthy as the integrity of the logging system. If the logging system can be modified, the log cannot be trusted as evidence.
This is not a theoretical concern. In enterprise AI deployments, the same organization that runs the agent also controls the logging infrastructure. When a dispute arises about agent behavior, both parties (the enterprise and the vendor) have access to the logs and potential incentives to interpret them favorably. Without cryptographic guarantees on the integrity of the record, the log is not evidence. It is a narrative.
Building an agent that can prove its behavior requires more than logging. It requires attestation: cryptographically signed records that cannot be modified without detection, produced at the time of action by an entity whose signing key is known, and verifiable by any party with access to the corresponding public key.
What Attestation Means in Practice
Attestation, in the security sense, is the process of making a digitally signed claim about system state or behavior. An attestation is a statement plus a proof that the statement was produced by a specific entity at a specific time.
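A minimal sketch of "a statement plus a proof" follows. It assumes Ed25519 signatures from the third-party Python `cryptography` package. It also assumes a canonical JSON encoding (sorted keys, no whitespace), so that signer and verifier sign and check identical bytes. The field names `signer` and `signed_at` are illustrative and not part of any defined format.

```python
import json
from datetime import datetime, timezone

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def canonical_bytes(payload: dict) -> bytes:
    # Sorted keys and fixed separators give exactly one byte encoding per payload.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def attest(key: Ed25519PrivateKey, signer: str, statement: dict) -> dict:
    """Bind a statement to a signer identity and a timestamp, then sign the bundle."""
    payload = {
        "signer": signer,
        "signed_at": datetime.now(timezone.utc).isoformat(),
        "statement": statement,
    }
    return {"payload": payload, "signature": key.sign(canonical_bytes(payload)).hex()}


def verify(public_key: Ed25519PublicKey, attestation: dict) -> bool:
    """True only if the signature matches the payload exactly as stored."""
    try:
        public_key.verify(
            bytes.fromhex(attestation["signature"]),
            canonical_bytes(attestation["payload"]),
        )
        return True
    except InvalidSignature:
        return False


if __name__ == "__main__":
    key = Ed25519PrivateKey.generate()
    att = attest(key, "agent-123", {"action": "refund", "order": "48291"})
    print(verify(key.public_key(), att))  # True
    att["payload"]["statement"]["order"] = "99999"
    print(verify(key.public_key(), att))  # False
```

The signature proves which key produced the payload and that the payload has not changed since signing. The `signed_at` value itself is still the signer's own claim.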
For AI agent behavioral records, attestation works like this:
- At the time of each consequential action, the agent produces a structured record: the input that prompted the action, the action taken, the authorization basis (the specific clause in the behavioral pact that authorized the action), and a timestamp.
- The agent signs this record with its private key. That key is held by the agent's execution environment, and its corresponding public key is registered with the attestation infrastructure.
- The signed record is appended to an append-only audit log. Append-only means new records can be added but existing records cannot be modified. Any modification of an existing record would invalidate its signature.
- Any party who needs to verify the agent's behavior can retrieve the relevant records from the audit log and verify each signature against the agent's registered public key. This confirms that each record was signed with the agent's key and that its contents, including the claimed timestamp, have not been altered since signing.
The result is a behavioral record that the agent's execution environment cannot falsify after the fact (the record was signed at the time of action) and that the audit log infrastructure cannot modify without detection (the signatures would be invalidated). The record is as trustworthy as the signing key's integrity, which is maintained through standard key management practices.
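The four steps above can be sketched in Python. This is an illustration under stated assumptions, not a reference implementation:

- It uses Ed25519 from the third-party `cryptography` package.
- `AttestationRecord`, `AppendOnlyLog`, `sign_and_append`, and `verify_log` are hypothetical names.
- The log class exposes only `append` and a read-only snapshot. In a real deployment the append-only guarantee would have to come from the storage layer, not from a Python class.

```python
import json
from dataclasses import asdict, dataclass

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


@dataclass(frozen=True)
class AttestationRecord:
    # Step 1: the structured record produced at the time of the action.
    input_text: str
    action: str
    authorization_basis: str  # a pact clause reference, e.g. "3.2"
    timestamp: str            # ISO 8601, UTC

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode()


class AppendOnlyLog:
    """Step 3: records can be added and read; the class offers no update or delete."""

    def __init__(self) -> None:
        self._entries: list[tuple[AttestationRecord, bytes]] = []

    def append(self, record: AttestationRecord, signature: bytes) -> None:
        self._entries.append((record, signature))

    def entries(self) -> tuple[tuple[AttestationRecord, bytes], ...]:
        return tuple(self._entries)


def sign_and_append(key: Ed25519PrivateKey, log: AppendOnlyLog,
                    record: AttestationRecord) -> None:
    # Step 2: sign with the execution environment's private key, then append.
    log.append(record, key.sign(record.to_bytes()))


def verify_log(log: AppendOnlyLog, public_key: Ed25519PublicKey) -> list[int]:
    """Step 4: return the indices of entries whose signature does not verify."""
    bad = []
    for i, (record, signature) in enumerate(log.entries()):
        try:
            public_key.verify(signature, record.to_bytes())
        except InvalidSignature:
            bad.append(i)
    return bad


if __name__ == "__main__":
    key = Ed25519PrivateKey.generate()
    log = AppendOnlyLog()
    sign_and_append(key, log, AttestationRecord(
        "customer requested refund on #48291", "submit_refund(48291, 127.50)",
        "3.2", "2026-05-17T14:02:00+00:00"))
    print(verify_log(log, key.public_key()))  # []
```

An empty list means every entry verified against the registered public key. If someone rewrote an entry in storage (for example, changed its `authorization_basis`), its index would be returned instead.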
The Authorization Basis Problem
Most agent logging captures what the agent did. Almost none of it captures why: the authorization basis for each action. This is the critical gap between a log and an attestation.
For a behavioral record to prove that an agent did not cheat (that is, that it acted within its authorized behavioral scope), the record must connect each action to the specific authorization that permitted it. This connection is the authorization basis.
An authorization basis is a reference to the specific clause in the agent's behavioral pact that authorized the action, plus the condition that was satisfied. For example:
Action: Submitted refund request for order #48291, amount $127.50
Authorization basis: Pact clause 3.2, "The agent may submit refund requests for orders placed within 90 days, up to $500"
Condition satisfied: Order date 2026-02-17 (within 90 days of action date 2026-05-17), amount $127.50 < $500
With this record, any reviewer can ask: was this action authorized under the pact? The answer is yes. The clause is specified, the conditions for that clause are documented, and the action satisfies those conditions.
Without the authorization basis, the record shows that a refund was submitted but cannot demonstrate that the submission was within scope. The record proves the action happened. It does not prove the action was authorized.
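To make the reviewer's question mechanical, clause 3.2 from the example can be encoded as data and checked against the recorded facts. This is a sketch with several assumptions:

- `RefundClause`, `check_refund`, and `CLAUSE_3_2` are hypothetical names, not part of any pact format described here.
- "Within 90 days" is treated as inclusive of day 90.
- "Up to $500" is treated as inclusive of $500.
- Amounts use `Decimal` to avoid floating-point rounding on currency.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class RefundClause:
    clause_id: str
    max_order_age_days: int
    max_amount: Decimal


def check_refund(clause: RefundClause, order_date: date, action_date: date,
                 amount: Decimal) -> tuple[bool, str]:
    """Return (authorized, explanation) for a refund checked against one pact clause."""
    age_days = (action_date - order_date).days
    if not 0 <= age_days <= clause.max_order_age_days:
        return False, f"clause {clause.clause_id}: order is {age_days} days old"
    if amount > clause.max_amount:
        return False, f"clause {clause.clause_id}: amount {amount} exceeds {clause.max_amount}"
    return True, (f"clause {clause.clause_id}: order age {age_days} days <= "
                  f"{clause.max_order_age_days}, amount {amount} <= {clause.max_amount}")


CLAUSE_3_2 = RefundClause("3.2", max_order_age_days=90, max_amount=Decimal("500"))

if __name__ == "__main__":
    ok, why = check_refund(CLAUSE_3_2, date(2026, 2, 17), date(2026, 5, 17), Decimal("127.50"))
    print(ok, why)
```

For the example record, the order is 89 days old on the action date and $127.50 is under the $500 limit, so both conditions hold. The returned explanation is the kind of text that can be stored as the "condition satisfied" field.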
The Confidence Dimension
A complete behavioral attestation for language model-based agents includes one more dimension that is often overlooked: the agent's confidence in the action.
An agent that takes an action with high confidence has made a different kind of decision than an agent that takes the same action with low confidence. The high-confidence action represents a clear determination that the action is authorized and appropriate. The low-confidence action represents a judgment call under uncertainty, the kind of decision that should be flagged for review.
Including confidence in the attestation record makes it possible to identify actions that were within technical scope but involved significant uncertainty: the cases where the agent probably should have escalated but did not. Over time, the distribution of confidence levels in the attestation record is a signal about whether the agent's escalation triggers are calibrated correctly.
An attestation record that shows an agent consistently taking actions at low confidence (making judgment calls it is not certain about) without escalating is a behavioral signal that the escalation triggers are too narrow. This is exactly the kind of systemic issue that review of individual actions misses but aggregate analysis of the attestation record surfaces.
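The aggregate analysis described above can be as simple as measuring, over a window of attestation records, how often the agent acted below a confidence floor without escalating. This sketch makes two assumptions:

- The 0.6 floor and 10% tolerance are arbitrary illustrative values. Calibrating them is deployment-specific.
- `ActionRecord` and its `confidence` and `escalated` fields are hypothetical stand-ins for whatever the attestation format actually records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRecord:
    action: str
    confidence: float  # 0.0 to 1.0, as reported at the time of action
    escalated: bool    # whether the action was routed to a human for review


def unescalated_low_confidence_rate(records: list[ActionRecord], floor: float = 0.6) -> float:
    """Fraction of non-escalated actions that were taken below the confidence floor."""
    acted_alone = [r for r in records if not r.escalated]
    if not acted_alone:
        return 0.0
    low = sum(1 for r in acted_alone if r.confidence < floor)
    return low / len(acted_alone)


def escalation_too_narrow(records: list[ActionRecord], floor: float = 0.6,
                          tolerance: float = 0.10) -> bool:
    # Flags the systemic pattern: many unescalated judgment calls under uncertainty.
    return unescalated_low_confidence_rate(records, floor) > tolerance


if __name__ == "__main__":
    window = [
        ActionRecord("refund", 0.95, False),
        ActionRecord("refund", 0.55, False),
        ActionRecord("refund", 0.50, False),
        ActionRecord("cancel", 0.40, True),
        ActionRecord("refund", 0.90, False),
    ]
    print(unescalated_low_confidence_rate(window))  # 0.5
    print(escalation_too_narrow(window))            # True
```

The low-confidence action that was escalated does not count against the agent. Only the judgment calls it made alone do, which matches the pattern the attestation record is meant to surface.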
Building Proof, Not Just Records
The distinction between a log and a proof is worth stating precisely. A log is a record of what happened. A proof is evidence that supports a specific claim. Building an agent that can prove it did not cheat means building an agent whose behavioral record constitutes evidence for the specific claim: "this agent's actions were within its authorized behavioral scope."
This requires three things that standard logging does not provide:
Tamper evidence. The record must be structured so that any modification is detectable. Cryptographic signing with the agent's key, combined with an append-only log with a Merkle tree structure, provides this property. Any modification of a historical record changes the hashes along its path to the tree's root, so the log no longer matches any root published before the change.
Authorization traceability. Each action in the record must trace to a specific authorization in the behavioral pact. Without this connection, the record proves the action happened but not that the action was authorized.
Independent verifiability. The proof must be verifiable by parties outside the agent's execution environment, such as counterparties, regulators, and auditors, without access to the agent's internal state. This requires the public components of the attestation infrastructure (the public key, the pact specification, the verification methodology) to be accessible to any party who might need to evaluate the record.
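For the tamper-evidence property, the following sketch computes a Merkle root over serialized log entries using only SHA-256 from the Python standard library. It follows the Merkle Tree Hash construction from Certificate Transparency (RFC 6962). That construction prefixes leaves with 0x00 and interior nodes with 0x01, so a leaf hash cannot be passed off as a node hash. Publishing roots and serving consistency proofs between old and new roots are outside this sketch.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(entries: list[bytes]) -> bytes:
    """RFC 6962-style Merkle Tree Hash over an ordered list of log entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    # Split at the largest power of two strictly less than n.
    k = 1 << ((n - 1).bit_length() - 1)
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


if __name__ == "__main__":
    log = [b"record-1", b"record-2", b"record-3"]
    published_root = merkle_root(log)
    log[1] = b"record-2 (edited)"
    print(merkle_root(log) == published_root)  # False
```

A verifier who saved `published_root` earlier needs only the entries and this function to detect the edit; it does not need to trust whoever operates the log. Appending a new entry also changes the root. Distinguishing a legitimate append from a rewrite of history is what RFC 6962 consistency proofs are designed for.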
What This Changes for Enterprise Deployments
For enterprises deploying AI agents, attestation infrastructure changes the accountability equation in three concrete ways.
Dispute resolution becomes tractable. When an agent's behavior is disputed, the question becomes: "does the attestation record support the claimed authorization?" This is a verifiable question, not a political one. The time-to-resolution on disputes drops from weeks to hours.
Regulatory compliance becomes demonstrable. Regulators do not accept claims of compliance; they accept evidence of it. An attestation record that shows every action was within the authorized behavioral scope, with the authorization basis documented and signed, is the kind of evidence that satisfies regulatory inquiry. An organization that can produce this record is in a fundamentally stronger position than one that can only produce claims.
Vendor accountability becomes structural. An agent vendor that commits to behavioral attestation, and whose agents produce attestation records that can be independently verified, is making a qualitatively different kind of commitment than one that offers contractual SLAs without behavioral proof. The attestation infrastructure is the mechanism that makes the commitment credible rather than rhetorical.
The Infrastructure Investment
Building attestation infrastructure is not trivial. It requires key management infrastructure for the agent's signing key, an append-only audit log with cryptographic integrity guarantees, a structured attestation format that captures action, authorization basis, confidence, and timestamp, and verification tooling that allows third parties to validate records against the public key and pact specification.
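A structured format capturing those four elements might look like the following sketch. The field names, the split of the authorization basis into `clause_id` and `condition`, and the validation rules are illustrative assumptions rather than a published schema. The point is that the record is validated when it is created and serialized to one canonical byte string. That byte string is what gets signed and what a third party re-derives during verification.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuthorizationBasis:
    clause_id: str  # pact clause that authorized the action, e.g. "3.2"
    condition: str  # the satisfied condition, in human-readable form


@dataclass(frozen=True)
class Attestation:
    agent_id: str
    action: str
    authorization_basis: AuthorizationBasis
    confidence: float
    timestamp: datetime

    def __post_init__(self) -> None:
        # Reject malformed records at creation time, before anything is signed.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")

    def canonical_bytes(self) -> bytes:
        """Deterministic encoding: the exact bytes a signer signs and a verifier re-derives."""
        body = asdict(self)  # nested dataclasses become nested dicts
        body["timestamp"] = self.timestamp.isoformat()
        return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


if __name__ == "__main__":
    record = Attestation(
        agent_id="refund-agent",
        action="submit_refund(order=48291, amount=127.50)",
        authorization_basis=AuthorizationBasis(
            "3.2", "order age 89 days <= 90; amount 127.50 <= 500"),
        confidence=0.93,
        timestamp=datetime(2026, 5, 17, 14, 2, tzinfo=timezone.utc),
    )
    print(record.canonical_bytes().decode())
```

The values in the usage example (agent ID, confidence) are made up for illustration.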
This investment is a one-time architectural decision, not an ongoing operational burden. Once the infrastructure is in place, every agent that runs on it produces attestation records automatically. The marginal cost of attestation per action is small. The marginal value, in dispute resolution speed, regulatory compliance, and counterparty trust, is significant and compounding.
The agents that have this infrastructure today are building behavioral records that will differentiate them in a market that increasingly demands proof, not claims. That differentiation is worth the infrastructure investment.