GDPR Gave You Data Rights. It Did Not Give You a Way to Audit the Agent Handling Your Data.
Compliance audits of AI agents fail for a structural reason: they audit what the agent did in the past, not what it will do in the future. An agent that was compliant for six months and then drifted is compliant-according-to-the-audit until the next audit runs. The gap between audits is a window of undocumented behavior. In regulated industries, undocumented behavior is a compliance liability regardless of whether the behavior was actually correct.
This isn't a failure of auditors. It's a failure of the underlying infrastructure. Point-in-time audits were designed for systems that don't change between audits. AI agents are systems that drift constantly — model updates, changing input distributions, evolving user behavior, gradual behavioral shift that no single update introduces but that accumulates over months. The audit model assumes stable systems. Agent deployment produces dynamic systems. Continuous behavioral monitoring is the only mechanism that closes the gap.
What the Regulations Actually Require
The EU AI Act's requirements for high-risk AI systems are worth reading carefully rather than summarizing loosely.
Technical documentation must include the general description of the AI system, design specifications including general logic and algorithm assumptions, training methodology, testing procedures, and expected outputs. This is not a PDF with bullet points — it's machine-readable documentation connecting system design to behavioral specifications.
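To make "machine-readable" concrete, here is a minimal sketch of what such documentation could look like as structured data. The field names and values are illustrative, not anything the Act prescribes; the Act defines required content, not a serialization format.

```python
# A minimal sketch of machine-readable technical documentation.
# Field names are illustrative, not prescribed by the Act: the Act
# defines what documentation must contain, not how it is serialized.
technical_documentation = {
    "system": {
        "name": "claims-triage-agent",
        "version": "2.3.1",
        "intended_purpose": "prioritize incoming insurance claims for human review",
    },
    "design_specifications": {
        "general_logic": "LLM-driven triage with retrieval over policy documents",
        "algorithm_assumptions": [
            "input claims are written in English",
            "policy corpus is refreshed within 24h of upstream changes",
        ],
    },
    "training_methodology": "vendor base model; no in-house fine-tuning",
    "testing_procedures": ["pact-eval-suite-v4"],  # links design to behavioral specs
    "expected_outputs": {"type": "priority_score", "range": [0.0, 1.0]},
}
```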
Conformity assessment requires operators to demonstrate that the system meets the Act's requirements through documented testing against specific performance criteria. Not "we ran some tests." Tests against defined, documented specifications, with documented results, produced by an evaluation methodology that a third party can examine and reproduce.
Continuous monitoring requirements mean compliance is not a certification you earn once. It's a state you either maintain or don't, with evidence of maintenance required throughout the system's operational lifetime.
GDPR's Article 22 gives individuals the right not to be subject to decisions based solely on automated processing that produce legal or similarly significant effects, subject to exceptions that require demonstrable safeguards. "Demonstrable" means the organization must be able to produce, to a regulator, evidence that the safeguards were in place and functioning.
HIPAA's audit controls requirement specifies mechanisms that record and examine activity. For an AI agent accessing patient data, this includes the agent's reasoning, the data it retrieved, the decisions it influenced, and the behavioral profile it operated under at the time. Logs of "agent accessed record X" are necessary but not sufficient.
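As a sketch, an audit record with enough context to meet that standard might look like the following. The structure and field names are hypothetical, not a HIPAA-mandated schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AgentAuditRecord:
    """One access event, recorded with enough context to examine later.

    A bare access log carries only the first three fields; the rest
    is what makes the event auditable after the fact.
    """
    agent_id: str
    patient_record_id: str
    timestamp: datetime
    reasoning_trace_ref: str         # pointer to the stored reasoning trace
    data_retrieved: list[str]        # which fields or documents the agent read
    decisions_influenced: list[str]  # downstream decisions this access fed into
    behavioral_profile_version: str  # the behavioral spec in force at the time

record = AgentAuditRecord(
    agent_id="intake-agent-7",
    patient_record_id="mrn-004219",
    timestamp=datetime.now(timezone.utc),
    reasoning_trace_ref="traces/2026-03-15/abc123",
    data_retrieved=["medication_history", "allergy_list"],
    decisions_influenced=["dosage_review_ticket_881"],
    behavioral_profile_version="pact-v12",
)
```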
The Gap Between Observability and Compliance
The immediate response from most engineering teams: "We have logs. We can trace every LLM call." This is usually technically true. It misses the nature of the compliance requirement.
Logs tell you what happened. Compliance requires proof that what happened was correct. A log entry showing an agent accessed a patient record at 2:14pm on March 15 is evidence of access. Compliance requires demonstrating the access was authorized, the agent's handling conformed to defined behavioral standards, and the outcome was within the agent's documented capability and purpose. Logs provide the first element. They rarely provide the second and third.
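One way to make the gap concrete: the same access event supports one question as a log entry and three as compliance evidence. A sketch, with hypothetical event, pact, and access-control structures:

```python
def is_demonstrably_compliant(event: dict, pact: dict, acl: dict) -> dict:
    """Answer the three compliance questions for one logged access event.

    `event`, `pact`, and `acl` are hypothetical structures. The point:
    only the first check is answerable from the access log alone; the
    other two need a versioned behavioral specification to check against.
    """
    return {
        # 1. Was the access authorized? A log plus an access-control list suffices.
        "authorized": event["agent_id"] in acl.get(event["resource_id"], set()),
        # 2. Did handling conform to the defined behavioral standard?
        #    Needs the pact version in force, plus its evaluation status.
        "conformed": event["pact_version"] in pact["versions_passing_evaluation"],
        # 3. Was the action within documented capability and purpose?
        "within_scope": event["action"] in pact["permitted_actions"],
    }
```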
Observability tools are optimized for debugging, not compliance evidence production. LangSmith, Datadog, and their counterparts are excellent for finding what went wrong in a specific interaction. They're not designed for producing standardized evidence that an AI system met defined behavioral commitments over a period of time — evidence auditable by a regulator who wasn't present when you built the observability infrastructure.
Logs are controlled by the operator. This is the deepest problem. When regulators ask for evidence of compliant behavior, a log produced by the same organization operating the AI system is a self-attestation. The organization could have modified the logs. They could have cherry-picked which entries to produce. The regulator has no basis for confidence that they didn't. Self-attestation has limited evidentiary weight compared to independent evaluation against published behavioral specifications — the same reason financial audits are conducted by third parties rather than the company's own finance team.
What Auditable Agent Behavior Actually Requires
Five things observability doesn't provide:
Machine-readable behavioral specifications tied to operational versions. A behavioral pact with defined conditions, thresholds, and verification methods — tied to a specific agent version, model ID, and system prompt hash — is testable, versioned, and produces an audit trail of what the agent was committed to doing on any given date.
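A minimal sketch of what such a pact could look like in code. The fields follow the terminology above and are illustrative rather than any fixed schema:

```python
from dataclasses import dataclass
from hashlib import sha256

@dataclass(frozen=True)
class BehavioralPact:
    """A versioned, machine-readable behavioral commitment for one agent version."""
    pact_id: str
    pact_version: str
    agent_version: str
    model_id: str
    system_prompt_hash: str       # binds the pact to the exact deployed prompt
    conditions: dict[str, str]    # defined behavioral conditions
    thresholds: dict[str, float]  # quantitative pass/fail lines
    verification_method: str      # which evaluation suite tests this pact

prompt_text = "You are a claims triage assistant..."  # the deployed system prompt
pact = BehavioralPact(
    pact_id="claims-triage-pact",
    pact_version="v12",
    agent_version="2.3.1",
    model_id="provider/model-2026-01",
    system_prompt_hash=sha256(prompt_text.encode()).hexdigest(),
    conditions={"pii_handling": "never echo raw identifiers"},
    thresholds={"out_of_scope_refusal_rate": 0.99},
    verification_method="pact-eval-suite-v4",
)
```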
Independent evaluation with documented methodology. An evaluation run by the agent vendor is not independent audit evidence. An evaluation run by a neutral evaluation layer — with documented test cases, verification methodology, and a result that's cryptographically signed and time-stamped — is evidence a regulator can examine. Key properties: methodology was defined before the evaluation ran, the evaluator had no incentive to produce a favorable result, and the result is independently verifiable.
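Mechanically, "cryptographically signed and time-stamped" can be as simple as the following sketch, here using Ed25519 from the Python `cryptography` package. The result payload is illustrative; what matters is the fixed methodology reference, the timestamp, and a signature anyone can verify against the evaluator's published public key.

```python
import json
from datetime import datetime, timezone
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The evaluator's keypair. In practice the private key never leaves
# the evaluation service; the public key is published.
evaluator_key = Ed25519PrivateKey.generate()
public_key = evaluator_key.public_key()

result = {
    "pact_id": "claims-triage-pact",
    "pact_version": "v12",
    "suite": "pact-eval-suite-v4",  # methodology fixed before the run
    "passed": 1487,
    "failed": 2,
    "verdict": "conforming",
    "evaluated_at": datetime.now(timezone.utc).isoformat(),
}

# Canonical serialization, so verification is deterministic.
payload = json.dumps(result, sort_keys=True).encode()
signature = evaluator_key.sign(payload)

# Anyone holding the public key can check the result was not altered.
public_key.verify(signature, payload)  # raises InvalidSignature on tampering
```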
Continuous compliance tracking, not point-in-time snapshots. The infrastructure to satisfy the EU AI Act's continuous monitoring requirements needs to track compliance continuously — not just at annual audit time — and produce a time-series record demonstrating ongoing conformity. An agent audited annually produces one data point per year. An agent monitored continuously produces a behavioral record.
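A sketch of what that time-series record enables. The structures are hypothetical; the point is that gaps in the evidence become computable:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class EvalPoint:
    ran_at: datetime
    pact_version: str
    verdict: str  # "conforming" / "non-conforming"

def coverage_gaps(points: list[EvalPoint],
                  max_gap: timedelta) -> list[tuple[datetime, datetime]]:
    """Return every span with no evaluation evidence longer than `max_gap`."""
    ordered = sorted(points, key=lambda p: p.ran_at)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.ran_at - prev.ran_at > max_gap:
            gaps.append((prev.ran_at, nxt.ran_at))
    return gaps
```

Run over an annual-audit history with a `max_gap` of a week, this returns a year-long gap per audit cycle. Run over a continuous record, it returns nothing.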
Temporal traceability to versioned behavioral specifications. When a regulatory inquiry arrives about an agent decision made six months ago, the evidentiary question is: what behavioral commitments was this agent operating under on that date, and did its behavior conform to them? "Agent was running version X" + "version X had pact Y" + "pact Y passed evaluation at date Z" is auditable. "We believe the agent was working correctly" is not.
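That lookup is straightforward to implement once the record exists. A sketch, with a hypothetical version history:

```python
from bisect import bisect_right
from datetime import datetime

# (effective_from, agent_version, pact_version, evaluation_result_ref)
# Hypothetical history, kept sorted by effective_from.
history = [
    (datetime(2025, 9, 1), "2.1.0", "pact-v10", "eval/2025-09-01/sig"),
    (datetime(2025, 12, 4), "2.2.0", "pact-v11", "eval/2025-12-04/sig"),
    (datetime(2026, 2, 17), "2.3.1", "pact-v12", "eval/2026-02-17/sig"),
]

def commitments_in_force(at: datetime):
    """Answer the regulator's question: what was this agent committed to on `at`?"""
    idx = bisect_right([h[0] for h in history], at) - 1
    if idx < 0:
        raise LookupError("no behavioral record exists for this date")
    return history[idx]

# An inquiry about a decision on 2026-01-10 resolves to agent version 2.2.0,
# pact v11, and the signed evaluation result that covered it.
print(commitments_in_force(datetime(2026, 1, 10)))
```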
Immutable audit trails. Mutable log systems aren't auditable in the compliance sense — they can be retroactively modified. On-chain records — agent registration, pact versions, evaluation results, score history — are immutable by construction. No one can alter what the record shows the agent was committed to doing on March 15. This immutability is what makes on-chain behavioral records legally useful as evidence, not just technically useful for debugging.
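On-chain storage gets this property from the chain itself, but the core mechanism is worth seeing in miniature: each record commits to the hash of its predecessor, so a retroactive edit breaks every subsequent link. A self-contained sketch:

```python
import hashlib
import json

def append_record(chain: list[dict], record: dict) -> None:
    """Append a record that commits to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"record": record, "prev_hash": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    chain.append(body)

def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any retroactive edit breaks all later hashes."""
    prev_hash = "0" * 64
    for entry in chain:
        expected = hashlib.sha256(
            json.dumps({"record": entry["record"], "prev_hash": entry["prev_hash"]},
                       sort_keys=True).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

An auditor who holds any later hash can detect tampering anywhere earlier in the chain. On a public chain, that later hash is held by everyone.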
The Timeline Problem
Compliance evidence requires a historical record. You cannot produce historical evidence retroactively.
An organization that deploys AI agents in a regulated workflow in Q1 2026 and starts building compliance infrastructure in Q3 2026 has a six-month gap in its compliance record. If a regulatory inquiry lands about an agent decision from that gap period, the organization's position is weak: "We didn't have the infrastructure to produce the evidence at that time." That is exactly what enforcement is designed to penalize.
The EU AI Act gives regulators the ability to require suspension of high-risk AI systems that cannot demonstrate compliance. An organization that can't produce behavioral audit evidence for a deployed system is in a materially weaker position than one that can — regardless of whether the system was actually behaving correctly.
For AI agents currently operating in regulated workflows — healthcare, financial services, HR, legal — the concrete action is this: define behavioral pacts for each agent now, run continuous independent evaluation against those pacts now, and ensure the evaluation results are stored in an immutable, time-stamped record. The historical record that starts accumulating today is the compliance evidence you'll need later.
Armalo's trust infrastructure produces the compliance evidence regulators require: machine-readable behavioral pacts, independent continuous evaluation, immutable on-chain audit trails, and exportable compliance records tied to versioned specifications. armalo.ai