GDPR Gave You Data Rights. It Did Not Give You a Way to Audit the Agent Handling Your Data.
GDPR, CCPA, HIPAA, the EU AI Act. The regulatory landscape for AI and data has never been more developed. The requirements are increasingly concrete: document your AI systems, assess their risks, demonstrate compliance, provide audit trails. Legal teams have gotten involved. Compliance programs are being built.
The regulations assume you can produce auditable evidence that your agents behaved in a compliant way. For most currently deployed AI agents, the infrastructure to produce that evidence doesn't exist. Logs are not audit evidence. Dashboards are not audit evidence. What regulators actually require, and what operators currently have, are two different things — and the gap is about to get expensive.
What the Regulations Actually Require
The EU AI Act's requirements for high-risk AI systems are worth reading carefully rather than summarizing loosely.
Technical documentation must include: the general description of the AI system, description of the elements of the system and how it interacts with hardware and software, design specifications including general logic and algorithm assumptions, description of the training methodology, testing procedures, and expected outputs. This is not a PDF with bullet points. This is machine-readable documentation that connects system design to behavioral specifications.
Conformity assessment requires that operators demonstrate the system meets the requirements of the Act through documented testing against specific performance criteria. Not "we ran some tests." Tests against defined, documented specifications, with documented results, from a demonstrable evaluation methodology.
Transparency obligations require the ability to explain system decisions in terms humans can understand — which means the behavioral record needs to be preserved and queryable at the decision level, not just at the system level.
Continuous monitoring requirements mean compliance is not a certification you earn once. It's a state you either maintain or don't, with evidence of maintenance required throughout the system's operational lifetime.
GDPR's Article 22 gives individuals the right not to be subject to purely automated decisions with significant legal or similarly significant effects — with exceptions that require demonstrable safeguards. "Demonstrable" means something: the organization must be able to produce, to a regulator, evidence that the safeguards were in place and functioning.
HIPAA's audit controls requirement specifies hardware, software, and procedural mechanisms that record and examine activity. The activity in question, for an AI agent accessing patient data, includes the agent's reasoning, the data it retrieved, the decisions it influenced, and the behavioral profile it operated under at the time. Logs of "agent accessed record X" are necessary but not sufficient.
The Gap Between Observability and Compliance
The immediate response from most engineering teams: "We have logs. We have observability. We can trace every LLM call." This is usually technically true. It misses the nature of the compliance requirement.
Logs tell you what happened. Compliance requires proof that what happened was correct. A log entry showing that an agent accessed a patient's record at 2:14pm on March 15 is evidence of access. Compliance requires demonstrating that the access was authorized, that the agent's handling of the data conformed to defined behavioral standards, and that the outcome was within the agent's documented capability and purpose. Logs provide the first element. They rarely provide the second and third.
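A toy sketch makes the gap concrete. The log entry below records the access; the function shows the first compliance question — was it authorized? — that the log alone cannot answer. All names, the policy structure, and the wildcard convention are illustrative assumptions, not any real system's schema.

```python
# A bare access log entry: evidence of access, nothing more.
log_entry = {
    "ts": "2026-03-15T14:14:00Z",
    "agent": "care-coordination-agent",   # hypothetical agent name
    "action": "read",
    "resource": "patient/8841",
}

# Compliance evidence requires an additional, checkable fact: the
# authorization policy in force at the time. Structure is assumed.
authorized_scopes = {"care-coordination-agent": ["patient/*"]}

def access_was_authorized(entry: dict, policy: dict) -> bool:
    """Answer the first question the log alone cannot: was this access in scope?"""
    scopes = policy.get(entry["agent"], [])
    return any(entry["resource"].startswith(scope.rstrip("*"))
               for scope in scopes)

print(access_was_authorized(log_entry, authorized_scopes))  # True under this toy policy
```

Even this toy check requires data that most logging pipelines never capture alongside the access event — and it still says nothing about whether the agent's handling of the data conformed to its behavioral commitments.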
Observability tools are optimized for debugging, not for compliance evidence production. LangSmith, Datadog, and their counterparts are excellent for finding what went wrong in a specific interaction. They're not designed for producing standardized evidence that an AI system met defined behavioral commitments over a period of time — evidence that needs to be auditable by a regulator who wasn't present when you built the observability infrastructure.
Logs are controlled by the operator. This is the deepest problem. When regulators ask for evidence of compliant behavior, a log produced by the same organization operating the AI system is a self-attestation. The organization could have modified the logs. They could have cherry-picked which logs to produce. Even if they didn't, the regulator has no basis for confidence that they didn't. Self-attestation has limited evidentiary weight compared to independent evaluation against published behavioral specifications — the same reason financial audits are conducted by third parties rather than by the company's own finance team.
What "Auditable Agent Behavior" Actually Requires
Specifically, a compliance audit for AI agents requires five things that observability doesn't provide:
Machine-readable behavioral specifications tied to operational versions. Compliance requires documenting what the agent is supposed to do, in a form that can be tested against. A behavioral pact with defined conditions, thresholds, and verification methods — tied to a specific agent version, model ID, and system prompt hash — is exactly this documentation. It's testable, versioned, and produces an audit trail of what the agent was committed to doing on any given date.
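As a minimal sketch of what "machine-readable and tied to a version" could look like: a pact as structured data, canonically serialized and hashed so any evaluation result can reference exactly this version of the commitments. Every field name here is a hypothetical illustration, not a published pact schema.

```python
import hashlib
import json

# Hypothetical behavioral pact: testable commitments bound to a specific
# agent version, model ID, and system prompt hash. Field names are assumed.
pact = {
    "pact_version": "1.3.0",
    "agent_id": "claims-triage-agent",
    "agent_version": "2.7.1",
    "model_id": "example-model-2025-01",
    "system_prompt_sha256": hashlib.sha256(b"<deployed system prompt>").hexdigest(),
    "commitments": [
        {
            "id": "no-phi-disclosure",
            "condition": "output contains no patient identifiers",
            "threshold": {"max_violation_rate": 0.0},
            "verification": "pattern match plus judge model over sampled outputs",
        },
        {
            "id": "escalate-on-ambiguity",
            "condition": "ambiguous claims are routed to a human reviewer",
            "threshold": {"min_escalation_recall": 0.95},
            "verification": "replay of labeled ambiguous-claim test set",
        },
    ],
}

# Canonical serialization makes the hash deterministic, so the pact itself
# becomes a stable, referenceable identifier for "what was committed."
canonical = json.dumps(pact, sort_keys=True, separators=(",", ":"))
pact_hash = hashlib.sha256(canonical.encode()).hexdigest()
```

The point of the hash is that every later artifact — an evaluation result, an audit record — can cite the exact commitments it was tested against, rather than "the pact, whichever version that was."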
Independent evaluation with documented methodology. An evaluation run by the agent vendor is not independent audit evidence. An evaluation run by a neutral evaluation layer — with documented test cases, verification methodology, and a result that's cryptographically signed and time-stamped — is evidence that a regulator can examine. The key properties: the methodology was defined before the evaluation ran, the evaluator had no incentive to produce a favorable result, and the result is independently verifiable.
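The signing step can be sketched as follows. This toy version uses an HMAC over a canonical serialization for brevity; a real independent evaluator would use an asymmetric signature (e.g. Ed25519) so that anyone can verify without holding the signing key. The key, field names, and scores are all assumptions.

```python
import hashlib
import hmac
import json
import time

# Stand-in for the evaluator's key. Real systems would use an asymmetric
# keypair so verification does not require the secret.
EVALUATOR_KEY = b"evaluator-secret-key"

def sign_result(result: dict, key: bytes) -> dict:
    """Wrap an evaluation result with a signature over its canonical form."""
    body = json.dumps(result, sort_keys=True, separators=(",", ":")).encode()
    return {
        "result": result,
        "signature": hmac.new(key, body, hashlib.sha256).hexdigest(),
    }

def verify(envelope: dict, key: bytes) -> bool:
    """Recompute the signature; any change to the result breaks it."""
    body = json.dumps(envelope["result"], sort_keys=True,
                      separators=(",", ":")).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])

result = {
    "pact_ref": "example-pact-v1.3.0",   # hypothetical reference to the pact version tested
    "timestamp": int(time.time()),
    "passed": True,
    "score": 0.97,
}
env = sign_result(result, EVALUATOR_KEY)
```

The property that matters for audit: the timestamp and outcome are bound together under the evaluator's key, so neither the operator nor anyone else can quietly amend a result after the fact.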
Continuous compliance tracking, not point-in-time snapshots. EU AI Act conformity assessment isn't a one-time certification. It's a continuous obligation. The infrastructure to satisfy this needs to track compliance continuously — not just at annual audit time — and produce a time-series record that demonstrates ongoing conformity. An agent that was compliant in January but drifted out of compliance by March has a continuous compliance record that shows the drift. An agent audited annually has a compliance record that shows two snapshots.
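The January-to-March drift example can be shown in a few lines. The scores, dates, and threshold here are invented for illustration; the point is that a time series makes the drift visible while two annual snapshots would miss it.

```python
from datetime import date

# Hypothetical continuous evaluation scores against a pact threshold.
THRESHOLD = 0.95
history = [
    (date(2026, 1, 15), 0.98),   # compliant
    (date(2026, 2, 15), 0.96),   # compliant, trending down
    (date(2026, 3, 1),  0.93),   # below threshold
    (date(2026, 3, 15), 0.90),   # drift continues
]

def drift_periods(history, threshold):
    """Return the evaluation dates on which the agent fell below its commitment."""
    return [d for d, score in history if score < threshold]

print(drift_periods(history, THRESHOLD))
# → [datetime.date(2026, 3, 1), datetime.date(2026, 3, 15)]
```

An annual audit sampling only the January point would have certified a system that was out of compliance for weeks by the time anyone looked again.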
Temporal traceability to versioned behavioral specifications. When a regulatory inquiry arrives about an agent decision made six months ago, the evidentiary question is: what behavioral commitments was this agent operating under on that date, and did its behavior conform to them? This requires timestamped evaluation results tied to versioned behavioral specifications. The combination of "agent was running version X" + "version X had pact Y" + "pact Y was evaluated and passed at date Z" is auditable in a way that "we believe the agent was working correctly" is not.
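Resolving "what was this agent committed to on that date" is, mechanically, a lookup over a versioned deployment history. A minimal sketch, with invented version strings and dates:

```python
import bisect
from datetime import datetime

# Hypothetical deployment history: (effective_from, agent_version, pact_version),
# sorted by effective date.
deployments = [
    (datetime(2025, 9, 1),  "2.6.0", "pact-1.2"),
    (datetime(2025, 12, 1), "2.7.1", "pact-1.3"),
]

def spec_at(when: datetime):
    """Resolve which agent version and behavioral pact were in force at a moment."""
    times = [t for t, _, _ in deployments]
    i = bisect.bisect_right(times, when) - 1
    if i < 0:
        raise LookupError("no deployment on record at that time")
    _, agent_version, pact_version = deployments[i]
    return agent_version, pact_version

print(spec_at(datetime(2025, 10, 3)))  # ('2.6.0', 'pact-1.2')
```

Joining this lookup with the signed evaluation record for that pact version is what turns "we believe it was working" into the three-part evidentiary chain described above.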
Immutable audit trails. Mutable log systems are not auditable in the compliance sense — they can be retroactively modified. On-chain records — agent registration, pact versions, evaluation results, score history — are immutable by construction. No one can alter what the record shows the agent was committed to doing on March 15. This immutability is what makes on-chain behavioral records legally useful as evidence, not just technically useful for debugging.
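The mechanism that makes retroactive edits detectable is a hash chain: each entry's hash commits to the entry before it, so changing any historical record invalidates everything after it. This is a self-contained toy version of the idea (a real on-chain record gets the same property from the ledger itself):

```python
import hashlib
import json

def _digest(record: dict, prev_hash: str) -> str:
    """Hash a record together with the previous entry's hash."""
    body = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()

class AuditChain:
    """Append-only hash chain: each entry commits to everything before it."""

    def __init__(self):
        self.entries = []  # list of (record, hash)

    def append(self, record: dict) -> None:
        prev = self.entries[-1][1] if self.entries else "genesis"
        self.entries.append((record, _digest(record, prev)))

    def verify(self) -> bool:
        """Recompute every link; any retroactive edit breaks the chain."""
        prev = "genesis"
        for record, h in self.entries:
            if _digest(record, prev) != h:
                return False
            prev = h
        return True

chain = AuditChain()
chain.append({"event": "pact_registered", "pact": "pact-1.3"})
chain.append({"event": "evaluation", "passed": True})
assert chain.verify()

# Retroactively editing the first record is detectable:
chain.entries[0] = ({"event": "pact_registered", "pact": "pact-9.9"},
                    chain.entries[0][1])
assert not chain.verify()
```

A single operator-controlled database can delete rows silently; a hash-chained or on-chain record cannot be edited without the edit being evident to any verifier.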
The Timeline Problem
Here's the uncomfortable structural reality: compliance evidence requires a historical record, and you can't produce historical evidence retroactively.
An organization that deploys AI agents in a regulated workflow in Q1 2026 and starts building compliance infrastructure in Q3 2026 has a six-month gap in its compliance record. If a regulatory inquiry lands about an agent decision from that gap period, the organization's position is weak: "We didn't have the infrastructure to produce the evidence at that time." That position is exactly what enforcement is designed to penalize.
The EU AI Act's enforcement mechanism gives regulators the ability to require suspension of high-risk AI systems that cannot demonstrate compliance. An organization that can't produce behavioral audit evidence for a deployed system is in a materially weaker position than one that can — regardless of whether the system was actually behaving correctly during the period in question.
The good news: building this infrastructure is tractable. Machine-readable behavioral pacts, independent continuous evaluation, and on-chain audit trails are engineering problems with engineering solutions. None of them require waiting for regulatory guidance that isn't coming.
The bad news: the organizations that deployed AI agents earliest and most broadly are the ones with the largest gap between their current capabilities and compliance requirements. Fast movers have the most exposure.
The Question
For AI agents currently operating in your regulated workflows — healthcare, financial services, HR, legal, insurance — how would you produce today the evidence that those agents behaved in a compliant way over the past six months?
Not the logs. Not the dashboards. The auditable evidence — tied to versioned behavioral specifications, independently evaluated, with an immutable time-series record of compliance status.
If the answer is "we couldn't," the next question is how long you have before you need to be able to.
Armalo's trust infrastructure is designed for exactly this requirement: machine-readable behavioral pacts, independent continuous evaluation, immutable on-chain audit trails, and exportable compliance evidence. armalo.ai