August 2, 2026. That is when the bulk of the EU AI Act's obligations become applicable, including those covering high-risk AI systems under Annex III. The fines are real: up to €35 million or 7% of global annual turnover for prohibited AI practices, and up to €15 million or 3% for most other violations, including breaches of the high-risk obligations.
Most compliance conversations focus on risk classification: does my system qualify as high-risk? The harder question — the one most teams are not answering — is different.
EU AI Act risk classification answers: does this AI system require compliance documentation?
It does not answer: can this AI system produce the behavioral records that compliance documentation requires?
These are different questions. The first one is a legal analysis. The second one is a data infrastructure problem, and many agentic systems cannot currently answer it.
A risk tier written in a compliance document is not evidence. Evidence is timestamped, third-party-attested behavioral history that proves the system performed within its stated specifications over time.
TL;DR
- August 2, 2026 is the hard deadline for EU AI Act high-risk AI system obligations — and many agentic deployments fall in scope without their operators realizing it.
- Behavioral records are required, not optional. Article 9 requires a continuous risk management process with documented testing, Article 14 requires "appropriate human oversight measures," and Article 12 requires logging that enables post-market monitoring.
- First-party logs do not satisfy the verification standard. Self-generated audit trails carry less weight than third-party-attested behavioral records when an auditor is investigating compliance.
- Retroactive compliance is not possible. If your agent has no behavioral history before August 2, you cannot backfill it. The timestamp of your first verified eval matters.
- Agentic systems face a specific challenge. Autonomous agents that make consequential decisions — in HR, credit, medical triage, infrastructure — fall into high-risk categories under the Act's Annex III classification.
What the EU AI Act Actually Requires for High-Risk Systems
The EU AI Act imposes obligations across the full system lifecycle. For high-risk AI systems, the key requirements most relevant to agentic deployments:
Article 9 — Risk Management System: A continuous, iterative risk management process with "testing of high-risk AI systems to identify the most appropriate risk management measures." This is not a one-time classification exercise — it is ongoing testing with documented results.
Article 12 — Record-Keeping: Automatic logging of events "while the high-risk AI systems are operating." For autonomous agents, this means every consequential operation must produce a log that enables post-deployment audit — not just operational telemetry, but behavioral records that connect outputs to the agent's specified constraints.
Article 17 — Quality Management System: Documentation of "the metrics used to test" the AI system "as well as the technical description of the procedures for human oversight." For autonomous agents, "metrics used to test" means measurable behavioral criteria — not free-form descriptions of what the agent is supposed to do.
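The distinction Article 12 draws between telemetry and audit-ready records can be sketched with two hypothetical record types. These shapes are illustrative only — the Act specifies what logs must enable, not a schema:

```typescript
// Plain operational telemetry: what happened.
interface OperationalLog {
  timestamp: string; // ISO 8601
  input: string;
  output: string;
  latencyMs: number;
}

// A behavioral record additionally ties the event to the constraint it was
// evaluated against, so a post-market auditor can check compliance rather
// than merely reconstruct traffic.
interface BehavioralRecord extends OperationalLog {
  specId: string;               // which stated constraint applied
  score: number;                // 0..1 against that constraint
  checkResult: 'pass' | 'fail';
}

// Promote a telemetry event into a behavioral record by scoring it
// against a specification threshold.
function toBehavioralRecord(
  log: OperationalLog,
  specId: string,
  score: number,
  threshold: number,
): BehavioralRecord {
  return { ...log, specId, score, checkResult: score >= threshold ? 'pass' : 'fail' };
}
```

The point of the extra fields is that an auditor can filter for `checkResult === 'fail'` and trace each failure back to a named specification — something raw input/output logs cannot support.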
The Annex III Classification Trap
Many agentic systems fall under Annex III high-risk classification without their operators recognizing it. Annex III covers:
- AI systems used in employment decisions (CV screening, interview scoring, performance evaluation)
- AI systems used in access to essential services (credit scoring, insurance pricing, social benefits)
- AI systems used in law enforcement, border management, or the administration of justice
- AI systems used in critical infrastructure management
If your agents operate in any of these contexts — even as one component of a larger workflow — they likely fall within scope. And unlike general-purpose AI systems, specialized agents are harder to argue out of high-risk classification because their narrow scope typically increases their impact on a specific consequential decision.
The Behavioral Record Gap
The gap between what most agentic deployments currently have and what the EU AI Act requires is not primarily a documentation gap. It is a behavioral record gap.
| What Most Teams Have | What the Act Requires |
|---|---|
| Operational logs (inputs, outputs, latency) | Behavioral records tied to measurable specifications |
| Self-generated eval results | Third-party attested performance history |
| Static risk classification in docs | Evidence of ongoing testing with documented metrics |
| Model version tracking | Behavioral impact assessment across version changes |
| Guardrail pass/fail logs | Dimensional performance scores (accuracy, safety, scope adherence) |
| Internal monitoring dashboards | Auditable records accessible to regulatory authorities |
The difference between the left column and the right column is not just format. It is provenance. A behavioral record that was produced by the same infrastructure running the agent has lower evidentiary weight than one produced and signed by a third party. For a regulatory investigation, this distinction matters.
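Why does a third-party signature carry more weight? Because verification needs only the attester's public key, the operator cannot forge or silently alter a signed record. A minimal sketch using Node's built-in Ed25519 support (identifiers are illustrative):

```typescript
import { generateKeyPairSync, sign, verify } from 'node:crypto';

// The third-party evaluator holds the private key; auditors hold the public key.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');

// A hypothetical behavioral record, serialized for signing.
const record = JSON.stringify({
  agentId: 'agent-42',
  evaluatedAt: '2026-03-01T12:00:00Z',
  accuracy: 0.91,
});

// The evaluator signs the record once, at evaluation time.
const signature = sign(null, Buffer.from(record), privateKey);

// Anyone can later verify it without trusting the operator's infrastructure.
const ok = verify(null, Buffer.from(record), publicKey, signature);

// A record altered after the fact fails verification.
const tampered = record.replace('0.91', '0.99');
const bad = verify(null, Buffer.from(tampered), publicKey, signature);
```

Here `ok` is `true` and `bad` is `false`: the signature binds the record's exact contents, which is what gives a third-party-attested record its evidentiary weight.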
What "Behavioral Receipt" Means in the Compliance Context
A behavioral receipt is the document that proves a specific AI system, on a specific task, under verifiable conditions, performed within its stated specification.
For the EU AI Act, a behavioral receipt needs:
- Timestamp — when the evaluation was performed
- Specification reference — which behavioral pact or specification was evaluated against
- Evaluation method — how the assessment was conducted (deterministic checks + multi-LLM jury)
- Result — dimensional scores across accuracy, safety, latency, scope adherence
- Third-party attestation — signature from an entity other than the system under evaluation
- Immutability — the record cannot be altered after the fact
The operational telemetry that most agentic pipelines produce covers items 1 and 4. Items 2, 3, 5, and 6 require a separate behavioral verification layer.
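The six items above could be captured in a record like the following. This is a sketch, not a standard schema — field names are illustrative, and the attestation signature is elided:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical behavioral receipt covering the six required elements.
interface BehavioralReceipt {
  evaluatedAt: string;                              // 1. timestamp (ISO 8601)
  pactId: string;                                   // 2. specification reference
  method: 'deterministic' | 'llm-jury' | 'hybrid';  // 3. evaluation method
  scores: {                                         // 4. dimensional result
    accuracy: number;
    safety: number;
    scopeAdherence: number;
    latencyMs: number;
  };
  attesterId: string;                               // 5. third-party attestation
  contentHash: string;                              // 6. immutability anchor...
  prevHash: string | null;                          //    ...chained to the prior receipt
}

// Hash over everything except the hash itself; any later edit to any field
// changes the hash and breaks the chain.
function receiptHash(body: Omit<BehavioralReceipt, 'contentHash'>): string {
  return createHash('sha256').update(JSON.stringify(body)).digest('hex');
}
```

Chaining each receipt's `prevHash` to its predecessor is one common way to make a record sequence tamper-evident: rewriting one receipt invalidates every receipt after it.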
A 30/60/90 Day Plan to Close the Gap Before August 2026
Days 1-30: Classify and scope
- Conduct an Annex III analysis for every agent in production — does it touch employment, services, infrastructure, law enforcement, or border management?
- Identify which agents require high-risk compliance documentation
- For each agent in scope, define measurable behavioral specifications (the pact conditions that map to Article 9 testing metrics)
- Begin collecting behavioral baselines — the first eval timestamps create the history that August will require
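A measurable behavioral specification — the pact the bullets above describe — can be as simple as a set of threshold conditions. The shapes and numbers below are illustrative assumptions, not mandated by the Act:

```typescript
// Hypothetical pact condition: each one maps to an Article 9 testing metric.
// "Screens CVs fairly" is not testable; "accuracy >= 0.90" is.
interface PactCondition {
  dimension: 'accuracy' | 'safety' | 'scope-adherence';
  threshold: number;               // minimum acceptable score, 0..1
  severity: 'critical' | 'minor';
}

interface BehavioralPact {
  agentId: string;
  conditions: PactCondition[];
}

// A run is compliant only if every critical condition holds.
function isCompliant(pact: BehavioralPact, scores: Record<string, number>): boolean {
  return pact.conditions
    .filter((c) => c.severity === 'critical')
    .every((c) => (scores[c.dimension] ?? 0) >= c.threshold);
}
```

Writing the pact down in this form during days 1-30 is what makes the later eval runs auditable: each score is a pass or fail against a named, versioned condition rather than a judgment call.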
Days 31-60: Wire in behavioral verification
- Connect each in-scope agent to a third-party behavioral evaluation pipeline
- Run initial evals across all required dimensions: accuracy, safety, latency, scope adherence
- Establish score thresholds that define compliant vs. non-compliant behavior
- Set up score decay monitoring — a 10-point drop in 7 days triggers an immediate re-evaluation
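The decay trigger in the last bullet can be expressed as a pure function over a score history. The 10-point / 7-day numbers are this article's illustrative thresholds, not figures from the Act:

```typescript
interface ScorePoint {
  date: string;  // ISO 8601 date
  score: number; // composite score on a 0-100 scale
}

// Returns true when the score has dropped by `dropPoints` or more within
// any `windowDays`-day window — the condition that should trigger an
// immediate re-evaluation.
function decayDetected(history: ScorePoint[], dropPoints = 10, windowDays = 7): boolean {
  const pts = [...history].sort((a, b) => a.date.localeCompare(b.date));
  for (let i = 0; i < pts.length; i++) {
    for (let j = i + 1; j < pts.length; j++) {
      const days = (Date.parse(pts[j].date) - Date.parse(pts[i].date)) / 86_400_000;
      if (days <= windowDays && pts[i].score - pts[j].score >= dropPoints) return true;
    }
  }
  return false;
}
```

For example, a score of 90 on May 1 followed by 79 on May 5 trips the trigger (an 11-point drop in 4 days), while a drop to 85 over the same window does not.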
Days 61-90: Build the audit package
- Generate behavioral records with third-party attestations for each in-scope agent
- Map each record to specific EU AI Act article requirements
- Establish ongoing monitoring cadence (weekly automated evals, monthly audit package generation)
- Prepare incident response procedure for score anomalies detected post-August
Putting the plan into code, a scheduled compliance eval run might look like this:

```typescript
import { ArmaloClient, runEval } from '@armalo/core';

const armalo = new ArmaloClient({ apiKey: process.env.ARMALO_API_KEY! });

// Article 9 compliance: automated testing with documented metrics
async function runComplianceEval(agentId: string, pactId: string) {
  const eval_ = await runEval(armalo, {
    agentId,
    pactId,
    name: `eu-ai-act-compliance-${new Date().toISOString().slice(0, 10)}`,
    checks: [
      { type: 'accuracy', severity: 'critical', threshold: 0.90 },
      { type: 'safety', severity: 'critical', threshold: 0.95 },
      { type: 'scope-adherence', severity: 'critical', threshold: 0.88 },
      { type: 'latency', severity: 'minor', maxMs: 5000 },
    ],
  });

  // Article 12: log the event with full context
  console.log(`[EU-AI-ACT-LOG] eval_id=${eval_.id} agent=${agentId} status=${eval_.status}`);
  return eval_;
}
```
The Enforcement Reality
The EU AI Act does not require perfection. It requires demonstrable, ongoing effort to identify risks and manage them — backed by records that prove the effort was made.
An agent with a 91% accuracy score, a verified safety record, and a clean 90-day behavioral history is in a fundamentally different compliance posture than an agent with the same self-reported performance but no third-party records. The first agent has evidence. The second one has claims.
When an auditor arrives — and post-August, some will — the question is not "did your agent perform well?" The question is "can you prove it?"
Start building your behavioral record before August 2026 at armalo.ai.
Frequently Asked Questions
Does every AI agent fall under EU AI Act high-risk classification?
No. The EU AI Act's high-risk classification is specific to the contexts in Annex III and Annex I. General-purpose chatbots, internal automation agents, and productivity tools generally do not qualify. Agents making consequential decisions in employment, financial services, critical infrastructure, healthcare, and law enforcement typically do.
What is the difference between operational logs and behavioral records for compliance purposes?
Operational logs capture what the system did: inputs, outputs, latency, errors. Behavioral records capture whether what the system did complied with its stated specification, assessed by a defined method with documented metrics. The EU AI Act's Article 12 requires the latter — not just telemetry, but records that support post-market monitoring.
When is it too late to start building behavioral records?
If your agent has no behavioral history before August 2, 2026, you cannot retroactively create one. Compliance documentation requires evidence of behavior over time — a single eval conducted the week before the enforcement deadline does not satisfy the "ongoing" requirement. Starting now means every eval run between now and August is part of your compliance evidence package.
Do the fines apply to EU-based operators only?
No. The EU AI Act applies to any operator placing a high-risk AI system into service in the EU market, regardless of where the operator is headquartered. US, UK, and APAC companies deploying agents that EU users interact with in high-risk contexts are within scope.
Armalo AI provides behavioral verification infrastructure for EU AI Act compliance and beyond — third-party attested evals, composite trust scores, and audit-ready behavioral records. At armalo.ai.