EU AI Act Enforcement Is 137 Days Away. Your Agents Have No Risk Record.
August 2 is coming. The classification gap is not a legal problem — it is a data model problem. If your agent has no behavioral history, no audit can populate one retroactively.
On August 2, the EU AI Act's high-risk provisions take effect. The fines are real. The audit requirements are real.
Most agent deployments are missing one thing: a behavioral record.
The Act answers: which AI systems must demonstrate compliance? It does not answer: can any of those systems actually prove their risk tier?
These are different questions. Most teams think the first one covers the second.
A registration number is not a behavioral record. A risk tier written in a doc is not evidence.
What most agent deployments are missing
No behavioral baseline. EU AI Act high-risk classification is based on what an agent does and its potential impact — not what it was designed to do. If there is no eval history, there is nothing to audit.
Retroactive audits cannot be populated from zero. If your deployment infrastructure has no field for agent risk tier, no audit tool can help you in August. You cannot backfill behavioral evidence you never collected. The timestamp on your first eval matters.
Risk tier is not static. A model update, a new tool integration, a change in scope — any of these can shift an agent from limited-risk to high-risk. Classification is a continuous process, not a one-time checkbox.
Score vs. assertion. Self-declared risk levels carry no weight in a compliance audit. A composite score from verifiable, timestamped evals does. These are not the same thing.
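To make the list above concrete, here is a minimal sketch of what a deployment schema with a risk-tier field and a timestamped eval history could look like. The type names, tier labels, and score thresholds are hypothetical illustrations, not the Armalo schema and not regulatory guidance; the only grounded details are the 0-1000 composite score and the idea that classification is recomputed from evidence rather than declared once.

```typescript
// Hypothetical shapes for illustration — not the Armalo schema.
type RiskTier = 'minimal' | 'limited' | 'high';

interface EvalRecord {
  timestamp: string;      // ISO 8601 — proves *when* the evidence was collected
  compositeScore: number; // 0-1000, as returned by scoring
}

interface AgentRiskRecord {
  agentId: string;
  riskTier: RiskTier;     // the field most deployment schemas are missing
  evalHistory: EvalRecord[];
}

// Re-derive the tier from current evidence. Because classification is
// continuous, call this after every model update, tool integration, or
// scope change. Thresholds below are placeholders.
function reclassify(record: AgentRiskRecord): RiskTier {
  const latest = record.evalHistory[record.evalHistory.length - 1];
  if (!latest) return 'high'; // no behavioral history: nothing to audit, so assume the worst
  if (latest.compositeScore < 650) return 'high';
  if (latest.compositeScore < 850) return 'limited';
  return 'minimal';
}
```

The point of the sketch is the shape, not the thresholds: if `riskTier` is a stored assertion instead of a function of `evalHistory`, it is exactly the self-declared risk level that carries no weight in an audit.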
Build the record you will need in August
import { ArmaloClient, runEval, waitForScore } from '@armalo/core';

const client = new ArmaloClient({ apiKey: 'YOUR_API_KEY' });

// Run a full eval — accuracy + safety + latency checks
const baseline = await runEval(client, {
  agentId: 'agent_abc123',
  name: 'compliance-baseline',
  agentEndpoint: 'https://your-agent.example.com/chat',
  checks: [
    { type: 'accuracy', severity: 'critical' },
    { type: 'safety', severity: 'critical' },
    { type: 'format', severity: 'minor' },
    { type: 'latency', severity: 'minor', maxMs: 3000 },
  ],
});

// Poll until scoring completes
const score = await waitForScore(client, 'agent_abc123', {
  pollIntervalMs: 2000,
  timeoutMs: 120000,
});

console.log(`Composite score: ${score.compositeScore}`); // 0-1000
console.log(`Safety: ${score.dimensions.safety}`);       // per-dimension breakdown
console.log(`Total evals: ${score.totalEvals}`);         // the audit trail

// Fail the deploy if the score drops below threshold
if (score.compositeScore < 650) process.exit(1);
What you get: A verified behavioral record with timestamped eval history — accuracy, safety, latency, and a composite score per dimension. Drop this in your CI/CD pipeline. Run it on every deploy. When August arrives and the auditor asks what your agent does, you have an answer that is not a doc.
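One way to make each deploy leave a durable trace is to serialize the score into a build artifact. The sketch below uses only the score fields shown in the example above (`compositeScore`, `dimensions`, `totalEvals`); the `ScoreSnapshot` type and `toAuditArtifact` helper are illustrative names, and how you persist the resulting JSON (commit it, attach it to the release, ship it to storage) is up to your pipeline.

```typescript
// Mirror of the score fields used in the example above; anything beyond
// these three fields would be an assumption about the API.
interface ScoreSnapshot {
  compositeScore: number;
  dimensions: Record<string, number>;
  totalEvals: number;
}

// Build a timestamped JSON audit artifact for one deploy. The recordedAt
// field is the timestamp an auditor will care about: it proves the
// evidence existed before the deadline, not after.
function toAuditArtifact(agentId: string, score: ScoreSnapshot): string {
  return JSON.stringify(
    {
      agentId,
      recordedAt: new Date().toISOString(),
      compositeScore: score.compositeScore,
      dimensions: score.dimensions,
      totalEvals: score.totalEvals,
    },
    null,
    2,
  );
}
```

Run once per deploy and the artifacts accumulate into exactly the timestamped history a retroactive audit cannot reconstruct from zero.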
The classification gap is a data model problem. The data model problem is a solved problem.
→ Get your API key: armalo.ai (free signup → API Keys) → Docs: armalo.ai/docs
Put the trust layer to work
Explore the docs, register an agent, or start shaping a pact that turns these trust ideas into production evidence.