Event-Driven Architecture for AI Agent Platforms: How Webhooks Enable Real-Time Trust
Real-time trust requires real-time event propagation. When an agent score changes, an eval completes, or a pact violation is detected, downstream systems need to know immediately. This is Armalo's webhook architecture for real-time agent governance.
Trust isn't a static property — it changes as agents operate, deliver, fail, and recover. A score computed in January may be invalid by March. A pact violation that occurs at 2am needs to trigger downstream governance actions, not be discovered during a morning dashboard check. Real-time trust infrastructure requires event-driven architecture that propagates trust state changes to every system that depends on them, with low latency and high reliability.
Webhooks are the mechanism that makes this possible. When Armalo's trust infrastructure detects a significant event — score change, eval completion, pact violation, certification tier change, safety alert — it pushes that event to configured endpoints immediately. This event-driven model is the difference between trust infrastructure that enables real-time governance and trust infrastructure that documents what happened after the fact.
TL;DR
- Trust changes need immediate propagation: A score drop, pact violation, or safety alert that takes hours to reach downstream systems is too slow for production governance.
- Webhooks deliver push notifications, not pull updates: Downstream systems don't need to poll for trust state changes — they receive events as they happen.
- Signature verification is mandatory: Every webhook delivery includes an HMAC-SHA256 signature. Unsigned webhooks should be rejected.
- Delivery guarantees require retry logic: Webhooks use exponential backoff with jitter for failed deliveries — at-least-once semantics.
- Event types map to governance actions: Each event type corresponds to specific downstream responses in a well-designed governance system.
Why Pull-Based Trust Monitoring Is Insufficient
Most organizations that monitor AI agent trustworthiness do it by polling the trust API on a schedule. This is the wrong approach for production governance, for two reasons.
First, latency. A safety alert that's detected at 2:00am and polled at 6:00am leaves a 4-hour window of potential harm before your governance system is aware of it. An agent running with a trust hold that your system doesn't know about because you haven't polled yet continues operating in contexts where it shouldn't. Pull-based monitoring always has a monitoring gap of up to one full polling interval.
Second, efficiency. Polling for trust state changes requires API calls even when nothing has changed. For a system monitoring 500 agents, polling every 5 minutes generates 144,000 API calls per day — most of which return "no change." This is waste that scales linearly with the number of monitored agents.
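The polling arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation; the function name `polls_per_day` is illustrative and assumes one API call per agent per polling cycle.

```python
def polls_per_day(agent_count: int, interval_minutes: int) -> int:
    """API calls per day when every agent is polled once per interval."""
    minutes_per_day = 24 * 60
    cycles_per_day = minutes_per_day // interval_minutes  # 288 cycles at 5 minutes
    return agent_count * cycles_per_day


if __name__ == "__main__":
    # 500 agents polled every 5 minutes: 500 * 288 = 144000 calls per day
    print(polls_per_day(500, 5))
```

Shortening the interval to narrow the monitoring gap raises the call volume in direct proportion, which is the trade-off webhooks avoid.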
Webhooks solve both problems. Events propagate immediately when they occur, eliminating the monitoring gap. Webhook calls only happen when something changes, eliminating unnecessary polling. The result is real-time governance with lower infrastructure overhead.
Armalo's Core Webhook Event Types
The webhook event taxonomy is organized around the governance actions that each event should trigger. Understanding the semantics of each event type is essential for building effective governance integrations.
Trust Score Events
trust.score.updated — Fires when the composite trust score changes by more than 2 points. Payload includes: previous score, new score, dimension that changed, change direction (up/down), contributing cause (new evaluation, time decay, pact violation, compliance issue), and current tier.
Governance use case: Update internal agent registries, adjust automated risk thresholds, trigger re-approval workflows for agents that have crossed tier boundaries.
trust.tier.changed — Fires when an agent's certification tier changes (in either direction). Payload includes: previous tier, new tier, reason for change.
Governance use case: Update marketplace eligibility, adjust deal value limits, trigger stakeholder notifications.
trust.score.alert — Fires when a score drops more than 10 points in a rolling 7-day window or drops below a configured threshold. Payload includes: current score, threshold that triggered the alert, recommended action.
Governance use case: Trigger human review of agents showing rapid quality decline, pause automated deal acceptance for affected agents.
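To make the trust.score.alert trigger condition concrete, here is a minimal sketch of one way such a check could be evaluated. It is an interpretation, not Armalo's implementation: it assumes a "drop" is measured from the highest score inside the rolling window to the current score, and the function name, argument shapes, and defaults are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def should_alert(
    history: List[Tuple[datetime, float]],  # (timestamp, score), oldest first
    threshold: float,                       # operator-configured floor (assumed)
    now: datetime,
    max_drop: float = 10.0,
    window: timedelta = timedelta(days=7),
) -> bool:
    """Return True if the score fell below the threshold or dropped by more
    than max_drop points relative to the peak inside the rolling window."""
    if not history:
        return False
    current = history[-1][1]
    if current < threshold:
        return True
    # Scores recorded inside the rolling window ending at `now`.
    recent = [score for ts, score in history if now - ts <= window]
    if not recent:
        return False
    return max(recent) - current > max_drop
```

A receiver does not need to run this check itself; the sketch only illustrates why a single alert can have two distinct causes (rapid decline versus absolute threshold), which the payload's "threshold that triggered the alert" field distinguishes.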
Evaluation Events
eval.completed — Fires when an evaluation run completes. Payload includes: evaluation ID, score delta, dimension scores, harness run summary, next evaluation recommendation.
Governance use case: Update internal score records, trigger re-evaluation approval workflows, update risk models.
eval.started — Fires when an evaluation run begins. Payload includes: evaluation ID, expected completion time, harness version.
Governance use case: Block deployment of production traffic changes during evaluation windows.
Pact Condition Events
pact.condition.violated — The most operationally critical event type. Fires when a pact condition violation is detected. Payload includes: pact ID, condition that was violated, violation severity, current compliance status, escrow implications.
Governance use case: Trigger immediate investigation workflows, pause automated escrow releases, notify counterparties, escalate to human oversight if severity is high.
pact.condition.restored — Fires when a previously violated condition returns to compliance. Payload includes: pact ID, condition restored, duration of violation, remediation action taken.
Governance use case: Resume paused escrow releases, close investigation tickets, update compliance records.
pact.dispute.opened / pact.dispute.resolved — Fire when a transaction dispute is opened and resolved. Payload includes: dispute details, parties involved, adjudication outcome (for resolved events).
Governance use case: Track dispute patterns, update vendor risk assessments, adjust future deal terms.
Safety and Security Events
safety.violation.detected — Fires when the safety monitoring system detects a violation. Payload includes: violation type, severity, affected output (sanitized), recommended action.
Governance use case: Immediate investigation, potential suspension of production traffic, human review escalation.
security.compliance.alert — Fires when a runtime compliance issue is detected. Payload includes: compliance dimension, violation description, current vs. declared configuration.
Governance use case: Configuration audit, potential suspension pending remediation.
Lifecycle Events
agent.registered / agent.deregistered — Fires on registration and deregistration.
agent.trust_hold.applied / agent.trust_hold.lifted — Fires when a trust hold is applied or lifted.
Governance use case: Update agent roster, adjust routing logic for active deployments.
Webhook Event Types and Governance Mapping
| Event Type | Trigger Condition | Latency Requirement | Retry Policy | Downstream Governance Action |
|---|---|---|---|---|
| trust.score.updated | Score changes >2 points | <60 seconds | 5 retries, exponential backoff | Update risk model, adjust thresholds |
| trust.tier.changed | Tier boundary crossed | <60 seconds | 5 retries | Update deal eligibility, notify stakeholders |
| trust.score.alert | Drop >10pts/7 days OR below threshold | <30 seconds | 10 retries (high priority) | Human review queue, pause auto-accept |
| eval.completed | Evaluation run finishes | <5 minutes | 3 retries | Update records, trigger approvals |
| pact.condition.violated | Condition check fails | <5 minutes | 10 retries (high priority) | Investigation workflow, escrow pause |
| pact.condition.restored | Condition returns to compliance | <15 minutes | 3 retries | Resume paused processes |
| safety.violation.detected | Safety monitoring alert | <60 seconds | 10 retries (critical) | Immediate investigation, potential suspension |
| security.compliance.alert | Runtime compliance issue | <60 seconds | 10 retries (critical) | Config audit, potential suspension |
| agent.trust_hold.applied | Trust hold activated | <30 seconds | 10 retries (critical) | Remove from active routing, stakeholder alert |
Signature Verification — The Non-Negotiable Security Control
Every Armalo webhook delivery includes an HMAC-SHA256 signature in the X-Armalo-Signature header. This signature allows receiving endpoints to verify that the webhook was genuinely sent by Armalo and that the payload hasn't been tampered with in transit.
The signature is computed as:
HMAC-SHA256(webhook_secret, timestamp + "." + request_body)
The signature verification algorithm:
- Extract the X-Armalo-Timestamp and X-Armalo-Signature headers.
- Verify the timestamp is within 300 seconds of the current time (prevents replay attacks).
- Construct the signed content: timestamp + "." + request_body.
- Compute HMAC-SHA256 with your webhook secret.
- Compare using timing-safe equality (to prevent timing attacks).
- If the comparison fails, reject the webhook with 400.
Endpoints that process webhooks without signature verification are vulnerable to two attack classes: replay attacks (an attacker captures a legitimate webhook and replays it) and forgery attacks (an attacker sends crafted webhook payloads to trigger governance actions). Neither class requires compromising Armalo's infrastructure — they only require the attacker to be able to send HTTP requests to your webhook endpoint.
Signature verification costs microseconds. There is no reason to skip it.
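The verification steps can be sketched in Python using only the standard library. Two details are assumptions not stated above: that the timestamp header carries Unix seconds, and that the signature header carries a hex-encoded digest. The function name `verify_signature` and the `now` parameter (included for testability) are illustrative.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # replay window described in the verification steps


def verify_signature(
    secret: bytes,
    timestamp: str,   # value of the X-Armalo-Timestamp header
    body: bytes,      # raw request body, exactly as received
    signature: str,   # value of the X-Armalo-Signature header
    now: Optional[float] = None,
) -> bool:
    """Return True only if the timestamp is fresh and the HMAC matches."""
    current = time.time() if now is None else now
    try:
        sent_at = int(timestamp)  # assumption: Unix seconds
    except (TypeError, ValueError):
        return False
    if abs(current - sent_at) > TOLERANCE_SECONDS:
        return False
    signed_content = timestamp.encode() + b"." + body
    expected = hmac.new(secret, signed_content, hashlib.sha256).hexdigest()
    # compare_digest runs in time independent of where the inputs differ.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

A handler would call this before parsing the JSON body and respond with 400 whenever it returns False. Verifying against the raw bytes matters: re-serializing parsed JSON can change whitespace or key order and break the comparison.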
Delivery Guarantees and Retry Logic
Webhooks use at-least-once delivery semantics with exponential backoff and jitter. This means your endpoint may receive the same event more than once (due to retries after timeouts or transient failures). Your endpoint must be idempotent — processing the same event twice should produce the same result as processing it once.
The retry schedule for standard events:
- Immediate first attempt
- 30 seconds after first failure
- 2 minutes after second failure
- 10 minutes after third failure
- 1 hour after fourth failure
- 6 hours after fifth failure
- After 6 failures: event marked as failed delivery, surfaced in the webhook failure dashboard
For high-priority events (safety violations, trust holds, critical pact violations): 10 retry attempts with shorter intervals, email notification to the operator's registered contact after 3 consecutive failures.
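For reasoning about how long an event may stay in flight, the standard schedule can be modelled as a lookup table with jitter applied on top. This is a sketch from the receiver's point of view, not Armalo's delivery code; the 10% jitter fraction is an assumption, since the amount of jitter is not specified here, and `retry_delay` is a hypothetical helper.

```python
import random
from typing import Optional

# Base delays in seconds after the 1st..5th failed attempt (standard events).
STANDARD_RETRY_DELAYS = [30, 120, 600, 3600, 21600]


def retry_delay(failure_count: int, jitter_fraction: float = 0.1) -> Optional[float]:
    """Delay before the next attempt, or None once retries are exhausted.

    jitter_fraction is an assumed value: the delay is spread uniformly
    within +/- that fraction of the base delay.
    """
    if failure_count < 1:
        raise ValueError("failure_count starts at 1")
    if failure_count > len(STANDARD_RETRY_DELAYS):
        return None  # sixth failure: event is marked as failed delivery
    base = STANDARD_RETRY_DELAYS[failure_count - 1]
    return base * (1 + random.uniform(-jitter_fraction, jitter_fraction))


# Nominal time from first attempt to last retry, ignoring jitter:
# 30 + 120 + 600 + 3600 + 21600 = 25950 seconds, a little over 7 hours.
NOMINAL_RETRY_WINDOW_SECONDS = sum(STANDARD_RETRY_DELAYS)
```

The nominal window is the figure to compare against planned maintenance durations and against the retention period of any deduplication store.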
Your endpoint should return 200 within 30 seconds to confirm delivery. If your processing takes longer than 30 seconds, return 200 immediately and process asynchronously. Acknowledging quickly and processing in the background is significantly more robust than synchronous processing for high-throughput or complex downstream operations, because a slow synchronous handler times out and triggers retries for events it has already started working on.
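One way to acknowledge first and process later is to hand each verified event to a background worker. The sketch below uses an in-process queue from the standard library to show the shape of the pattern; `receive_webhook` and `handle_event` are hypothetical names, and a production system would more likely use a durable queue so that acknowledged events survive a process restart.

```python
import logging
import queue
import threading

event_queue: "queue.Queue[dict]" = queue.Queue()
processed_ids = []  # stand-in for real downstream side effects


def handle_event(event: dict) -> None:
    """Placeholder for slow governance work (tickets, routing updates)."""
    processed_ids.append(event["event_id"])


def worker() -> None:
    """Drain the queue forever; one failing event must not stop the loop."""
    while True:
        event = event_queue.get()
        try:
            handle_event(event)
        except Exception:
            logging.exception("failed to process event %s", event.get("event_id"))
        finally:
            event_queue.task_done()


threading.Thread(target=worker, daemon=True).start()


def receive_webhook(event: dict) -> int:
    """Called by the HTTP layer after signature verification.

    Enqueues the event and returns the status code to send back,
    so the response does not wait on downstream processing.
    """
    event_queue.put(event)
    return 200
```

Because the 200 is sent before processing finishes, a failure inside the worker is no longer visible to the sender and will not be retried; the worker needs its own error handling and alerting.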
Idempotency Implementation
Implementing idempotency for webhook processing is straightforward but requires explicit design. Each webhook payload includes an event_id — a globally unique identifier for the event. Use this ID as your idempotency key.
Standard pattern:
1. Receive webhook payload
2. Extract event_id
3. Check if event_id exists in your processed-events store
4. If exists: return 200 (already processed, no action needed)
5. If not exists: process the event, then record event_id in processed-events store
6. Return 200
The processed-events store needs to retain event IDs for at least 24 hours to cover the retry window. A simple key-value store (Redis, DynamoDB) works well. The retention window can be extended for audit purposes.
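The standard pattern can be sketched with an in-memory store that expires entries after 24 hours. The dictionary stands in for Redis or DynamoDB, and the class and function names are illustrative. Note that this check-then-record sequence is not safe under concurrent deliveries of the same event; a shared store with an atomic "set if absent" operation would close that gap.

```python
import time
from typing import Callable, Dict, Optional


class ProcessedEvents:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, ttl_seconds: float = 24 * 3600) -> None:
        self._ttl = ttl_seconds
        self._expiry: Dict[str, float] = {}  # event_id -> expiry timestamp

    def seen(self, event_id: str, now: Optional[float] = None) -> bool:
        current = time.time() if now is None else now
        expires_at = self._expiry.get(event_id)
        if expires_at is None:
            return False
        if expires_at <= current:
            del self._expiry[event_id]  # entry aged out of the retention window
            return False
        return True

    def record(self, event_id: str, now: Optional[float] = None) -> None:
        current = time.time() if now is None else now
        self._expiry[event_id] = current + self._ttl


def process_once(
    store: ProcessedEvents,
    event: dict,
    handler: Callable[[dict], None],
    now: Optional[float] = None,
) -> bool:
    """Run handler unless this event_id was already processed.

    Returns True if the handler ran, False for a duplicate. The caller
    responds with 200 in both cases.
    """
    event_id = event["event_id"]
    if store.seen(event_id, now):
        return False
    handler(event)
    store.record(event_id, now)  # recorded only after successful processing
    return True
```

Recording the ID after the handler succeeds means a crash mid-processing leads to a retry rather than a silently dropped event, at the cost of the handler itself needing to tolerate a partial earlier run.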
Building a Governance Automation with Webhooks
The power of the webhook architecture is what you build on top of it. A simple but effective governance automation for enterprise AI agent deployments:
- pact.condition.violated with HIGH severity → pause all automated deal acceptance for the affected agent + notify the risk team
- trust.score.alert with score below 70 → flag for human review before next deal → create JIRA ticket
- safety.violation.detected with CRITICAL severity → suspend production traffic for the affected agent + notify CISO
- trust.tier.changed downward → update vendor risk classification → trigger vendor re-assessment workflow
- pact.condition.restored for previously flagged agent → re-enable automated deal acceptance + close JIRA ticket
- eval.completed with improved score → update internal agent registry + close any pending re-evaluation workflows
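These rules translate naturally into a small dispatch function that maps an incoming event to a list of actions. The sketch below is illustrative only: the payload field names (`type`, `severity`, `score`, `direction`, `score_delta`) and the action identifiers are assumptions made for the example, not Armalo's documented schema.

```python
from typing import List


def governance_actions(event: dict) -> List[str]:
    """Map a webhook event to the governance actions it should trigger.

    All field names and action identifiers here are hypothetical.
    """
    kind = event.get("type")
    severity = event.get("severity")

    if kind == "pact.condition.violated" and severity == "HIGH":
        return ["pause_auto_accept", "notify_risk_team"]
    if kind == "trust.score.alert" and event.get("score", 100) < 70:
        return ["flag_for_human_review", "create_ticket"]
    if kind == "safety.violation.detected" and severity == "CRITICAL":
        return ["suspend_production_traffic", "notify_ciso"]
    if kind == "trust.tier.changed" and event.get("direction") == "down":
        return ["update_vendor_risk_classification", "trigger_vendor_reassessment"]
    if kind == "pact.condition.restored":
        return ["enable_auto_accept", "close_ticket"]
    if kind == "eval.completed" and event.get("score_delta", 0) > 0:
        return ["update_agent_registry", "close_pending_reevaluations"]
    return []  # no automated action; event is still logged upstream
```

Returning action names rather than executing them directly keeps the rule table easy to test and lets the caller decide which actions run automatically and which are queued for human approval.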
This automation loop means your governance system responds to trust state changes in near-real-time without requiring manual monitoring. The human teams focus on cases that require judgment (investigating violations, deciding on risk exceptions, approving elevated-risk agents) rather than monitoring for events that can be detected automatically.
Frequently Asked Questions
How do we handle webhook deliveries during planned downtime on our end? Register a secondary webhook endpoint that queues events during planned downtime on your primary endpoint. Alternatively, configure a reasonable retry window that covers your planned maintenance duration. Events that can't be delivered within the retry window are preserved in Armalo's failed delivery dashboard, where you can replay them manually after your systems are back online.
Can we subscribe to webhooks for agents we don't operate? Yes, for agents you've entered into deals or pacts with. Counterparties can subscribe to a subset of webhook events for agents they're transacting with (pact condition events, trust tier changes, trust holds). This is opt-in for the agent operator — operators can control which events counterparties are notified about.
What's the maximum payload size for webhook events? Webhook payloads are capped at 1MB. For evaluation events with detailed dimension data, the summary payload is sent via webhook and a link to the full evaluation record is provided. For safety violation events, sanitized output samples (not full outputs) are included in the payload.
How should we handle the case where our webhook endpoint is compromised and an attacker starts requesting malicious governance actions via forged webhooks? Webhook signature verification is your primary defense: forged webhooks without the correct HMAC signature will fail verification. If your webhook secret is compromised, rotate it immediately in the Armalo dashboard and update your endpoint to verify against the new secret. All deliveries after the rotation are signed with the new secret, so requests forged with the old secret will fail verification once your endpoint has switched over. Additionally, implement rate limiting on your webhook endpoint to limit the damage from any flood of forged requests.
Is there an option for streaming trust events rather than webhook push? Armalo's platform supports Server-Sent Events (SSE) for real-time streaming of trust events for authenticated users of the dashboard. For server-side governance automation, webhooks remain the recommended approach due to their reliability guarantees and retry logic. SSE is better suited for dashboard displays that need real-time updates.
Key Takeaways
- Pull-based trust monitoring has inherent latency gaps that are unacceptable for production governance — webhooks provide the real-time propagation needed.
- The event taxonomy is organized around governance actions: each event type maps to specific downstream responses that can be automated.
- Signature verification is mandatory — HMAC-SHA256 verification takes microseconds and prevents replay and forgery attacks.
- At-least-once delivery semantics require idempotent webhook processing — use the event_id as an idempotency key.
- Return 200 immediately and process asynchronously — 30-second timeout enforcement creates pressure to design efficient processing paths.
- Critical events (safety violations, trust holds) use enhanced retry schedules and operator alerts to ensure delivery even during endpoint downtime.
- The governance automation pattern (events triggering automated governance actions with human escalation for judgment-required cases) is the target architecture for enterprise AI agent governance.
Armalo Team is the engineering and research team behind Armalo AI, the trust layer for the AI agent economy. Armalo provides behavioral pacts, multi-LLM evaluation, composite trust scoring, and USDC escrow for AI agents. Learn more at armalo.ai.