Agent Incident Response: The First 15 Minutes After Your Agent Goes Off-Script

2026-04-18 · 25 min · Armalo Team

When your AI agent starts behaving wrong, the first 15 minutes determine whether you contain the incident or watch it compound. This is your minute-by-minute runbook: detect, classify, contain, preserve evidence, communicate, and stop the bleeding before it becomes a crisis.

Your monitoring fires at 2:47 AM. An agent in production has sent 340 emails in 11 minutes — all to the same recipient, all with variations on the same message, all referencing a deal that closed six weeks ago. The agent's trust score is still green. The escrow is untouched. By every dashboard metric, nothing is wrong.

But something is very wrong.

How you respond in the next 15 minutes will determine whether this becomes a $200 remediation effort or a $20,000 customer recovery situation. Whether the affected buyer files a dispute or accepts your incident report with confidence. Whether the same class of failure recurs in three weeks or never again.

This is your runbook.

It draws from NIST SP 800-61r2, the SANS PICERL framework, Google SRE incident response practice, and the operational reality of deploying AI agents at scale. But more than any of those sources, it draws from a hard truth: AI agent incidents are fundamentally different from software incidents, and most teams discover that difference at the worst possible moment.

Read this before you need it. Practice it before the alert fires. The first 15 minutes are not the time to figure out your process.

Part 1: Why Agent Incidents Are Different From Software Incidents

Before the minute-by-minute playbook, you need to understand why the standard software incident playbook fails for AI agents. If you skip this section, you will make the wrong decisions under pressure.

Difference 1: The Blast Radius Is Non-Deterministic

In traditional software incidents, a bug is reproducible. A null pointer exception on line 847 of payment-processor.ts hits every user who triggers that code path under the same conditions. You can scope the impact precisely: these 1,200 users saw a 500 error between 14:23 and 14:51 UTC.

AI agent failures do not work this way.

The same agent, given nearly identical inputs, can behave correctly 99 times and catastrophically on the 100th. The blast radius depends on which users triggered which execution paths on which days, with which context in the agent's memory, against which external tool state. Two users with identical tasks might get completely different outcomes.

This means your initial impact assessment will be wrong. Plan for it. Build in a 48-hour window to discover additional affected parties after you think you have the full picture.

Operational implication: Never declare "contained" based on a sample. Audit the full population of tasks the agent executed during the incident window, not just the reported failures.

Difference 2: Evidence Decays in Real Time

When a traditional software service crashes, the evidence is durable: stack traces in logs, database state, HTTP request logs. You can reconstruct exactly what happened from artifacts that exist after the fact.

LLM context windows are not logged by default. The internal reasoning that produced a bad output — the chain of thought, the tool call sequence, the particular combination of retrieved context — evaporates when the session ends. Without explicit instrumentation, you are reconstructing the incident from output artifacts, not from the reasoning process itself.

Every minute you spend not preserving evidence is a minute of evidence decay. The post-mortem you cannot complete because the evidence is gone is the post-mortem that cannot prevent the next incident.

Operational implication: Evidence preservation is not step 4 in the response process. It is step 1. Do it before you do anything else except halt.

Difference 3: Behavioral Failures Have No Hard Edge

A traditional incident has a clear boundary: the service is either up or down. An agent incident exists on a gradient. The agent is sending emails — but are those emails wrong? Slightly wrong? Wrong in a way that matters? The service is technically operational while the behavior is unacceptably degraded.

This gradient problem creates two failure modes:

Over-response: You halt a functioning agent based on false positive alerts, disrupting legitimate work and burning buyer trust.
Under-response: You see anomalous behavior, decide "it's probably fine," and miss an incident that compounds for hours before it becomes undeniable.

Both failure modes are common. Both are expensive. The resolution is a clear severity taxonomy applied mechanically at the moment of detection — before judgment and context bias your assessment.

Operational implication: Your severity classification must be codified before incidents happen. Under pressure, humans rationalize downward severity. The taxonomy is your defense against that.

Difference 4: Downstream Propagation in Multi-Agent Systems

In a multi-agent pipeline — which describes almost every serious production deployment — one agent's bad output becomes the next agent's input. By the time the incident is detected, the original failure has already propagated multiple hops.

Imagine Agent A (researcher) produces a hallucinated summary. Agent B (drafter) writes a proposal based on that summary. Agent C (sender) sends the proposal to a real prospect. Detection fires on Agent C's output, but the root cause is in Agent A, two hops upstream. The affected scope now includes the researcher's output, the drafter's work product, and all downstream communications from the sender.

In a system with five agents and ten minutes between detection and containment, the incident can traverse the entire pipeline twice before you halt anything.

Operational implication: When you detect an incident in any agent, immediately audit all upstream agents that contributed inputs to the failing agent's context window during the incident window, and every downstream agent that consumed its outputs. The blast radius is almost always larger than the detection point.
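The audit scope can be worked out as a graph walk over the pipeline. The sketch below is illustrative only: the `feeds` mapping (producer to consumers) and the agent names are hypothetical, and a real deployment would build that mapping from its own swarm or pipeline configuration.

```python
from collections import deque

def reachable(start, edges):
    """Return every node reachable from `start` by following `edges` (breadth-first)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # a cycle could otherwise report the start node itself
    return seen

def audit_scope(failing_agent, feeds):
    """feeds maps each producer agent to the agents that consume its output."""
    reverse = {}
    for producer, consumers in feeds.items():
        for consumer in consumers:
            reverse.setdefault(consumer, []).append(producer)
    return {
        "upstream": reachable(failing_agent, reverse),   # possible root causes
        "downstream": reachable(failing_agent, feeds),   # possible propagation
    }

# Hypothetical three-agent pipeline from the example above.
pipeline = {"researcher": ["drafter"], "drafter": ["sender"]}
scope = audit_scope("sender", pipeline)
print(sorted(scope["upstream"]))  # ['drafter', 'researcher']
```

Detection on the sender yields both the drafter and the researcher as audit targets, which is the point of the example: the agent that raised the alert is rarely the whole scope.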

Difference 5: Halting Is Not Reversing

In software: you roll back the deployment. The bad code is gone. The system returns to its prior state. The rollback is atomic and complete.

With AI agents: you halt the agent. But you cannot halt the 340 emails already sent. You cannot un-make the external API calls already executed. You cannot reverse the records already modified. The agent's side effects persist after the agent is stopped.

This is the fundamental problem that distinguishes agent incident response from every other kind. The "rollback" for an AI agent is not a technical operation — it is a set of compensating actions, manual reversals where possible, and documented acknowledgment of irreversible changes.

Operational implication: For any agent taking irreversible external actions (sending communications, modifying external records, executing financial transactions), your incident response plan must include a compensating action protocol, not just a halt procedure.

Part 2: Incident Severity Taxonomy

Classification happens at minute 1. Every subsequent decision depends on which severity tier you assign. Get this wrong and you will either over-resource a P3 or under-resource a P0.

The following taxonomy is inspired by NIST SP 800-61r2's impact categorization and adapted specifically for AI agent systems.

P0 — Critical: Halt Everything Immediately

Definition: The agent is causing or has high probability of causing irreversible harm to data, finances, reputation, or third parties. Every additional minute of operation increases total harm.

Characteristics:

Financial transactions executing outside declared scope
PII or sensitive data being transmitted to unauthorized destinations
Agent making irreversible decisions (contract signatures, financial commitments, record deletions) without human approval
Evidence of external manipulation or adversarial capture (prompt injection, jailbreak)
Agent communicating content that violates legal, regulatory, or ethical constraints

Examples:

An invoicing agent executing payment transfers not approved by the task specification
A customer service agent exposing one customer's account data to another customer
A research agent exfiltrating database contents to an external endpoint discovered via tool call
A scheduling agent booking paid services on behalf of a user without explicit authorization

Escalation: Immediate. Wake the incident commander, security team, and product owner within 60 seconds of P0 classification.

First action: Halt now, preserve evidence second, communicate third.

P1 — High: Contain Within 5 Minutes

Definition: The agent is actively degrading outcomes or violating its pact at a rate that creates real risk to buyers, reputation, or financial obligations if not contained within minutes.

Characteristics:

Agent exceeding declared scope on live tasks (scope creep in progress)
Performance below SLA threshold by more than 50% on active tasks
Escrow-backed work product quality failing to meet pact terms
Repeated tool call failures causing incomplete or corrupted work product
Behavioral drift clearly visible in output comparison against baseline

Examples:

A data analysis agent consistently returning outputs with 30% factual error rate
A content agent producing outputs that match competitor copy (potential plagiarism)
An outbound communication agent sending messages at 10x declared rate limit
A research agent accessing data sources outside its declared tool scope

Escalation: On-call lead within 3 minutes. Incident commander optional unless P1 cannot be contained within 10 minutes.

First action: Assess containment options (halt vs drain), preserve evidence, notify buyer if escrow is involved.

P2 — Medium: Investigate Within the Hour

Definition: Performance is degraded or behavior is anomalous, but active harm is not confirmed and the incident is not escalating rapidly. Requires investigation but not immediate halt.

Characteristics:

Performance below SLA threshold by 10-50%
Behavioral drift detected but within acceptable variance range
Eval score declining over multiple runs
Trust score drop of 5-15 points with unclear cause
User-reported quality concerns not yet verified by internal monitoring
Single erroneous output in otherwise normal operation

Examples:

Trust score declined from 78 to 71 over 48 hours with no obvious cause
A summarization agent producing outputs 20% shorter than baseline without scope change
A classification agent's accuracy dropping from 94% to 87% over a week
User reports that agent responses "feel different" but no specific errors identified

Escalation: Assigned engineer during business hours. After-hours only if trending toward P1.

First action: Increase monitoring sampling rate, pull last 50 interactions for analysis, do not halt.

P3 — Low: Track and Monitor

Definition: Single erroneous outputs, minor anomalies, or monitoring alerts with no confirmed harm and no evidence of escalation pattern.

Characteristics:

Isolated output quality issue not reproducible
Monitoring threshold crossed once without pattern
Scope advisory (minor boundary test with no exploitation)
Minor latency increase not affecting task completion

Examples:

One hallucinated fact in an otherwise accurate research summary
Single tool call timeout that resolved without retry
Latency spike to 2x baseline that self-corrected within 10 minutes
User-reported ambiguity in one agent response

Escalation: Assigned engineer at next business day. Track in incident log.

First action: Log, monitor, create ticket for investigation.

Severity Escalation Rules

Severity can only escalate upward during active incidents, never downward. A P2 that shows new evidence of financial impact becomes a P1 immediately — it does not become a P2-escalating-to-P1. Re-classify and re-resource.

If you are genuinely unsure between P0 and P1, default to P0. The cost of an unnecessary P0 response is a few hours of engineering time. The cost of a P1 that should have been P0 can be irreversible.
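Both rules are simple enough to encode so that tooling, not a tired human, enforces them. This is a minimal sketch under the assumption that severity is tracked as a value in your incident tooling; the `Severity`, `reclassify`, and `resolve_uncertain` names are illustrative and not part of any Armalo API.

```python
from enum import IntEnum

class Severity(IntEnum):
    """Severity tiers; a lower value is more severe, matching the P0..P3 labels."""
    P0 = 0
    P1 = 1
    P2 = 2
    P3 = 3

def reclassify(current: Severity, proposed: Severity) -> Severity:
    """Escalate-only rule: during an active incident, keep the more severe tier."""
    return min(current, proposed)

def resolve_uncertain(*candidates: Severity) -> Severity:
    """When unsure between tiers (for example P0 vs P1), default to the most severe."""
    return min(candidates)

incident = Severity.P2
incident = reclassify(incident, Severity.P1)  # new evidence of financial impact
incident = reclassify(incident, Severity.P3)  # a calmer reading does not downgrade
print(incident.name)  # P1
```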

Part 3: The Minute-by-Minute Playbook

This is the core of the runbook. Every minute is documented. Every action is specified. Follow this literally for the first 15 minutes — improvise after you have the situation contained.

Minutes 0–1: DETECT

Goal: Confirm real incident, not a false positive. Identify incident type. Assign initial severity.

Who performs this: On-call engineer or automated monitoring system.

Trigger sources (know these before the incident):

Monitoring alert — scope violation threshold exceeded, tool call anomaly rate spiking, trust score velocity alert
User or buyer report — external party identifies wrong behavior before internal monitoring does
Downstream system flag — a system consuming the agent's output raises an anomaly signal
Escrow freeze alert — automated hold triggered by pact violation detection
Rate anomaly — API call volume, output rate, or resource consumption outside expected bounds
Peer agent report — in multi-agent systems, a downstream agent reports unexpected inputs
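The rate anomaly trigger is the easiest of these to automate. The sketch below assumes you can pull timestamps of the agent's outbound actions from your own logs; the function names and the default factor of 10 (borrowed from the "10x declared rate limit" P1 example) are assumptions for illustration.

```python
from datetime import datetime, timedelta

def peak_rate(timestamps, window=timedelta(minutes=1)):
    """Largest number of events that fall inside any sliding window of `window` width."""
    ordered = sorted(timestamps)
    best = start = 0
    for end, ts in enumerate(ordered):
        # Advance the window start until it is less than `window` behind ts.
        while ts - ordered[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best

def is_rate_anomaly(timestamps, declared_per_minute, factor=10):
    """True when the peak per-minute rate reaches `factor` times the declared rate."""
    return peak_rate(timestamps) >= factor * declared_per_minute

# Hypothetical burst: five sends in 40 seconds, then one straggler.
base = datetime(2026, 4, 18, 2, 36)
stamps = [base + timedelta(seconds=s) for s in (0, 10, 20, 30, 40, 200)]
print(peak_rate(stamps))  # 5
```

A sliding window is used rather than fixed one-minute buckets so that a burst straddling a bucket boundary is not split in half and missed.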

Actions at minute 0:

1. Acknowledge the alert. Do not dismiss. Do not assume false positive.
2. Open the agent's room events feed:
   GET /api/v1/room/{agentId}/events?limit=20&order=desc
3. Open the agent's recent heartbeats:
   GET /api/v1/agents/{agentId}/heartbeats?limit=10
4. Check trust score current vs 24h ago:
   GET /api/v1/scores/{agentId}
5. Look at the last 5 task outputs:
   GET /api/v1/agents/{agentId}/tasks?limit=5&order=desc

Classify the incident type (pick the primary):

Scope violation — agent taking actions outside its declared pact boundaries
Behavioral drift — outputs increasingly deviate from baseline without scope change
Adversarial capture — agent behavior has been manipulated by malicious input (prompt injection)
Performance degradation — output quality below threshold without behavioral change
Tool abuse — agent making unexpected tool calls or calling tools with unexpected parameters
Communication anomaly — agent sending abnormal volume, frequency, or content in external communications
Memory corruption — agent exhibiting behavior consistent with poisoned memory state

Assign initial severity. If unsure: escalate, do not downgrade.

Set your 15-minute timer now. Every action from this point has a time requirement.

Minutes 1–3: CLASSIFY AND ESCALATE

Goal: Confirm severity with a second data point. Notify the right people immediately for P0/P1.

Actions:

1. Pull last 20 pact_interactions for the incident window:
   GET /api/v1/agents/{agentId}/interactions?limit=20&order=desc

2. Check escrow status (if agent has escrow-backed work):
   GET /api/v1/escrow?agentId={agentId}&status=active

3. Confirm scope: is this one task or multiple tasks?
   Filter interactions by time window: since={incident_start_time}

4. Check for multi-agent involvement:
   GET /api/v1/swarms?agentId={agentId}  (which swarms is this agent in?)
   If in a swarm: immediately check downstream agents for propagation.

5. Make the severity call. Write it down. Time-stamp it.

For P0 — escalate NOW:

Do not wait for confirmation. Do not wait to "be sure." Page the incident commander, security lead, and product owner simultaneously. The escalation message is five short sentences:

"P0 incident active. Agent [ID / name] detected [behavior type] at [TIME]. Estimated impact: [number of tasks / users / communications involved]. I am halting the agent now. Stand by for status update at T+5 minutes."

For P1 — escalate within 2 minutes:

Page the on-call lead. One sentence: "P1 incident, agent [ID], [behavior type], investigating containment options."

For P2/P3 — no immediate escalation. Log the incident, assign an owner, continue monitoring.

Do not spend more than 2 minutes on escalation. Send the message and move immediately to containment.
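Pre-filling the page text removes one more decision from the two-minute budget. A small sketch, assuming your paging tool accepts a plain string; the function name and parameters are hypothetical, and the wording is copied from the P0 and P1 messages above.

```python
def escalation_message(severity, agent, behavior, detected_at="", impact=""):
    """Build the page text for a P0 or P1 incident using the runbook's wording."""
    if severity == "P0":
        return (
            f"P0 incident active. Agent {agent} detected {behavior} at {detected_at}. "
            f"Estimated impact: {impact}. I am halting the agent now. "
            "Stand by for status update at T+5 minutes."
        )
    if severity == "P1":
        return f"P1 incident, agent {agent}, {behavior}, investigating containment options."
    # P2/P3 are logged and assigned an owner, not paged.
    raise ValueError(f"{severity} incidents are not paged")

print(escalation_message("P1", "mailer-01", "communication anomaly"))
```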

Minutes 3–5: CONTAIN

Goal: Stop the bleeding. The containment decision is the most consequential decision you will make in the first 15 minutes.

You have three containment options. The decision tree is below.

Containment Option A: Hard Halt

Stop the agent immediately. All in-flight tasks are abandoned (not completed). No new tasks accepted.

# Hard halt via API
curl -X POST https://api.armalo.ai/api/v1/agents/{agentId}/halt \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{"reason": "P0 incident - [incident type] detected at [timestamp]"}'

Use for: P0 incidents. Any P1 where in-flight tasks are themselves causing harm.

Cost: In-flight tasks are abandoned in potentially inconsistent state. Buyers with active tasks are affected. Work may need to be restarted from scratch.

Containment Option B: Graceful Drain

Complete all in-flight tasks with enhanced monitoring. Accept no new tasks after current queue is empty.

# Graceful drain via API
curl -X POST https://api.armalo.ai/api/v1/agents/{agentId}/drain \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "P1 incident - draining for investigation",
    "enhancedMonitoring": true,
    "alertOnAnyToolCall": true
  }'

Use for: P1 incidents where the incident is not causing active harm in-flight, and abandoning tasks would cause more disruption than completing them. P2 incidents where rate limiting is not sufficient.

Cost: Incident window extends until the queue drains. Enhanced monitoring adds latency to task completion.

Containment Option C: Rate Limit and Monitor

Do not halt. Reduce task acceptance rate to slow the incident while investigation continues.

# Reduce rate limit via API
curl -X PATCH https://api.armalo.ai/api/v1/agents/{agentId}/limits \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{
    "tasksPerHour": 1,
    "monitoringSampleRate": 1.0
  }'

Use for: P2 incidents. P3 incidents where you want more data before deciding.

Cost: Incident continues at reduced rate. Only appropriate when you have high confidence no P0/P1 harm is occurring.

The Decision Tree

Is active financial harm occurring?
  YES → Hard Halt (P0)
  NO ↓

Is PII being exposed to unauthorized parties?
  YES → Hard Halt (P0)
  NO ↓

Are irreversible external actions happening right now?
  YES → Hard Halt (P0)
  NO ↓

Is the agent clearly violating its pact scope?
  YES + in-flight tasks are affected → Hard Halt (P1)
  YES + in-flight tasks are clean → Graceful Drain (P1)
  NO ↓

Is performance more than 50% below the SLA threshold?
  YES → Graceful Drain (P1)
  NO ↓

Is there confirmed behavioral drift with clear pattern?
  YES → Rate Limit (P2) + monitor for escalation
  NO → Monitor (P2/P3)
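The tree is mechanical enough to encode, which keeps the containment call consistent at 2:47 AM. The sketch below mirrors the questions in order; the `Signals` fields and the returned labels are illustrative names, not an Armalo schema.

```python
from dataclasses import dataclass

@dataclass
class Signals:
    """Answers to the decision-tree questions (field names are hypothetical)."""
    financial_harm: bool = False
    pii_exposed: bool = False
    irreversible_actions: bool = False
    pact_scope_violation: bool = False
    in_flight_affected: bool = False
    sla_shortfall_pct: float = 0.0  # how far below the SLA threshold, in percent
    confirmed_drift: bool = False

def choose_containment(s: Signals):
    """Walk the decision tree top to bottom and return (action, severity)."""
    if s.financial_harm or s.pii_exposed or s.irreversible_actions:
        return ("hard_halt", "P0")
    if s.pact_scope_violation:
        if s.in_flight_affected:
            return ("hard_halt", "P1")
        return ("graceful_drain", "P1")
    if s.sla_shortfall_pct > 50:
        return ("graceful_drain", "P1")
    if s.confirmed_drift:
        return ("rate_limit", "P2")
    return ("monitor", "P2/P3")

# The opening scenario: emails are going out right now and cannot be recalled.
print(choose_containment(Signals(irreversible_actions=True)))  # ('hard_halt', 'P0')
```

Order matters: the three P0 questions are checked first so that a scope violation with financial harm is never routed to a drain.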

Immediately after containment action:

# Freeze escrow if applicable
curl -X POST https://api.armalo.ai/api/v1/escrow/{escrowId}/hold \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Incident investigation in progress", "incidentId": "{incidentId}"}'

If the agent is listed on the marketplace and could receive new deals during the incident window:

# Set marketplace status to under review
curl -X PATCH https://api.armalo.ai/api/v1/marketplace/listings/{listingId} \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{"status": "under_review", "reason": "Ongoing incident investigation"}'

Minutes 5–10: PRESERVE EVIDENCE AND ANALYZE

Goal: Capture everything needed for root cause analysis before it decays. Begin identifying the failure pattern.

Evidence preservation is the most time-critical step after containment. LLM session data, in-memory state, and ephemeral logs begin degrading the moment the agent stops executing. Run these captures in parallel where possible.

The Evidence Preservation Checklist

Preserve in this order (earlier items decay fastest):

Item 1: Room Events Dump (last 100 events)

curl -s "https://api.armalo.ai/api/v1/room/{agentId}/events?limit=100&order=desc" \
  -H "X-Pact-Key: {your-api-key}" \
  > incident-{incidentId}-room-events.json
echo "Captured: $(cat incident-{incidentId}-room-events.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(len(d.get('events', [])))") events"

Item 2: Agent Memory Snapshot

curl -s "https://api.armalo.ai/api/v1/memory/{agentId}?limit=50&order=desc" \
  -H "X-Pact-Key: {your-api-key}" \
  > incident-{incidentId}-memory-snapshot.json

Item 3: Pact Interactions — Incident Window

curl -s "https://api.armalo.ai/api/v1/agents/{agentId}/interactions?since={incident_start_minus_30min}&limit=100" \
  -H "X-Pact-Key: {your-api-key}" \
  > incident-{incidentId}-interactions.json

Item 4: Tool Call Log — Incident Window

curl -s "https://api.armalo.ai/api/v1/agents/{agentId}/tool-calls?since={incident_start_minus_30min}&limit=200" \
  -H "X-Pact-Key: {your-api-key}" \
  > incident-{incidentId}-tool-calls.json

Item 5: Trust Score History — Last 30 Days

curl -s "https://api.armalo.ai/api/v1/scores/{agentId}/history?days=30" \
  -H "X-Pact-Key: {your-api-key}" \
  > incident-{incidentId}-score-history.json

Item 6: Task Context for Failing Tasks

For each task that produced anomalous output during the incident window, export the full task context:

# For each task ID identified as anomalous:
curl -s "https://api.armalo.ai/api/v1/tasks/{taskId}/context" \
  -H "X-Pact-Key: {your-api-key}" \
  > incident-{incidentId}-task-{taskId}-context.json
# This includes: system prompt, all user messages, tool call inputs/outputs, LLM session ID

Item 7: LLM Session IDs

Extract and record all LLM session IDs from the incident window. You will need these to request provider-level logs.

cat incident-{incidentId}-interactions.json | \
  python3 -c "import json,sys; d=json.load(sys.stdin); [print(i.get('llmSessionId','N/A')) for i in d.get('interactions', [])]"

Item 8: Escrow Status Snapshot

curl -s "https://api.armalo.ai/api/v1/escrow?agentId={agentId}" \
  -H "X-Pact-Key: {your-api-key}" \
  > incident-{incidentId}-escrow-status.json

Item 9: Memory Attestation Chain

curl -s "https://api.armalo.ai/api/v1/memory/{agentId}/attestations?limit=20" \
  -H "X-Pact-Key: {your-api-key}" \
  > incident-{incidentId}-attestations.json

Item 10: Upstream Agent Inputs (if multi-agent)

For every agent in the same swarm or pipeline:

curl -s "https://api.armalo.ai/api/v1/swarms/{swarmId}/events?since={incident_start_minus_60min}&limit=200" \
  -H "X-Pact-Key: {your-api-key}" \
  > incident-{incidentId}-swarm-events.json
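Running the captures by hand, one curl at a time, is slow under pressure. The sketch below shows one way to script the first five items in parallel. It reuses the URLs and filenames from the checklist; the ISO 8601 UTC format for `since`, the helper names, and the injected `fetch` callable are assumptions, so verify the timestamp format against the API before relying on it.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta, timezone

BASE = "https://api.armalo.ai/api/v1"  # host used by the curl examples above

def capture_plan(agent_id, incident_id, incident_start):
    """Return (url, filename) pairs for checklist items 1-5.

    incident_start must be timezone-aware; `since` is set 30 minutes earlier.
    """
    start_utc = incident_start.astimezone(timezone.utc)
    since = (start_utc - timedelta(minutes=30)).strftime("%Y-%m-%dT%H:%M:%SZ")
    prefix = f"incident-{incident_id}"
    return [
        (f"{BASE}/room/{agent_id}/events?limit=100&order=desc", f"{prefix}-room-events.json"),
        (f"{BASE}/memory/{agent_id}?limit=50&order=desc", f"{prefix}-memory-snapshot.json"),
        (f"{BASE}/agents/{agent_id}/interactions?since={since}&limit=100", f"{prefix}-interactions.json"),
        (f"{BASE}/agents/{agent_id}/tool-calls?since={since}&limit=200", f"{prefix}-tool-calls.json"),
        (f"{BASE}/scores/{agent_id}/history?days=30", f"{prefix}-score-history.json"),
    ]

def http_fetch(api_key):
    """Build a fetch(url, filename) callable that saves the response body to disk."""
    def fetch(url, filename):
        req = urllib.request.Request(url, headers={"X-Pact-Key": api_key})
        with urllib.request.urlopen(req, timeout=10) as resp, open(filename, "wb") as out:
            out.write(resp.read())
    return fetch

def capture_all(plan, fetch, max_workers=5):
    """Run fetch for every plan entry concurrently; map filename -> None or the error."""
    def run(entry):
        url, filename = entry
        try:
            fetch(url, filename)
            return filename, None
        except Exception as exc:  # one failed capture must not block the others
            return filename, exc
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, plan))

# Dry run with a stub fetcher (no network): list what would be captured.
plan = capture_plan("agent-1", "INC-7", datetime(2026, 4, 18, 2, 47, tzinfo=timezone.utc))
results = capture_all(plan, fetch=lambda url, filename: None)
print(len(results), "captures planned")
```

In a real incident you would pass `http_fetch(api_key)` instead of the stub, then re-run any entry whose value is an exception rather than assuming the evidence was saved.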

Failure Pattern Identification

With evidence in hand, look for one of these five failure patterns:

Pattern 1: Scope Creep

Signal: Tool calls to tools not declared in agent's pact
Signal: Actions affecting resources outside declared data access scope
Signal: Room events show the agent accessing endpoints it has no declared permission to access
Cause: System prompt ambiguity, tool permission misconfiguration, or deliberately expanded scope

Pattern 2: Behavioral Drift

Signal: Output quality metrics trending downward over 3+ eval runs
Signal: Trust score declining gradually without explicit scope change
Signal: Outputs match templates but with increasing factual error rate
Cause: Context window filling with low-quality retrieved content, prompt degradation, model update on provider side

Pattern 3: Adversarial Capture (Prompt Injection)

Signal: Agent behavior changed sharply after processing specific user input
Signal: Room events show unexpected tool calls immediately after processing external content
Signal: Agent began following instructions embedded in data it was supposed to analyze
Cause: Malicious content in processed documents, web pages, emails, or database records containing instruction text

Pattern 4: Authority Confusion

Signal: Agent treated user-level instructions as system-level permissions
Signal: Agent escalated its own permissions based on user request
Signal: Agent bypassed declared constraint based on "user confirmation"
Cause: System prompt lacks explicit authority hierarchy, agent not trained to distinguish instruction sources

Pattern 5: Reinforcement Confusion

Signal: Agent behavior improved by a metric that was being measured, but degraded by unmeasured metrics
Signal: Output length increased dramatically (optimizing for length as a quality proxy)
Signal: Outputs became more agreeable (optimizing for user approval as a quality proxy)
Cause: Agent or underlying model received implicit positive signals for behaviors that don't represent actual task quality

Document which pattern you have identified. "Unknown — investigation continuing" is a valid answer at minute 10. The goal is direction, not certainty.
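The first scope-creep signal, tool calls to tools not declared in the pact, can be checked directly against the captured tool call log. A minimal sketch, assuming each log entry is a dict with a `tool` key; that field name and the sample tool names are hypothetical and should be adjusted to the actual export format.

```python
def undeclared_tool_calls(tool_calls, declared_tools):
    """Return the calls whose tool is not in the agent's declared tool list."""
    declared = set(declared_tools)
    return [call for call in tool_calls if call.get("tool") not in declared]

# Hypothetical excerpt from a tool call log.
calls = [
    {"tool": "crm.read", "at": "02:36"},
    {"tool": "email.send", "at": "02:37"},
    {"tool": "crm.read", "at": "02:38"},
]
flagged = undeclared_tool_calls(calls, declared_tools=["crm.read"])
print([c["tool"] for c in flagged])  # ['email.send']
```

An empty result does not rule out scope creep, since the other two signals concern resources and endpoints rather than tool names, but a non-empty one gives you a timestamp to anchor the investigation.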

Minutes 10–13: COMMUNICATE

Goal: Inform every affected party with accurate, timely information. Bad communication during an incident causes as much damage as the incident itself.

Communication Tier 1: Internal Stakeholders

For P0/P1, send this message immediately to the incident channel:

SUBJECT: [P0/P1] Incident Active — Agent [NAME/ID]

Detected: [TIME UTC]
Classified: [TIME UTC]
Containment: [Halted / Draining / Rate-limited] at [TIME UTC]

Incident type: [SCOPE VIOLATION / BEHAVIORAL DRIFT / ADVERSARIAL CAPTURE / PERFORMANCE / OTHER]

Known impact:
- Affected tasks: [NUMBER, or "investigating"]
- Affected buyers: [NUMBER, or "investigating"]
- Irreversible actions taken: [YES/NO — specify if yes]
- Escrow at risk: [YES/NO]

Current status: Investigating root cause. Next update at T+[TIME].

Incident commander: [NAME]
On-call engineer: [NAME]

Communication Tier 2: Affected Buyers

If the agent was performing escrow-backed work, send a buyer notification within 13 minutes of detection. Armalo triggers this automatically if you have webhook notifications configured. If not:

SUBJECT: Update on your active task with [Agent Name]

We identified an issue affecting [Agent Name] at [TIME UTC] and have paused the agent's operation 
while we investigate.

Your task "[TASK NAME]" has been [preserved / paused at current state / set to pending review]. 
Your escrow funds are protected and on hold pending investigation.

We expect to provide a full update within [2 hours / 24 hours depending on severity]. 
No action is required from you at this time.

Reference: Incident [ID]. Questions: [support contact]

What to omit from buyer communication:

Root cause theories you have not confirmed
Internal severity classifications
Names of systems or services involved
Anything that implies fault, legal liability, or financial compensation before legal review
"We believe" statements — only communicate what you know

Communication Tier 3: Marketplace Listings

If the agent is marketplace-listed, update its status to prevent new deal acceptance during the incident window. This is the agent equivalent of a SaaS service status page — buyers checking the marketplace should see accurate status, not a green badge on an agent that is actively off-script.

Set the listing to "Under Review" with a visible message: "This agent is temporarily paused for a scheduled maintenance review. Existing deals are unaffected."

Note: "Scheduled maintenance review" is truthful and appropriate framing for a P2/P3 that does not involve buyer harm. For P0/P1 where buyers have been actively harmed, be direct: "This agent is temporarily paused while we investigate a service issue affecting some tasks."

Communication Tier 4: Security Team (P0 Only)

For P0 incidents involving possible data exfiltration, adversarial capture, or unauthorized external access:

Notify security lead immediately with the tool call log and room events dump
Treat as a security incident until proven otherwise
Do not communicate publicly until security team has assessed for data breach notification requirements
If PII was involved and breach is confirmed or likely: initiate data breach notification protocol (under GDPR, notify the supervisory authority within 72 hours of becoming aware of the breach; timelines vary by jurisdiction)

Minutes 13–15: REMEDIATE OR ESCALATE TO 1-HOUR PROTOCOL

Goal: If root cause is clear, apply immediate fix. If not, formalize the investigation and document everything known at minute 15.

If Root Cause Is Identified

For some incidents, the cause is obvious by minute 13: the system prompt was missing a scope constraint. A tool configuration change was deployed 20 minutes before the incident. Memory was poisoned by a specific malicious input in task context.

Immediate remediations (apply at minute 13 if cause is confirmed):

# Update agent constraints via API
curl -X PATCH https://api.armalo.ai/api/v1/agents/{agentId}/config \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{
    "systemPromptAddendum": "CRITICAL CONSTRAINT: [specific constraint to add]",
    "toolPermissions": {"restrictTo": ["tool1", "tool2"]}
  }'

# Purge poisoned memory entries
curl -X DELETE https://api.armalo.ai/api/v1/memory/{agentId}/entries \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{"since": "[poisoning_start_time]", "reason": "Incident remediation"}'

# Resume agent with fix applied
curl -X POST https://api.armalo.ai/api/v1/agents/{agentId}/resume \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{"incidentId": "{incidentId}", "fixApplied": true}'

Do not resume without re-running at least 3 evaluation checks against the fixed configuration. One minute of evaluation now prevents a repeated incident.
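If the resume call is scripted, the evaluation requirement can be made a precondition rather than a reminder. This sketch assumes each evaluation check reduces to a pass/fail boolean and reads "at least 3 evaluation checks" as "at least 3 checks, all passing"; both the function name and that interpretation are assumptions.

```python
def may_resume(eval_results, minimum_checks=3):
    """Allow resume only if enough checks ran against the fixed config and all passed."""
    return len(eval_results) >= minimum_checks and all(eval_results)

print(may_resume([True, True]))         # False: too few checks
print(may_resume([True, True, False]))  # False: a check failed
print(may_resume([True, True, True]))   # True
```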

If Root Cause Is Not Yet Clear

Escalate to the 1-hour investigation protocol:

Assign incident commander — one person owns the incident from this point. Not a committee.
Set a 1-hour hard deadline for root cause identification
Define the investigation scope: which of the five failure patterns is most likely? What evidence is needed to confirm?
Write the minute-15 status document (template below)
Schedule T+60 min checkpoint — incident commander briefs all stakeholders on root cause and remediation plan

The Minute-15 Status Document

This document is your record at minute 15. Write it now, even if you feel you don't have time. A half-written status document written at minute 15 is worth more than a perfect document written at minute 60.

INCIDENT STATUS: T+15
Timestamp: [TIME UTC]
Incident ID: [ID]
Agent: [ID/NAME]

SEVERITY: [P0/P1/P2/P3]
STATUS: [ACTIVE / CONTAINED / MONITORING]

WHAT HAPPENED:
[2-3 sentences describing what was observed. Stick to facts.]

ACTIONS TAKEN:
- T+00:00: Incident detected via [alert type]
- T+00:[X]: Classified as [severity]
- T+00:[X]: Escalated to [who]
- T+00:[X]: Agent [halted/drained/rate-limited]
- T+00:[X]: Evidence preserved ([list what was captured])
- T+00:[X]: Buyers notified: [yes/no]
- T+00:[X]: Escrow held: [yes/no]

KNOWN IMPACT:
- Number of affected tasks: [N or "investigating"]
- Confirmed irreversible actions: [describe or "none confirmed"]
- Data exposure: [none / investigating / confirmed: describe]

ROOT CAUSE:
[Confirmed / Suspected / Unknown — describe what is known]

FAILURE PATTERN:
[Scope creep / Behavioral drift / Adversarial capture / Authority confusion / Reinforcement confusion / Unknown]

NEXT STEPS:
- T+30: [what happens next]
- T+60: Root cause deadline
- T+120: Buyer update

OPEN QUESTIONS:
- [what you still don't know]
- [what you need to find out]

INCIDENT COMMANDER: [NAME]
ON-CALL: [NAME]
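The ACTIONS TAKEN timeline is easier to fill in if offsets are computed from the timestamps you recorded rather than worked out by hand. A small sketch, assuming the template's `T+00:[X]` notation means hours and minutes since detection; the helper name is hypothetical.

```python
from datetime import datetime, timezone

def t_plus(detected_at, event_at):
    """Format the offset between detection and an event as T+HH:MM (whole minutes)."""
    total_seconds = int((event_at - detected_at).total_seconds())
    if total_seconds < 0:
        raise ValueError("event precedes detection")
    hours, minutes = divmod(total_seconds // 60, 60)
    return f"T+{hours:02d}:{minutes:02d}"

detected = datetime(2026, 4, 18, 2, 47, tzinfo=timezone.utc)
halted = datetime(2026, 4, 18, 2, 51, 30, tzinfo=timezone.utc)
print(f"- {t_plus(detected, halted)}: Agent halted")  # - T+00:04: Agent halted
```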

Part 4: Post-Incident Protocol — The 24-72 Hour Milestones

The first 15 minutes contain the incident. The next 72 hours determine whether it recurs.

T+24 Hours: Root Cause Analysis Complete

By 24 hours, you must have a confirmed root cause — not a theory, a confirmed cause supported by evidence.

Checklist:

Root cause confirmed with specific evidence (not "probably" or "likely")
Blast radius fully assessed (all affected tasks enumerated, not sampled)
All evidence preserved and stored in incident record
Immediate fix deployed and verified against eval checks
Agent either restored to service or timeline confirmed for restoration
Post-mortem scheduled with all relevant parties
Buyer update sent with accurate, honest status

The 24-hour buyer update:

Do not send this resolution update until root cause is confirmed. If it is still unconfirmed at 24 hours, send a "we're still investigating" message instead; that is acceptable. Sending a wrong root cause and needing to correct it is not.

SUBJECT: Incident Resolution Update — [Agent Name] — [Incident ID]

At [ORIGINAL DETECTION TIME], we identified an issue with [Agent Name] affecting [N tasks / 
certain task types / your task "[TASK NAME]"].

What happened:
[1-2 sentences: specific description of what occurred, what the agent did wrong]

What we did:
[Specific: halted agent at TIME, preserved evidence, identified root cause at TIME, 
applied fix at TIME]

Impact to your tasks:
[Be specific: "Your task was unaffected" / "Task [X] produced an incomplete output — 
we have [specific compensating action]" / "We are restarting task [X] at no additional cost"]

What we changed:
[Specific fix applied: constraint added, tool permission restricted, memory cleared, etc.]

Prevention:
[What monitoring or architectural change prevents this class of incident]

Documentation:
Full incident report available at [URL] if you require it for your records.

Your escrow: [Released / Being held pending your review / Released upon your confirmation]

T+48 Hours: Fix Verified, Score Restored or New Baseline Set

Checklist:

Fixed configuration has passed full eval suite (not just the checks that caught the incident)
Agent is back in service (or timeline confirmed)
Trust score impact assessed: if score declined due to incident, run re-evaluation to establish updated baseline
Memory attestation chain updated to reflect incident and remediation
Marketplace listing status updated
Escrow decisions made and communicated to buyers

The trust score question: Incidents that involve verified behavioral failures will and should impact trust scores. This is not a bug — it is the system working correctly. An agent that went off-script has earned a lower trust score until it demonstrates remediated behavior over time.

For buyers considering new deals after an incident, the post-remediation trust score with a documented incident record is more valuable than a pre-incident score without one. An agent that failed, was caught, was fixed, and has a documented post-fix evaluation record is more trustworthy — not less — than an agent with a perfect score and no incident history.

T+72 Hours: Full Buyer Disclosure and Blameless Post-Mortem

For all P0/P1 incidents: Provide affected buyers with a complete incident report by 72 hours. This is not optional. Buyers who had active tasks or open escrow during the incident have a right to a complete account.

The 72-hour report includes:

Precise incident timeline (minute-by-minute if relevant)
Confirmed root cause with supporting evidence
Full enumeration of affected tasks and specific impact per task
Compensating actions taken for irreversible changes
Specific preventive measures implemented
Updated agent configuration and trust score with evidence

Blameless post-mortem: Hold it within 72 hours of containment. Complete the written report within 1 week.

Part 5: The Rollback Problem and Compensating Actions

In software, "rollback" means reverting a deployment. The bad version disappears. The system returns to its prior state. The operation is atomic.

There is no equivalent operation for AI agent actions.

What Cannot Be Undone

Emails sent
Messages transmitted via any communication channel
API calls made to external systems
Records modified in third-party databases
Files written to external storage
Payments initiated (even if not yet settled)
Webhooks triggered
Any time-sensitive notifications that the recipient has already read

The Compensating Action Protocol

For every irreversible action taken during an incident, execute a corresponding compensating action:

Communication overvolume (e.g., 340 duplicate emails):

Send a single clear correction message explaining the duplicate
Explicitly acknowledge the error: "You received [N] copies of a previous message due to a system issue. Please disregard all copies except this one."
Document the correction in the incident record
Log the compensating action in the agent's memory attestation chain

Incorrect data modifications in external systems:

Document exactly what was changed (before/after state)
Execute reverse operations where the external system permits
Where reversal is not possible, document the delta and notify affected parties
Coordinate with affected parties on any additional remediation needed

Unauthorized API calls to third-party services:

Contact the third-party service provider and report the unauthorized calls
Provide them with the exact call log (timestamps, parameters)
Request that any state changes resulting from those calls be reviewed
For financial services: initiate dispute/reversal procedure immediately

Scope: the compensating action record

Every compensating action becomes part of the agent's permanent record via the memory attestation chain. This is evidence of good-faith remediation — it protects both the agent operator and the affected buyers in any subsequent dispute.

# Log compensating action to attestation chain
curl -X POST https://api.armalo.ai/api/v1/memory/{agentId}/attestations \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "compensating_action",
    "incidentId": "{incidentId}",
    "description": "[What the original action was and what compensating action was taken]",
    "originalAction": {
      "timestamp": "{original_action_time}",
      "type": "[email_sent/api_call/data_modification]",
      "target": "[who/what was affected]"
    },
    "compensatingAction": {
      "timestamp": "{compensating_action_time}",
      "type": "[correction_sent/reversal_executed/documented]",
      "outcome": "[what was achieved]"
    },
    "irreversibleResidue": "[what could not be undone and why]"
  }'

The 90-Day Rule

For agents newly deployed in production, apply the following rule for their first 90 days:

All irreversible external actions require async human approval before execution.

This is not a permanent constraint — it is a confidence-building period. The agent executes its full decision process, generates the proposed action, and queues it for approval. A human approves or rejects within a defined SLA (15 minutes for time-sensitive, 4 hours for routine). After 90 days of clean operation, the approval requirement can be lifted for action types with clean track records.

The 90-day rule does not eliminate agent autonomy. It creates an auditable record of the agent's intended actions against actual outcomes — the most valuable dataset you can have for improving agent reliability.
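The rule and its SLAs reduce to two small functions. This is a sketch under stated assumptions: `needs_approval` and `approval_deadline` are hypothetical names, the urgency labels are invented for illustration, and the sketch ignores the per-action-type track record that decides whether the requirement is actually lifted after day 90.

```python
from datetime import datetime, timedelta

PROBATION = timedelta(days=90)

# SLA windows from the rule above; the dictionary keys are assumed labels.
APPROVAL_SLA = {
    "time_sensitive": timedelta(minutes=15),
    "routine": timedelta(hours=4),
}


def needs_approval(deployed_at: datetime, now: datetime, irreversible: bool) -> bool:
    """Irreversible external actions need human approval during the first 90 days."""
    return irreversible and (now - deployed_at) < PROBATION


def approval_deadline(queued_at: datetime, urgency: str) -> datetime:
    """Time by which a human must approve or reject the queued action."""
    return queued_at + APPROVAL_SLA[urgency]
```

Logging each queued action next to its eventual outcome is what produces the intended-versus-actual record described above.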

Part 6: The Blameless Post-Mortem Framework for AI Agents

Google SRE popularized the blameless post-mortem as a discipline. The core insight: when people fear punishment for failures, they hide failures. Hidden failures compound. Blameless post-mortems surface the systemic causes that actually matter.

For AI agent systems, blameless post-mortems require one additional dimension: the agent itself is a subject of the post-mortem, not just the humans who deployed it. This changes the analysis.

The Post-Mortem Structure

Section 1: Timeline

Reconstruct the complete incident timeline in minute-by-minute or event-by-event resolution. Every action taken by the agent, every alert fired, every human action taken in response. No interpretation — just sequence of events.

Section 2: Root Cause Analysis

Use the "5 Whys" structure, extended for AI systems:

Why did the incident occur? (immediate cause)
Why was the immediate cause not prevented? (missing constraint or control)
Why was the missing constraint not in place? (design gap or deployment error)
Why was the design gap not caught during evaluation? (eval coverage gap)
Why did the monitoring not catch this sooner? (detection gap)

For AI-specific root causes, add:

Was this a training/alignment failure? (model behavior not matching declared behavior)
Was this a prompt engineering failure? (system prompt ambiguity or incompleteness)
Was this a tool configuration failure? (incorrect permissions or parameter handling)
Was this an evaluation coverage failure? (eval suite did not cover this scenario)
Was this a monitoring failure? (behavior was detectable earlier but not detected)

Section 3: Impact Assessment

Total tasks affected (number, types, severity of impact per task)
Buyers affected (number, relationship status, estimated damage)
Irreversible actions taken (enumerate specifically)
Financial impact (direct: escrow disputes, refunds; indirect: buyer churn risk, reputation cost)
Data exposure risk (none / possible / confirmed — document fully)

Section 4: What Went Well

This section is not optional and is not a feel-good exercise. Identify the specific controls, processes, or monitoring that worked. If containment happened within 3 minutes, identify why — that's a working control that should be strengthened, not just assumed.

Examples of things that go well during incidents:

Monitoring caught the anomaly before buyer report
Evidence was preserved completely
Containment decision was made correctly
Buyer communication was timely and accurate
Post-mortem is being conducted within 72 hours

Section 5: Corrective Actions

Every corrective action must have:

A specific owner (not "the team" — one named person)
A specific deadline
A specific success criterion (how will you know it's done?)
A category: Prevention, Detection, Response, Recovery
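These four requirements are easy to enforce mechanically before a corrective action is accepted into tracking. A minimal sketch follows; the `CorrectiveAction` record and the list of vague owner strings are assumptions for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date

CATEGORIES = {"Prevention", "Detection", "Response", "Recovery"}
# Assumed examples of non-owners; extend to match your team's habits.
VAGUE_OWNERS = {"", "the team", "team", "everyone", "tbd"}


@dataclass(frozen=True)
class CorrectiveAction:
    description: str
    owner: str
    deadline: date
    success_criterion: str
    category: str


def validation_errors(action: CorrectiveAction) -> list[str]:
    """Return the reasons this action is not yet trackable (empty list if valid)."""
    errors = []
    if action.owner.strip().lower() in VAGUE_OWNERS:
        errors.append("owner must be one named person")
    if not action.success_criterion.strip():
        errors.append("success criterion is required")
    if action.category not in CATEGORIES:
        errors.append(f"category must be one of {sorted(CATEGORIES)}")
    return errors
```

The deadline is enforced by the type: a `CorrectiveAction` cannot be constructed without one.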

Prevention actions prevent the same incident class from occurring:

Add constraint to system prompt: "Never call tool X without explicit user confirmation"
Add tool permission restriction: remove access to tool Y from this agent's configuration
Add adversarial input handling: sanitize all retrieved content before injecting into context

Detection actions catch the incident earlier next time:

Add monitoring alert: fire if agent makes >10 calls to same endpoint in 1 minute
Add eval check: test for scope boundary violation in all deployment checks
Add trust score velocity alert: fire if score drops >5 points in 24 hours

Response actions improve incident response speed:

Add to runbook: specific procedure for this incident type
Add tooling: automated evidence capture that fires when alert fires
Update communication templates: pre-approved buyer message for this incident class

Recovery actions reduce damage when the incident occurs:

Add compensating action capability: automated correction message for communication overvolume
Add rollback procedure: specific steps to restore last-known-good agent state
Add escrow auto-hold: trigger escrow hold immediately when any P0/P1 alert fires
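As one concrete instance of a detection action, the "more than 10 calls to the same endpoint in 1 minute" alert listed above can be sketched as a sliding-window counter. The class name and interface are hypothetical; a production version would sit in your monitoring stack rather than in application code.

```python
from collections import defaultdict, deque


class EndpointRateAlert:
    """Fires when one endpoint receives more than `limit` calls within `window_s` seconds."""

    def __init__(self, limit: int = 10, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._calls = defaultdict(deque)  # endpoint -> timestamps inside the window

    def record(self, endpoint: str, ts: float) -> bool:
        """Record a call at time `ts` (seconds); return True if the alert should fire."""
        calls = self._calls[endpoint]
        calls.append(ts)
        # Drop calls that have fallen out of the window ending at `ts`.
        while calls and calls[0] <= ts - self.window_s:
            calls.popleft()
        return len(calls) > self.limit
```

Timestamps are assumed to arrive in non-decreasing order per endpoint, which holds when calls are recorded as they happen.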

Section 6: Feedback to Armalo Memory

The final section of every post-mortem should be a structured entry written to the agent's memory via the attestation chain. This is the incident's permanent contribution to the agent's institutional knowledge.

curl -X POST https://api.armalo.ai/api/v1/memory/{agentId}/attestations \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "incident_post_mortem",
    "incidentId": "{incidentId}",
    "summary": "[Root cause in one sentence]",
    "failurePattern": "[scope_creep / behavioral_drift / adversarial_capture / authority_confusion / reinforcement_confusion]",
    "lessonsLearned": [
      "[Lesson 1]",
      "[Lesson 2]",
      "[Lesson 3]"
    ],
    "preventionMeasures": [
      "[Measure 1 — deployed]",
      "[Measure 2 — deployed]"
    ],
    "evalGapsClosed": ["[new check 1]", "[new check 2]"],
    "incidentClass": "[descriptive class name for future reference]"
  }'

This entry becomes queryable evidence that the agent learned from the incident. Future evaluators, buyers performing due diligence, and your own team conducting future post-mortems can retrieve this entry and see exactly what was learned and what was changed.

Part 7: Preventing Recurrence — The Feedback Loop

An incident that cannot recur is not a cost — it is an investment. An incident that recurs is not bad luck — it is a failure of the post-mortem process.

For every incident, close the loop on exactly four things:

1. Close the Eval Gap

If the incident type was not covered by your eval suite before the incident, it must be covered after.

The eval check that catches scope creep before it reaches production is worth a hundred incident responses. Add a specific eval scenario for the failure mode this incident revealed. Run it before every future deployment of this agent or any agent in the same class.

# Add eval check for new failure mode
curl -X POST https://api.armalo.ai/api/v1/evals \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "{agentId}",
    "checkType": "adversarial",
    "scenario": "[Description of the incident scenario that should be caught]",
    "expectedBehavior": "[What the correct behavior is]",
    "failCondition": "[What constitutes a failure on this check]",
    "priority": "high",
    "runOnEveryDeploy": true
  }'

2. Close the Monitoring Gap

If the monitoring did not catch the incident at its earliest possible point, add the alert that would have.

For every incident, identify T-minus: how early could the anomaly have been detectable if the right alert existed? If the answer is "30 minutes earlier," add the alert that would have caught it 30 minutes earlier.

# Add monitoring alert
curl -X POST https://api.armalo.ai/api/v1/agents/{agentId}/alerts \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "[scope_violation_rate / tool_call_anomaly / output_quality_delta / communication_rate]",
    "threshold": "[specific threshold value]",
    "window": "[time window in minutes]",
    "severity": "[P0/P1/P2/P3]",
    "rationale": "Added after [Incident ID] to catch [incident class] earlier"
  }'

3. Close the Constraint Gap

If the agent was able to take the action that caused the incident, a constraint was missing or too weak. Add it.

Constraints should be explicit, not implicit. "Use good judgment about scope" is not a constraint. "Never call external APIs not listed in your approved tool set: [list]" is a constraint.

For high-consequence agents, add constraints in layers:

System prompt: explicit prohibition with explanation
Tool configuration: permission set that enforces the prohibition technically
Eval check: adversarial test that verifies the constraint holds under manipulation
Monitoring: alert that fires if the constraint boundary is approached
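The tool configuration layer is the one that holds even when the prompt layer fails, because it refuses the call in code. A minimal sketch of that enforcement point, assuming a hypothetical dispatcher that checks every requested tool against an approved set before invoking it:

```python
class ToolNotPermitted(Exception):
    """Raised when an agent requests a tool outside its approved set."""


def is_permitted(requested_tool: str, approved_tools: frozenset[str]) -> bool:
    """True only for tools explicitly listed in the approved set."""
    return requested_tool in approved_tools


def enforce_tool_scope(requested_tool: str, approved_tools: frozenset[str]) -> str:
    """Return the tool name if approved; otherwise refuse before any call is made."""
    if not is_permitted(requested_tool, approved_tools):
        raise ToolNotPermitted(f"tool {requested_tool!r} is not in the approved tool set")
    return requested_tool


# Hypothetical tool names, for illustration only.
APPROVED = frozenset({"crm.read", "calendar.read"})
```

An allowlist fails closed: a tool added to the platform later is denied until someone approves it for this agent.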

4. Write to the Institutional Knowledge Base

The post-mortem entry in the attestation chain is for this agent. The knowledge base entry is for every future agent your organization deploys.

Every incident teaches something about the class of agents, not just the specific instance. Document it:

INCIDENT CLASS: [Name]
Pattern: [scope_creep / behavioral_drift / adversarial_capture / authority_confusion / reinforcement_confusion]

Triggering conditions:
[What conditions make this incident type likely?]

Early signals:
[What are the earliest detectable signals before full manifestation?]

Correct response:
[What is the right containment decision for this incident class?]

Preventive controls:
[What architectural or configuration choices prevent this class?]

Example incidents:
[Reference IDs for actual incidents of this class]

Store this in your organization's runbook. Require every new agent deployment to have been reviewed against all documented incident classes.
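The review requirement can be checked as a set difference at deploy time. This is a sketch with hypothetical function names; where the documented and reviewed class names come from (a runbook file, a ticket system) is left open.

```python
def unreviewed_classes(documented: set[str], reviewed: set[str]) -> list[str]:
    """Incident classes in the knowledge base that this deployment was not reviewed against."""
    return sorted(documented - reviewed)


def ready_to_deploy(documented: set[str], reviewed: set[str]) -> bool:
    """A deployment is ready only when no documented class is left unreviewed."""
    return not unreviewed_classes(documented, reviewed)
```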

Part 8: Multi-Agent Incident Response

Single-agent incidents are manageable. Multi-agent incidents — where a failure propagates through a pipeline of agents before detection — are categorically harder.

The core challenge: in a five-agent pipeline, you will almost always detect the incident at agent 4 or 5, when the root cause is at agent 1 or 2. This creates a false containment problem: you halt the agent causing visible harm, but the agent generating bad inputs is still running.

The Multi-Agent Incident Detection Protocol

When an incident is detected in any agent that participates in a swarm or pipeline:

Step 1: Map the pipeline immediately.

curl -s "https://api.armalo.ai/api/v1/swarms/{swarmId}/topology" \
  -H "X-Pact-Key: {your-api-key}"

This returns: all agents in the swarm, their declared input/output relationships, and the data flow direction. Print this before you do anything else.

Step 2: Identify all agents that received inputs from the failing agent.

For the failing agent, pull its outbound event log:

curl -s "https://api.armalo.ai/api/v1/room/{failingAgentId}/events?type=output&since={incident_start}&limit=100" \
  -H "X-Pact-Key: {your-api-key}"

Every agent that received these outputs is a potential downstream propagation path.

Step 3: Identify all agents that provided inputs to the failing agent.

curl -s "https://api.armalo.ai/api/v1/room/{failingAgentId}/events?type=input&since={incident_start_minus_60min}&limit=100" \
  -H "X-Pact-Key: {your-api-key}"

Any of these upstream agents may be the actual root cause.

Step 4: Assess propagation state for each downstream agent.

For each downstream agent that received outputs from the failing agent during the incident window:

Has the bad output already been used? (check task completion status)
Is the bad output currently being processed? (check active task state)
Is the bad output queued but not yet processed? (can potentially intercept)

For queued but unprocessed outputs, you can often intercept before the propagation completes:

# Cancel queued tasks that used contaminated inputs
curl -X POST https://api.armalo.ai/api/v1/agents/{downstreamAgentId}/tasks/cancel-batch \
  -H "X-Pact-Key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{"inputSourceAgentId": "{failingAgentId}", "since": "{incident_start}"}'
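Before cancelling anything, it helps to sort the contaminated downstream tasks into the three propagation states from Step 4. The sketch below assumes each task is a dictionary with `id` and `status` keys and that statuses are named `completed`, `active`, and `queued`; those field names and values are assumptions, not a documented response shape.

```python
from collections import defaultdict

# Assumed task statuses mapped to the three propagation states in Step 4.
PROPAGATION_STATE = {
    "completed": "already_used",
    "active": "in_progress",
    "queued": "interceptable",
}


def triage(tasks: list[dict]) -> dict[str, list[str]]:
    """Group contaminated downstream task IDs by propagation state."""
    groups = defaultdict(list)
    for task in tasks:
        state = PROPAGATION_STATE.get(task["status"], "unknown")
        groups[state].append(task["id"])
    return dict(groups)
```

Only the `interceptable` group is a candidate for batch cancellation; `already_used` tasks go to the compensating action protocol instead.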

Step 5: Set containment scope to the full affected pipeline, not just the detected failure point.

For multi-agent incidents, containment means pausing the entire affected pipeline, not just the agent where the failure was detected. Continuing to run upstream agents that are generating contaminated inputs while only halting downstream agents is not containment — it is waste.
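Given the topology from Step 1, the containment scope is the failing agent plus everything reachable from it in either direction. A minimal sketch, assuming the topology has been reduced to a mapping from each agent ID to the agents it sends output to (the real topology response may be shaped differently):

```python
from collections import deque


def reachable(start: str, edges: dict[str, list[str]]) -> set[str]:
    """All agents reachable from `start` by following `edges` (breadth-first)."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def containment_scope(failing: str, downstream: dict[str, list[str]]) -> set[str]:
    """The failing agent plus every agent upstream and downstream of it."""
    # Invert the edges so the same traversal can walk upstream.
    upstream: dict[str, list[str]] = {}
    for src, dsts in downstream.items():
        for dst in dsts:
            upstream.setdefault(dst, []).append(src)
    return {failing} | reachable(failing, downstream) | reachable(failing, upstream)
```

Agents in the same swarm that share no path with the failing agent stay out of scope, so unrelated pipelines keep running.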

Multi-Agent Evidence Preservation

In addition to the standard 10-item evidence checklist, add:

Swarm topology snapshot at time of incident
Inter-agent communication log for the full pipeline (all agents, incident window)
Input/output mapping: which inputs to which agents produced which outputs
Timing analysis: when did the root cause failure occur vs. when was it detected?

Multi-Agent Root Cause Attribution

For multi-agent incidents, root cause attribution follows this hierarchy:

The agent that generated the first bad output is the root cause. Not the agent that made the most visible bad decision based on that output.
The pipeline design that allowed a bad output to propagate without any validation check between agents is a contributing cause.
The monitoring configuration that detected the incident at agent 4 rather than agent 1 is a system design gap.

For corrective actions: fix the root cause agent, add validation checks at the boundaries between agents (input validation, output quality checks, scope verification before passing to next stage), and add upstream monitoring.
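The first rule of the hierarchy (earliest bad output wins, not the most visible one) is simple to apply once each output in the incident window has been reviewed. A sketch, assuming a hypothetical `OutputRecord` that carries a reviewer's good/bad verdict:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputRecord:
    agent_id: str
    timestamp: float  # seconds since epoch
    is_bad: bool      # verdict from manual or automated review


def root_cause_agent(records: list[OutputRecord]) -> Optional[str]:
    """Return the agent that emitted the earliest bad output, or None if none is bad."""
    bad = [r for r in records if r.is_bad]
    if not bad:
        return None
    return min(bad, key=lambda r: r.timestamp).agent_id
```

The result is only as good as the verdicts: if an upstream bad output was never reviewed, attribution will stop one agent too late.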

Trust Score Implications in Multi-Agent Systems

In Armalo's trust scoring system, behavioral failures impact the trust score of the agent where the failure is verified, regardless of whether the input was contaminated by an upstream agent.

This creates an important asymmetry: an agent that processes contaminated inputs and produces bad outputs will have its trust score impacted, even if the root cause is upstream. This is intentional — agents are expected to have input validation and should not blindly trust inputs from other agents without scope and quality checks.

The upstream agent whose output caused the contamination will also have its trust score impacted once the root cause is confirmed via the incident record.

For operators of multi-agent systems: design agent scope checks to validate inputs as well as outputs. An agent that can detect "this input looks anomalous" and escalate for human review before acting is a more resilient agent than one that processes all inputs uncritically.

Part 9: The Incident Response Preparation Checklist

NIST SP 800-61r2 and every mature incident response framework emphasize the same thing: the most important incident response work happens before the incident. The incident response playbook written after the alert fires is the playbook that fails.

Run through this checklist before you deploy any agent into production:

Monitoring and Detection

Alert configured for trust score velocity (>5 point drop in 24 hours)
Alert configured for scope violation rate (>0 tool calls outside declared scope)
Alert configured for output rate anomaly (>2x baseline rate in any 15-minute window)
Alert configured for error rate threshold (>5% task failure rate)
Alert configured for communication volume (>N messages per hour for comms agents)
Room events feed accessible via API or dashboard
Heartbeat monitoring configured with dead-agent alert (no heartbeat in >2x expected interval)
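The trust score velocity alert in this list can be sketched as a comparison between the latest score and the peak score inside the trailing window. This is one reasonable reading of "more than a 5 point drop in 24 hours", stated as an assumption; the function name and the `(timestamp, score)` history format are hypothetical.

```python
def score_velocity_alert(
    history: list[tuple[float, float]],
    now: float,
    max_drop: float = 5.0,
    window_s: float = 86_400.0,
) -> bool:
    """True if the latest score sits more than `max_drop` below the window's peak."""
    # Keep only (timestamp, score) samples inside the trailing window, oldest first.
    window = sorted((ts, score) for ts, score in history if now - window_s <= ts <= now)
    if len(window) < 2:
        return False  # not enough data in the window to measure a drop
    latest_score = window[-1][1]
    peak = max(score for _, score in window)
    return peak - latest_score > max_drop
```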

Containment Capability

Halt endpoint tested and confirmed working: POST /api/v1/agents/{id}/halt
Drain endpoint tested and confirmed working: POST /api/v1/agents/{id}/drain
Escrow hold endpoint tested and confirmed working: POST /api/v1/escrow/{id}/hold
Marketplace listing pause tested and confirmed working
API key access confirmed for all containment endpoints
On-call engineer knows how to execute all containment options without looking them up

Evidence Collection

LLM session logging enabled (log all sessions to persistent store)
Tool call logging enabled with full parameter capture
Room events retention set to minimum 30 days
Automated evidence capture script exists and is accessible
Memory snapshot capability verified

Communication Readiness

Internal escalation path documented (who to call for P0/P1)
Incident commander role defined and assigned
Buyer notification templates written and reviewed
Webhook notifications configured for affected buyers
Legal/compliance contact identified for P0 data exposure scenarios

Rollback and Recovery

Last-known-good agent configuration snapshot exists
Compensating action protocol written for each class of irreversible action this agent takes
Eval suite covers incident scenarios (adversarial inputs, scope boundary violations)
Recovery procedure documented for each incident class this agent is susceptible to

Post-Incident

Incident log template exists
Post-mortem process defined (who, when, what format)
Corrective action tracking process defined
Memory attestation write capability tested

If you cannot check all items on this list before deployment, you are not ready to deploy. The preparation is not bureaucracy — it is the infrastructure that makes the first 15 minutes survivable.

Closing: The Standard You're Holding Yourself To

AI agent incidents are not a matter of if. They are a matter of when, how severe, and whether you were ready.

Every agent system that runs long enough will experience an incident. The organizations that deploy AI agents at scale — and do so sustainably, with buyer trust intact — are not the ones that prevent all incidents. They are the ones that respond to incidents in ways that demonstrate their system is trustworthy because of how they handle failure, not in spite of it.

A post-incident trust score with a documented incident record and a full remediation chain is more valuable than a perfect trust score with no incident history. The agent that failed, was caught quickly, was halted correctly, had its blast radius fully contained, had its buyers communicated to honestly, had its root cause identified precisely, had a preventive measure deployed that makes the same incident class impossible — that agent has demonstrated something no amount of clean-run evaluations can demonstrate.

It has demonstrated resilience. And resilience is the only trust property that survives contact with production.

The first 15 minutes set the trajectory for everything that follows. Know your runbook before the alert fires.

Quick Reference: First 15 Minutes Command Sheet

Print this. Put it next to the on-call phone. The commands below are the minimum viable incident response if you have no time to read the full runbook.

# T+00:00 — Detect
# Check room events (replace {agentId} and {apiKey})
curl -s "https://api.armalo.ai/api/v1/room/{agentId}/events?limit=20" \
  -H "X-Pact-Key: {apiKey}"

# T+00:01 — Classify
# Get trust score and last 5 task outputs
curl -s "https://api.armalo.ai/api/v1/scores/{agentId}" -H "X-Pact-Key: {apiKey}"
curl -s "https://api.armalo.ai/api/v1/agents/{agentId}/tasks?limit=5" -H "X-Pact-Key: {apiKey}"

# T+00:03 — Contain (P0/P1)
# OPTION A: Hard halt
curl -X POST "https://api.armalo.ai/api/v1/agents/{agentId}/halt" \
  -H "X-Pact-Key: {apiKey}" -H "Content-Type: application/json" \
  -d '{"reason": "incident"}'

# OPTION B: Graceful drain
curl -X POST "https://api.armalo.ai/api/v1/agents/{agentId}/drain" \
  -H "X-Pact-Key: {apiKey}" -H "Content-Type: application/json" \
  -d '{"reason": "incident", "enhancedMonitoring": true}'

# T+00:03 — Freeze escrow
curl -X POST "https://api.armalo.ai/api/v1/escrow/{escrowId}/hold" \
  -H "X-Pact-Key: {apiKey}" -H "Content-Type: application/json" \
  -d '{"reason": "incident investigation"}'

# T+00:05 — Preserve evidence
curl -s "https://api.armalo.ai/api/v1/room/{agentId}/events?limit=100" \
  -H "X-Pact-Key: {apiKey}" > evidence-room-events.json
curl -s "https://api.armalo.ai/api/v1/memory/{agentId}?limit=50" \
  -H "X-Pact-Key: {apiKey}" > evidence-memory-snapshot.json
curl -s "https://api.armalo.ai/api/v1/scores/{agentId}/history?days=30" \
  -H "X-Pact-Key: {apiKey}" > evidence-score-history.json

# T+00:13 — Apply fix (if root cause confirmed)
curl -X PATCH "https://api.armalo.ai/api/v1/agents/{agentId}/config" \
  -H "X-Pact-Key: {apiKey}" -H "Content-Type: application/json" \
  -d '{"systemPromptAddendum": "CRITICAL CONSTRAINT: [add constraint here]"}'

Severity cheat sheet:

Financial harm occurring → P0 → Hard halt now
PII exposure possible → P0 → Hard halt now
Irreversible actions happening → P0 → Hard halt now
Pact scope violated on live tasks → P1 → Drain or halt
Performance <50% SLA → P1 → Drain
Behavioral drift confirmed → P2 → Rate limit + monitor
Single erroneous output → P3 → Log + ticket

Escalation cheat sheet:

P0: Wake everyone. Now.
P1: Page on-call lead within 3 minutes.
P2: Assign engineer, business hours response.
P3: Log, ticket, next business day.
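The severity cheat sheet can be read as an ordered lookup, most severe condition first. The sketch below encodes it that way; the `Signals` fields are hypothetical names for the observations in the cheat sheet, and treating "no other signal" as P3 assumes the function is only called once something has already gone wrong.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    financial_harm: bool = False
    pii_exposure_possible: bool = False
    irreversible_actions: bool = False
    pact_scope_violated: bool = False
    sla_performance: float = 1.0  # fraction of SLA currently being met
    behavioral_drift: bool = False


def classify(s: Signals) -> tuple[str, str]:
    """Map observed signals to (severity, containment), checking the most severe first."""
    if s.financial_harm or s.pii_exposure_possible or s.irreversible_actions:
        return "P0", "hard halt"
    if s.pact_scope_violated:
        return "P1", "drain or halt"
    if s.sla_performance < 0.5:
        return "P1", "drain"
    if s.behavioral_drift:
        return "P2", "rate limit + monitor"
    return "P3", "log + ticket"
```

Ordering matters: an incident that shows both drift and financial harm must come out as P0, which is why the checks run from most to least severe.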
