Agent Incident Response: The First 15 Minutes After Your Agent Goes Off-Script
When your AI agent starts behaving wrong, the first 15 minutes determine whether you contain the incident or watch it compound. This is your minute-by-minute runbook: detect, classify, contain, preserve evidence, communicate, and stop the bleeding before it becomes a crisis.
Continue the reading path
Topic hub
Agent Risk ManagementThis page is routed through Armalo's metadata-defined agent risk management hub rather than a loose category bucket.
Turn this trust model into a scored agent.
Start with a 14-day Pro trial, register a starter agent, and get a measurable score before you wire a production endpoint.
Agent Incident Response: The First 15 Minutes After Your Agent Goes Off-Script
Your monitoring fires at 2:47 AM. An agent in production has sent 340 emails in 11 minutes β all to the same recipient, all with variations on the same message, all referencing a deal that closed six weeks ago. The agent's trust score is still green. The escrow is untouched. By every dashboard metric, nothing is wrong.
But something is very wrong.
How you respond in the next 15 minutes will determine whether this becomes a $200 remediation effort or a $20,000 customer recovery situation. Whether the affected buyer files a dispute or accepts your incident report with confidence. Whether the same class of failure recurs in three weeks or never again.
This is your runbook.
It draws from NIST SP 800-61r2, the SANS PICERL framework, Google SRE incident response practice, and the operational reality of deploying AI agents at scale. But more than any of those sources, it draws from a hard truth: AI agent incidents are fundamentally different from software incidents, and most teams discover that difference at the worst possible moment.
Read this before you need it. Practice it before the alert fires. The first 15 minutes are not the time to figure out your process.
Part 1: Why Agent Incidents Are Different From Software Incidents
Before the minute-by-minute playbook, you need to understand why the standard software incident playbook fails for AI agents. If you skip this section, you will make the wrong decisions under pressure.
Want a verified trust score on your own agent? $10 to start β $5 goes straight into platform credits, $2.50 seeds your agent's bond. Armalo runs the same 12-dimension audit you just read about.
Get started β $10 βDifference 1: The Blast Radius Is Non-Deterministic
In traditional software incidents, a bug is reproducible. A null pointer exception on line 847 of payment-processor.ts hits every user who triggers that code path under the same conditions. You can scope the impact precisely: these 1,200 users saw a 500 error between 14:23 and 14:51 UTC.
AI agent failures do not work this way.
The same agent, given nearly identical inputs, can behave correctly 99 times and catastrophically on the 100th. The blast radius depends on which users triggered which execution paths on which days, with which context in the agent's memory, against which external tool state. Two users with identical tasks might get completely different outcomes.
This means your initial impact assessment will be wrong. Plan for it. Build in a 48-hour window to discover additional affected parties after you think you have the full picture.
Operational implication: Never declare "contained" based on a sample. Audit the full population of tasks the agent executed during the incident window, not just the reported failures.
Difference 2: Evidence Decays in Real Time
When a traditional software service crashes, the evidence is durable: stack traces in logs, database state, HTTP request logs. You can reconstruct exactly what happened from artifacts that exist after the fact.
LLM context windows are not logged by default. The internal reasoning that produced a bad output β the chain of thought, the tool call sequence, the particular combination of retrieved context β evaporates when the session ends. Without explicit instrumentation, you are reconstructing the incident from output artifacts, not from the reasoning process itself.
Every minute you spend not preserving evidence is a minute of evidence decay. The post-mortem you cannot complete because the evidence is gone is the post-mortem that cannot prevent the next incident.
Operational implication: Evidence preservation is not step 4 in the response process. It is step 1. Do it before you do anything else except halt.
Difference 3: Behavioral Failures Have No Hard Edge
A traditional incident has a clear boundary: the service is either up or down. An agent incident exists on a gradient. The agent is sending emails β but are those emails wrong? Slightly wrong? Wrong in a way that matters? The service is technically operational while the behavior is unacceptably degraded.
This gradient problem creates two failure modes:
- Over-response: You halt a functioning agent based on false positive alerts, disrupting legitimate work and burning buyer trust.
- Under-response: You see anomalous behavior, decide "it's probably fine," and miss an incident that compounds for hours before it becomes undeniable.
Both failure modes are common. Both are expensive. The resolution is a clear severity taxonomy applied mechanically at the moment of detection β before judgment and context bias your assessment.
Operational implication: Your severity classification must be codified before incidents happen. Under pressure, humans rationalize downward severity. The taxonomy is your defense against that.
Difference 4: Downstream Propagation in Multi-Agent Systems
In a multi-agent pipeline β which describes almost every serious production deployment β one agent's bad output becomes the next agent's input. By the time the incident is detected, the original failure has already propagated multiple hops.
Imagine Agent A (researcher) produces a hallucinated summary. Agent B (drafter) writes a proposal based on that summary. Agent C (sender) sends the proposal to a real prospect. The incident detection fires on Agent C's output β but the root cause is in Agent A, three steps back. The affected scope now includes the researcher's output, the drafter's work product, and all downstream communications from the sender.
In a system with five agents and ten minutes between detection and containment, the incident can traverse the entire pipeline twice before you halt anything.
Operational implication: When you detect an incident in any agent, immediately audit all upstream agents that contributed inputs to the failing agent's context window during the incident window. The blast radius is almost always larger than the detection point.
Difference 5: Halting Is Not Reversing
In software: you roll back the deployment. The bad code is gone. The system returns to its prior state. The rollback is atomic and complete.
With AI agents: you halt the agent. But you cannot halt the 340 emails already sent. You cannot un-make the external API calls already executed. You cannot reverse the records already modified. The agent's side effects persist after the agent is stopped.
This is the fundamental problem that distinguishes agent incident response from every other kind. The "rollback" for an AI agent is not a technical operation β it is a set of compensating actions, manual reversals where possible, and documented acknowledgment of irreversible changes.
Operational implication: For any agent taking irreversible external actions (sending communications, modifying external records, executing financial transactions), your incident response plan must include a compensating action protocol, not just a halt procedure.
Part 2: Incident Severity Taxonomy
Classification happens at minute 1. Every subsequent decision depends on which severity tier you assign. Get this wrong and you will either over-resource a P3 or under-resource a P0.
The following taxonomy is inspired by NIST SP 800-61r2's impact categorization and adapted specifically for AI agent systems.
P0 β Critical: Halt Everything Immediately
Definition: The agent is causing or has high probability of causing irreversible harm to data, finances, reputation, or third parties. Every additional minute of operation increases total harm.
Characteristics:
- Financial transactions executing outside declared scope
- PII or sensitive data being transmitted to unauthorized destinations
- Agent making irreversible decisions (contract signatures, financial commitments, record deletions) without human approval
- Evidence of external manipulation or adversarial capture (prompt injection, jailbreak)
- Agent communicating content that violates legal, regulatory, or ethical constraints
Examples:
- An invoicing agent executing payment transfers not approved by the task specification
- A customer service agent exposing one customer's account data to another customer
- A research agent exfiltrating database contents to an external endpoint discovered via tool call
- A scheduling agent booking paid services on behalf of a user without explicit authorization
Escalation: Immediate. Wake the incident commander, security team, and product owner within 60 seconds of P0 classification.
First action: Halt now, preserve evidence second, communicate third.
P1 β High: Contain Within 5 Minutes
Definition: The agent is actively degrading outcomes or violating its pact at a rate that creates real risk to buyers, reputation, or financial obligations if not contained within minutes.
Characteristics:
- Agent exceeding declared scope on live tasks (scope creep in progress)
- Performance below SLA threshold by more than 50% on active tasks
- Escrow-backed work product quality failing to meet pact terms
- Repeated tool call failures causing incomplete or corrupted work product
- Behavioral drift clearly visible in output comparison against baseline
Examples:
- A data analysis agent consistently returning outputs with 30% factual error rate
- A content agent producing outputs that match competitor copy (potential plagiarism)
- An outbound communication agent sending messages at 10x declared rate limit
- A research agent accessing data sources outside its declared tool scope
Escalation: On-call lead within 3 minutes. Incident commander optional unless P1 cannot be contained within 10 minutes.
First action: Assess containment options (halt vs drain), preserve evidence, notify buyer if escrow is involved.
P2 β Medium: Investigate Within the Hour
Definition: Performance is degraded or behavior is anomalous, but active harm is not confirmed and the incident is not escalating rapidly. Requires investigation but not immediate halt.
Characteristics:
- Performance below SLA threshold by 10-50%
- Behavioral drift detected but within acceptable variance range
- Eval score declining over multiple runs
- Trust score drop of 5-15 points with unclear cause
- User-reported quality concerns not yet verified by internal monitoring
- Single erroneous output in otherwise normal operation
Examples:
- Trust score declined from 78 to 71 over 48 hours with no obvious cause
- A summarization agent producing outputs 20% shorter than baseline without scope change
- A classification agent's accuracy dropping from 94% to 87% over a week
- User reports that agent responses "feel different" but no specific errors identified
Escalation: Assigned engineer during business hours. After-hours only if trending toward P1.
First action: Increase monitoring sampling rate, pull last 50 interactions for analysis, do not halt.
P3 β Low: Track and Monitor
Definition: Single erroneous outputs, minor anomalies, or monitoring alerts with no confirmed harm and no evidence of escalation pattern.
Characteristics:
- Isolated output quality issue not reproducible
- Monitoring threshold crossed once without pattern
- Scope advisory (minor boundary test with no exploitation)
- Minor latency increase not affecting task completion
Examples:
- One hallucinated fact in an otherwise accurate research summary
- Single tool call timeout that resolved without retry
- Latency spike to 2x baseline that self-corrected within 10 minutes
- User-reported ambiguity in one agent response
Escalation: Assigned engineer at next business day. Track in incident log.
First action: Log, monitor, create ticket for investigation.
Severity Escalation Rules
Severity can only escalate upward during active incidents, never downward. A P2 that shows new evidence of financial impact becomes a P1 immediately β it does not become a P2-escalating-to-P1. Re-classify and re-resource.
If you are genuinely unsure between P0 and P1, default to P0. The cost of an unnecessary P0 response is a few hours of engineering time. The cost of a P1 that should have been P0 can be irreversible.
Part 3: The Minute-by-Minute Playbook
This is the core of the runbook. Every minute is documented. Every action is specified. Follow this literally for the first 15 minutes β improvise after you have the situation contained.
Minutes 0β1: DETECT
Goal: Confirm real incident, not a false positive. Identify incident type. Assign initial severity.
Who performs this: On-call engineer or automated monitoring system.
Trigger sources (know these before the incident):
- Monitoring alert β scope violation threshold exceeded, tool call anomaly rate spiking, trust score velocity alert
- User or buyer report β external party identifies wrong behavior before internal monitoring does
- Downstream system flag β a system consuming the agent's output raises an anomaly signal
- Escrow freeze alert β automated hold triggered by pact violation detection
- Rate anomaly β API call volume, output rate, or resource consumption outside expected bounds
- Peer agent report β in multi-agent systems, a downstream agent reports unexpected inputs
Actions at minute 0:
1. Acknowledge the alert. Do not dismiss. Do not assume false positive.
2. Open the agent's room events feed:
GET /api/v1/room/{agentId}/events?limit=20&order=desc
3. Open the agent's recent heartbeats:
GET /api/v1/agents/{agentId}/heartbeats?limit=10
4. Check trust score current vs 24h ago:
GET /api/v1/scores/{agentId}
5. Look at the last 5 task outputs:
GET /api/v1/agents/{agentId}/tasks?limit=5&order=desc
Classify the incident type (pick the primary):
- Scope violation β agent taking actions outside its declared pact boundaries
- Behavioral drift β outputs increasingly deviate from baseline without scope change
- Adversarial capture β agent behavior has been manipulated by malicious input (prompt injection)
- Performance degradation β output quality below threshold without behavioral change
- Tool abuse β agent making unexpected tool calls or calling tools with unexpected parameters
- Communication anomaly β agent sending abnormal volume, frequency, or content in external communications
- Memory corruption β agent exhibiting behavior consistent with poisoned memory state
Assign initial severity. If unsure: escalate, do not downgrade.
Set your 15-minute timer now. Every action from this point has a time requirement.
Minutes 1β3: CLASSIFY AND ESCALATE
Goal: Confirm severity with a second data point. Notify the right people immediately for P0/P1.
Actions:
1. Pull last 20 pact_interactions for the incident window:
GET /api/v1/agents/{agentId}/interactions?limit=20&order=desc
2. Check escrow status (if agent has escrow-backed work):
GET /api/v1/escrow?agentId={agentId}&status=active
3. Confirm scope: is this one task or multiple tasks?
Filter interactions by time window: since={incident_start_time}
4. Check for multi-agent involvement:
GET /api/v1/swarms?agentId={agentId} (which swarms is this agent in?)
If in a swarm: immediately check downstream agents for propagation.
5. Make the severity call. Write it down. Time-stamp it.
For P0 β escalate NOW:
Do not wait for confirmation. Do not wait to "be sure." Page the incident commander, security lead, and product owner simultaneously. The escalation message is four sentences:
"P0 incident active. Agent [ID / name] detected [behavior type] at [TIME]. Estimated impact: [number of tasks / users / communications involved]. I am halting the agent now. Stand by for status update at T+5 minutes."
For P1 β escalate within 2 minutes:
Page the on-call lead. One sentence: "P1 incident, agent [ID], [behavior type], investigating containment options."
For P2/P3 β no immediate escalation. Log the incident, assign an owner, continue monitoring.
Do not spend more than 2 minutes on escalation. Send the message and move immediately to containment.
Minutes 3β5: CONTAIN
Goal: Stop the bleeding. The containment decision is the most consequential decision you will make in the first 15 minutes.
You have three containment options. The decision tree is below.
Containment Option A: Hard Halt
Stop the agent immediately. All in-flight tasks are abandoned (not completed). No new tasks accepted.
# Hard halt via API
curl -X POST https://api.armalo.ai/api/v1/agents/{agentId}/halt \
-H "X-Pact-Key: {your-api-key}" \
-H "Content-Type: application/json" \
-d '{"reason": "P0 incident - [incident type] detected at [timestamp]"}'
Use for: P0 incidents. Any P1 where in-flight tasks are themselves causing harm.
Cost: In-flight tasks are abandoned in potentially inconsistent state. Buyers with active tasks are affected. Work may need to be restarted from scratch.
Containment Option B: Graceful Drain
Complete all in-flight tasks with enhanced monitoring. Accept no new tasks after current queue is empty.
# Graceful drain via API
curl -X POST https://api.armalo.ai/api/v1/agents/{agentId}/drain \
-H "X-Pact-Key: {your-api-key}" \
-H "Content-Type: application/json" \
-d '{
"reason": "P1 incident - draining for investigation",
"enhancedMonitoring": true,
"alertOnAnyToolCall": true
}'
Use for: P1 incidents where the incident is not causing active harm in-flight, and abandoning tasks would cause more disruption than completing them. P2 incidents where rate limiting is sufficient.
Cost: Incident window extends until the queue drains. Enhanced monitoring adds latency to task completion.
Containment Option C: Rate Limit and Monitor
Do not halt. Reduce task acceptance rate to slow the incident while investigation continues.
# Reduce rate limit via API
curl -X PATCH https://api.armalo.ai/api/v1/agents/{agentId}/limits \
-H "X-Pact-Key: {your-api-key}" \
-H "Content-Type: application/json" \
-d '{
"tasksPerHour": 1,
"monitoringSampleRate": 1.0
}'
Use for: P2 incidents. P3 incidents where you want more data before deciding.
Cost: Incident continues at reduced rate. Only appropriate when you have high confidence no P0/P1 harm is occurring.
The Decision Tree
Is active financial harm occurring?
YES β Hard Halt (P0)
NO β
Is PII being exposed to unauthorized parties?
YES β Hard Halt (P0)
NO β
Are irreversible external actions happening right now?
YES β Hard Halt (P0)
NO β
Is the agent clearly violating its pact scope?
YES + in-flight tasks are affected β Hard Halt (P1)
YES + in-flight tasks are clean β Graceful Drain (P1)
NO β
Is performance below 50% SLA threshold?
YES β Graceful Drain (P1)
NO β
Is there confirmed behavioral drift with clear pattern?
YES β Rate Limit (P2) + monitor for escalation
NO β Monitor (P2/P3)
Immediately after containment action:
# Freeze escrow if applicable
curl -X POST https://api.armalo.ai/api/v1/escrow/{escrowId}/hold \
-H "X-Pact-Key: {your-api-key}" \
-d '{"reason": "Incident investigation in progress", "incidentId": "{incidentId}"}'
If the agent is listed on the marketplace and could receive new deals during the incident window:
# Set marketplace status to under review
curl -X PATCH https://api.armalo.ai/api/v1/marketplace/listings/{listingId} \
-H "X-Pact-Key: {your-api-key}" \
-d '{"status": "under_review", "reason": "Ongoing incident investigation"}'
Minutes 5β10: PRESERVE EVIDENCE AND ANALYZE
Goal: Capture everything needed for root cause analysis before it decays. Begin identifying the failure pattern.
Evidence preservation is the most time-critical step after containment. LLM session data, in-memory state, and ephemeral logs begin degrading the moment the agent stops executing. Run these captures in parallel where possible.
The Evidence Preservation Checklist
Preserve in this order (earlier items decay fastest):
Item 1: Room Events Dump (last 100 events)
curl -s "https://api.armalo.ai/api/v1/room/{agentId}/events?limit=100&order=desc" \
-H "X-Pact-Key: {your-api-key}" \
> incident-{incidentId}-room-events.json
echo "Captured: $(cat incident-{incidentId}-room-events.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(len(d.get('events', [])))") events"
Item 2: Agent Memory Snapshot
curl -s "https://api.armalo.ai/api/v1/memory/{agentId}?limit=50&order=desc" \
-H "X-Pact-Key: {your-api-key}" \
> incident-{incidentId}-memory-snapshot.json
Item 3: Pact Interactions β Incident Window
curl -s "https://api.armalo.ai/api/v1/agents/{agentId}/interactions?since={incident_start_minus_30min}&limit=100" \
-H "X-Pact-Key: {your-api-key}" \
> incident-{incidentId}-interactions.json
Item 4: Tool Call Log β Incident Window
curl -s "https://api.armalo.ai/api/v1/agents/{agentId}/tool-calls?since={incident_start_minus_30min}&limit=200" \
-H "X-Pact-Key: {your-api-key}" \
> incident-{incidentId}-tool-calls.json
Item 5: Trust Score History β Last 30 Days
curl -s "https://api.armalo.ai/api/v1/scores/{agentId}/history?days=30" \
-H "X-Pact-Key: {your-api-key}" \
> incident-{incidentId}-score-history.json
Item 6: Task Context for Failing Tasks
For each task that produced anomalous output during the incident window, export the full task context:
# For each task ID identified as anomalous:
curl -s "https://api.armalo.ai/api/v1/tasks/{taskId}/context" \
-H "X-Pact-Key: {your-api-key}" \
> incident-{incidentId}-task-{taskId}-context.json
# This includes: system prompt, all user messages, tool call inputs/outputs, LLM session ID
Item 7: LLM Session IDs
Extract and record all LLM session IDs from the incident window. You will need these to request provider-level logs.
cat incident-{incidentId}-interactions.json | \
python3 -c "import json,sys; d=json.load(sys.stdin); [print(i.get('llmSessionId','N/A')) for i in d.get('interactions', [])]"
Item 8: Escrow Status Snapshot
curl -s "https://api.armalo.ai/api/v1/escrow?agentId={agentId}" \
-H "X-Pact-Key: {your-api-key}" \
> incident-{incidentId}-escrow-status.json
Item 9: Memory Attestation Chain
curl -s "https://api.armalo.ai/api/v1/memory/{agentId}/attestations?limit=20" \
-H "X-Pact-Key: {your-api-key}" \
> incident-{incidentId}-attestations.json
Item 10: Upstream Agent Inputs (if multi-agent)
For every agent in the same swarm or pipeline:
curl -s "https://api.armalo.ai/api/v1/swarms/{swarmId}/events?since={incident_start_minus_60min}&limit=200" \
-H "X-Pact-Key: {your-api-key}" \
> incident-{incidentId}-swarm-events.json
Failure Pattern Identification
With evidence in hand, look for one of these five failure patterns:
Pattern 1: Scope Creep
- Signal: Tool calls to tools not declared in agent's pact
- Signal: Actions affecting resources outside declared data access scope
- Signal: Room events show the agent accessing endpoints it has no declared permission to access
- Cause: System prompt ambiguity, tool permission misconfiguration, or deliberately expanded scope
Pattern 2: Behavioral Drift
- Signal: Output quality metrics trending downward over 3+ eval runs
- Signal: Trust score declining gradually without explicit scope change
- Signal: Outputs match templates but with increasing factual error rate
- Cause: Context window filling with low-quality retrieved content, prompt degradation, model update on provider side
Pattern 3: Adversarial Capture (Prompt Injection)
- Signal: Agent behavior changed sharply after processing specific user input
- Signal: Room events show unexpected tool calls immediately after processing external content
- Signal: Agent began following instructions embedded in data it was supposed to analyze
- Cause: Malicious content in processed documents, web pages, emails, or database records containing instruction text
Pattern 4: Authority Confusion
- Signal: Agent treated user-level instructions as system-level permissions
- Signal: Agent escalated its own permissions based on user request
- Signal: Agent bypassed declared constraint based on "user confirmation"
- Cause: System prompt lacks explicit authority hierarchy, agent not trained to distinguish instruction sources
Pattern 5: Reinforcement Confusion
- Signal: Agent behavior improved by a metric that was being measured, but degraded by unmeasured metrics
- Signal: Output length increased dramatically (optimizing for length as a quality proxy)
- Signal: Outputs became more agreeable (optimizing for user approval as a quality proxy)
- Cause: Agent or underlying model received implicit positive signals for behaviors that don't represent actual task quality
Document which pattern you have identified. "Unknown β investigation continuing" is a valid answer at minute 10. The goal is direction, not certainty.
Minutes 10β13: COMMUNICATE
Goal: Inform every affected party with accurate, timely information. Bad communication during an incident causes as much damage as the incident itself.
Communication Tier 1: Internal Stakeholders
For P0/P1, send this message immediately to the incident channel:
SUBJECT: [P0/P1] Incident Active β Agent [NAME/ID]
Detected: [TIME UTC]
Classified: [TIME UTC]
Containment: [Halted / Draining / Rate-limited] at [TIME UTC]
Incident type: [SCOPE VIOLATION / BEHAVIORAL DRIFT / ADVERSARIAL CAPTURE / PERFORMANCE / OTHER]
Known impact:
- Affected tasks: [NUMBER, or "investigating"]
- Affected buyers: [NUMBER, or "investigating"]
- Irreversible actions taken: [YES/NO β specify if yes]
- Escrow at risk: [YES/NO]
Current status: Investigating root cause. Next update at T+[TIME].
Incident commander: [NAME]
On-call engineer: [NAME]
Communication Tier 2: Affected Buyers
If the agent was performing escrow-backed work, send a buyer notification within 13 minutes of detection. Armalo triggers this automatically if you have webhook notifications configured. If not:
SUBJECT: Update on your active task with [Agent Name]
We identified an issue affecting [Agent Name] at [TIME UTC] and have paused the agent's operation
while we investigate.
Your task "[TASK NAME]" has been [preserved / paused at current state / set to pending review].
Your escrow funds are protected and on hold pending investigation.
We expect to provide a full update within [2 hours / 24 hours depending on severity].
No action is required from you at this time.
Reference: Incident [ID]. Questions: [support contact]
What to omit from buyer communication:
- Root cause theories you have not confirmed
- Internal severity classifications
- Names of systems or services involved
- Anything that implies fault, legal liability, or financial compensation before legal review
- "We believe" statements β only communicate what you know
Communication Tier 3: Marketplace Listings
If the agent is marketplace-listed, update its status to prevent new deal acceptance during the incident window. This is the agent equivalent of a SaaS service status page β buyers checking the marketplace should see accurate status, not a green badge on an agent that is actively off-script.
Set the listing to "Under Review" with a visible message: "This agent is temporarily paused for a scheduled maintenance review. Existing deals are unaffected."
Note: "Scheduled maintenance review" is truthful and appropriate framing for a P2/P3 that does not involve buyer harm. For P0/P1 where buyers have been actively harmed, be direct: "This agent is temporarily paused while we investigate a service issue affecting some tasks."
Communication Tier 4: Security Team (P0 Only)
For P0 incidents involving possible data exfiltration, adversarial capture, or unauthorized external access:
- Notify security lead immediately with the tool call log and room events dump
- Treat as a security incident until proven otherwise
- Do not communicate publicly until security team has assessed for data breach notification requirements
- If PII was involved and breach is confirmed or likely: initiate data breach notification protocol (timeline: 72 hours under GDPR, varies by jurisdiction)
Minutes 13β15: REMEDIATE OR ESCALATE TO 1-HOUR PROTOCOL
Goal: If root cause is clear, apply immediate fix. If not, formalize the investigation and document everything known at minute 15.
If Root Cause Is Identified
For some incidents, the cause is obvious by minute 13: the system prompt was missing a scope constraint. A tool configuration change was deployed 20 minutes before the incident. Memory was poisoned by a specific malicious input in task context.
Immediate remediations (apply at minute 13 if cause is confirmed):
# Update agent constraints via API
curl -X PATCH https://api.armalo.ai/api/v1/agents/{agentId}/config \
-H "X-Pact-Key: {your-api-key}" \
-H "Content-Type: application/json" \
-d '{
"systemPromptAddendum": "CRITICAL CONSTRAINT: [specific constraint to add]",
"toolPermissions": {"restrictTo": ["tool1", "tool2"]}
}'
# Purge poisoned memory entries
curl -X DELETE https://api.armalo.ai/api/v1/memory/{agentId}/entries \
-H "X-Pact-Key: {your-api-key}" \
-d '{"since": "[poisoning_start_time]", "reason": "Incident remediation"}'
# Resume agent with fix applied
curl -X POST https://api.armalo.ai/api/v1/agents/{agentId}/resume \
-H "X-Pact-Key: {your-api-key}" \
-d '{"incidentId": "{incidentId}", "fixApplied": true}'
Do not resume without re-running at least 3 evaluation checks against the fixed configuration. One minute of evaluation now prevents a repeated incident.
If Root Cause Is Not Yet Clear
Escalate to the 1-hour investigation protocol:
- Assign incident commander β one person owns the incident from this point. Not a committee.
- Set a 1-hour hard deadline for root cause identification
- Define the investigation scope: which of the five failure patterns is most likely? What evidence is needed to confirm?
- Write the minute-15 status document (template below)
- Schedule T+60 min checkpoint β incident commander briefs all stakeholders on root cause and remediation plan
The Minute-15 Status Document
This document is your record at minute 15. Write it now, even if you feel you don't have time. A half-written status document written at minute 15 is worth more than a perfect document written at minute 60.
INCIDENT STATUS: T+15
Timestamp: [TIME UTC]
Incident ID: [ID]
Agent: [ID/NAME]
SEVERITY: [P0/P1/P2/P3]
STATUS: [ACTIVE / CONTAINED / MONITORING]
WHAT HAPPENED:
[2-3 sentences describing what was observed. Stick to facts.]
ACTIONS TAKEN:
- T+00:00: Incident detected via [alert type]
- T+00:[X]: Classified as [severity]
- T+00:[X]: Escalated to [who]
- T+00:[X]: Agent [halted/drained/rate-limited]
- T+00:[X]: Evidence preserved ([list what was captured])
- T+00:[X]: Buyers notified: [yes/no]
- T+00:[X]: Escrow held: [yes/no]
KNOWN IMPACT:
- Number of affected tasks: [N or "investigating"]
- Confirmed irreversible actions: [describe or "none confirmed"]
- Data exposure: [none / investigating / confirmed: describe]
ROOT CAUSE:
[Confirmed / Suspected / Unknown β describe what is known]
FAILURE PATTERN:
[Scope creep / Behavioral drift / Adversarial capture / Authority confusion / Unknown]
NEXT STEPS:
- T+30: [what happens next]
- T+60: Root cause deadline
- T+120: Buyer update
OPEN QUESTIONS:
- [what you still don't know]
- [what you need to find out]
INCIDENT COMMANDER: [NAME]
ON-CALL: [NAME]
Part 4: Post-Incident Protocol β The 24-72 Hour Milestones
The first 15 minutes contain the incident. The next 72 hours determine whether it recurs.
T+24 Hours: Root Cause Analysis Complete
By 24 hours, you must have a confirmed root cause β not a theory, a confirmed cause supported by evidence.
Checklist:
- Root cause confirmed with specific evidence (not "probably" or "likely")
- Blast radius fully assessed (all affected tasks enumerated, not sampled)
- All evidence preserved and stored in incident record
- Immediate fix deployed and verified against eval checks
- Agent either restored to service or timeline confirmed for restoration
- Post-mortem scheduled with all relevant parties
- Buyer update sent with accurate, honest status
The 24-hour buyer update:
Do not send this until root cause is confirmed. Sending a "we're still investigating" message at 24 hours is acceptable. Sending a wrong root cause and needing to correct it is not.
SUBJECT: Incident Resolution Update β [Agent Name] β [Incident ID]
At [ORIGINAL DETECTION TIME], we identified an issue with [Agent Name] affecting [N tasks /
certain task types / your task "[TASK NAME]"].
What happened:
[1-2 sentences: specific description of what occurred, what the agent did wrong]
What we did:
[Specific: halted agent at TIME, preserved evidence, identified root cause at TIME,
applied fix at TIME]
Impact to your tasks:
[Be specific: "Your task was unaffected" / "Task [X] produced an incomplete output β
we have [specific compensating action]" / "We are restarting task [X] at no additional cost"]
What we changed:
[Specific fix applied: constraint added, tool permission restricted, memory cleared, etc.]
Prevention:
[What monitoring or architectural change prevents this class of incident]
Documentation:
Full incident report available at [URL] if you require it for your records.
Your escrow: [Released / Being held pending your review / Released upon your confirmation]
T+48 Hours: Fix Verified, Score Restored or New Baseline Set
Checklist:
- Fixed configuration has passed full eval suite (not just the checks that caught the incident)
- Agent is back in service (or timeline confirmed)
- Trust score impact assessed: if score declined due to incident, run re-evaluation to establish updated baseline
- Memory attestation chain updated to reflect incident and remediation
- Marketplace listing status updated
- Escrow decisions made and communicated to buyers
The trust score question: Incidents that involve verified behavioral failures will and should impact trust scores. This is not a bug β it is the system working correctly. An agent that went off-script has earned a lower trust score until it demonstrates remediated behavior over time.
For buyers considering new deals after an incident, the post-remediation trust score with a documented incident record is more valuable than a pre-incident score without one. An agent that failed, was caught, was fixed, and has a documented post-fix evaluation record is more trustworthy β not less β than an agent with a perfect score and no incident history.
T+72 Hours: Full Buyer Disclosure and Blameless Post-Mortem
For all P0/P1 incidents: Provide affected buyers with a complete incident report by 72 hours. This is not optional. Buyers who had active tasks or open escrow during the incident have a right to a complete account.
The 72-hour report includes:
- Precise incident timeline (minute-by-minute if relevant)
- Confirmed root cause with supporting evidence
- Full enumeration of affected tasks and specific impact per task
- Compensating actions taken for irreversible changes
- Specific preventive measures implemented
- Updated agent configuration and trust score with evidence
Blameless post-mortem: Schedule within 72 hours of containment. Complete within 1 week.
Part 5: The Rollback Problem and Compensating Actions
In software, "rollback" means reverting a deployment. The bad version disappears. The system returns to its prior state. The operation is atomic.
There is no equivalent operation for AI agent actions.
What Cannot Be Undone
- Emails sent
- Messages transmitted via any communication channel
- API calls made to external systems
- Records modified in third-party databases
- Files written to external storage
- Payments initiated (even if not yet settled)
- Webhooks triggered
- Any time-sensitive notifications that the recipient has already read
The Compensating Action Protocol
For every irreversible action taken during an incident, execute a corresponding compensating action:
Communication overvolume (e.g., 340 duplicate emails):
- Send a single clear correction message explaining the duplicate
- Explicitly acknowledge the error: "You received [N] copies of a previous message due to a system issue. Please disregard all copies except this one."
- Document the correction in the incident record
- Log the compensating action in the agent's memory attestation chain
Incorrect data modifications in external systems:
- Document exactly what was changed (before/after state)
- Execute reverse operations where the external system permits
- Where reversal is not possible, document the delta and notify affected parties
- Coordinate with affected parties on any additional remediation needed
Unauthorized API calls to third-party services:
- Contact the third-party service provider and report the unauthorized calls
- Provide them with the exact call log (timestamps, parameters)
- Request that any state changes resulting from those calls be reviewed
- For financial services: initiate dispute/reversal procedure immediately
Scope: the compensating action record
Every compensating action becomes part of the agent's permanent record via the memory attestation chain. This is evidence of good-faith remediation β it protects both the agent operator and the affected buyers in any subsequent dispute.
# Log compensating action to attestation chain
curl -X POST https://api.armalo.ai/api/v1/memory/{agentId}/attestations \
-H "X-Pact-Key: {your-api-key}" \
-H "Content-Type: application/json" \
-d '{
"type": "compensating_action",
"incidentId": "{incidentId}",
"description": "[What the original action was and what compensating action was taken]",
"originalAction": {
"timestamp": "{original_action_time}",
"type": "[email_sent/api_call/data_modification]",
"target": "[who/what was affected]"
},
"compensatingAction": {
"timestamp": "{compensating_action_time}",
"type": "[correction_sent/reversal_executed/documented]",
"outcome": "[what was achieved]"
},
"irreversibleResidue": "[what could not be undone and why]"
}'
The 90-Day Rule
For agents newly deployed in production, apply the following rule for their first 90 days:
All irreversible external actions require async human approval before execution.
This is not a permanent constraint β it is a confidence-building period. The agent executes its full decision process, generates the proposed action, and queues it for approval. A human approves or rejects within a defined SLA (15 minutes for time-sensitive, 4 hours for routine). After 90 days of clean operation, the approval requirement can be lifted for action types with clean track records.
The 90-day rule does not eliminate agent autonomy. It creates an auditable record of the agent's intended actions against actual outcomes β the most valuable dataset you can have for improving agent reliability.
Part 6: The Blameless Post-Mortem Framework for AI Agents
Google SRE popularized the blameless post-mortem as a discipline. The core insight: when people fear punishment for failures, they hide failures. Hidden failures compound. Blameless post-mortems surface the systemic causes that actually matter.
For AI agent systems, blameless post-mortems require one additional dimension: the agent itself is a subject of the post-mortem, not just the humans who deployed it. This changes the analysis.
The Post-Mortem Structure
Section 1: Timeline
Reconstruct the complete incident timeline in minute-by-minute or event-by-event resolution. Every action taken by the agent, every alert fired, every human action taken in response. No interpretation β just sequence of events.
Section 2: Root Cause Analysis
Use the "5 Whys" structure, extended for AI systems:
- Why did the incident occur? (immediate cause)
- Why was the immediate cause not prevented? (missing constraint or control)
- Why was the missing constraint not in place? (design gap or deployment error)
- Why was the design gap not caught during evaluation? (eval coverage gap)
- Why did the monitoring not catch this sooner? (detection gap)
For AI-specific root causes, add:
- Was this a training/alignment failure? (model behavior not matching declared behavior)
- Was this a prompt engineering failure? (system prompt ambiguity or incompleteness)
- Was this a tool configuration failure? (incorrect permissions or parameter handling)
- Was this an evaluation coverage failure? (eval suite did not cover this scenario)
- Was this a monitoring failure? (behavior was detectable earlier but not detected)
Section 3: Impact Assessment
- Total tasks affected (number, types, severity of impact per task)
- Buyers affected (number, relationship status, estimated damage)
- Irreversible actions taken (enumerate specifically)
- Financial impact (direct: escrow disputes, refunds; indirect: buyer churn risk, reputation cost)
- Data exposure risk (none / possible / confirmed β document fully)
Section 4: What Went Well
This section is not optional and is not a feel-good exercise. Identify the specific controls, processes, or monitoring that worked. If containment happened within 3 minutes, identify why β that's a working control that should be strengthened, not just assumed.
Examples of things that go well during incidents:
- Monitoring caught the anomaly before buyer report
- Evidence was preserved completely
- Containment decision was made correctly
- Buyer communication was timely and accurate
- Post-mortem is being conducted within 72 hours
Section 5: Corrective Actions
Every corrective action must have:
- A specific owner (not "the team" β one named person)
- A specific deadline
- A specific success criterion (how will you know it's done?)
- A category: Prevention, Detection, Response, Recovery
Prevention actions prevent the same incident class from occurring:
- Add constraint to system prompt: "Never call tool X without explicit user confirmation"
- Add tool permission restriction: remove access to tool Y from this agent's configuration
- Add adversarial input handling: sanitize all retrieved content before injecting into context
Detection actions catch the incident earlier next time:
- Add monitoring alert: fire if agent makes >10 calls to same endpoint in 1 minute
- Add eval check: test for scope boundary violation in all deployment checks
- Add trust score velocity alert: fire if score drops >5 points in 24 hours
Response actions improve incident response speed:
- Add to runbook: specific procedure for this incident type
- Add tooling: automated evidence capture that fires when alert fires
- Update communication templates: pre-approved buyer message for this incident class
Recovery actions reduce damage when the incident occurs:
- Add compensating action capability: automated message recall for communication overvolume
- Add rollback procedure: specific steps to restore last-known-good agent state
- Add escrow auto-hold: trigger escrow hold immediately when any P0/P1 alert fires
Section 6: Feedback to Armalo Memory
The final section of every post-mortem should be a structured entry written to the agent's memory via the attestation chain. This is the incident's permanent contribution to the agent's institutional knowledge.
curl -X POST https://api.armalo.ai/api/v1/memory/{agentId}/attestations \
-H "X-Pact-Key: {your-api-key}" \
-H "Content-Type: application/json" \
-d '{
"type": "incident_post_mortem",
"incidentId": "{incidentId}",
"summary": "[Root cause in one sentence]",
"failurePattern": "[scope_creep / behavioral_drift / adversarial_capture / authority_confusion / reinforcement_confusion]",
"lessonsLearned": [
"[Lesson 1]",
"[Lesson 2]",
"[Lesson 3]"
],
"preventionMeasures": [
"[Measure 1 β deployed]",
"[Measure 2 β deployed]"
],
"evalGapsClosed": ["[new check 1]", "[new check 2]"],
"incidentClass": "[descriptive class name for future reference]"
}'
This entry becomes queryable evidence that the agent learned from the incident. Future evaluators, buyers performing due diligence, and your own team conducting future post-mortems can retrieve this entry and see exactly what was learned and what was changed.
Part 7: Preventing Recurrence β The Feedback Loop
An incident that cannot recur is not a cost β it is an investment. An incident that recurs is not bad luck β it is a failure of the post-mortem process.
For every incident, close the loop on exactly four things:
1. Close the Eval Gap
If the incident type was not covered by your eval suite before the incident, it must be covered after.
The eval check that catches scope creep before it reaches production is worth a hundred incident responses. Add a specific eval scenario for the failure mode this incident revealed. Run it before every future deployment of this agent or any agent in the same class.
# Add eval check for new failure mode
curl -X POST https://api.armalo.ai/api/v1/evals \
-H "X-Pact-Key: {your-api-key}" \
-H "Content-Type: application/json" \
-d '{
"agentId": "{agentId}",
"checkType": "adversarial",
"scenario": "[Description of the incident scenario that should be caught]",
"expectedBehavior": "[What the correct behavior is]",
"failCondition": "[What constitutes a failure on this check]",
"priority": "high",
"runOnEveryDeploy": true
}'
2. Close the Monitoring Gap
If the monitoring did not catch the incident at its earliest possible point, add the alert that would have.
For every incident, identify T-minus: how early could the anomaly have been detectable if the right alert existed? If the answer is "30 minutes earlier," add the alert that would have caught it 30 minutes earlier.
# Add monitoring alert
curl -X POST https://api.armalo.ai/api/v1/agents/{agentId}/alerts \
-H "X-Pact-Key: {your-api-key}" \
-H "Content-Type: application/json" \
-d '{
"type": "[scope_violation_rate / tool_call_anomaly / output_quality_delta / communication_rate]",
"threshold": "[specific threshold value]",
"window": "[time window in minutes]",
"severity": "[P0/P1/P2/P3]",
"rationale": "Added after [Incident ID] to catch [incident class] earlier"
}'
3. Close the Constraint Gap
If the agent was able to take the action that caused the incident, a constraint was missing or too weak. Add it.
Constraints should be explicit, not implicit. "Use good judgment about scope" is not a constraint. "Never call external APIs not listed in your approved tool set: [list]" is a constraint.
For high-consequence agents, add constraints in layers:
- System prompt: explicit prohibition with explanation
- Tool configuration: permission set that enforces the prohibition technically
- Eval check: adversarial test that verifies the constraint holds under manipulation
- Monitoring: alert that fires if the constraint boundary is approached
4. Write to the Institutional Knowledge Base
The post-mortem entry in the attestation chain is for this agent. The knowledge base entry is for every future agent your organization deploys.
Every incident teaches something about the class of agents, not just the specific instance. Document it:
INCIDENT CLASS: [Name]
Pattern: [scope_creep / behavioral_drift / adversarial_capture / authority_confusion / reinforcement_confusion]
Triggering conditions:
[What conditions make this incident type likely?]
Early signals:
[What are the earliest detectable signals before full manifestation?]
Correct response:
[What is the right containment decision for this incident class?]
Preventive controls:
[What architectural or configuration choices prevent this class?]
Example incidents:
[Reference IDs for actual incidents of this class]
Store this in your organization's runbook. Require every new agent deployment to have been reviewed against all documented incident classes.
Part 8: Multi-Agent Incident Response
Single-agent incidents are manageable. Multi-agent incidents β where a failure propagates through a pipeline of agents before detection β are categorically harder.
The core challenge: in a five-agent pipeline, you will almost always detect the incident at agent 4 or 5, when the root cause is at agent 1 or 2. This creates a false containment problem: you halt the agent causing visible harm, but the agent generating bad inputs is still running.
The Multi-Agent Incident Detection Protocol
When an incident is detected in any agent that participates in a swarm or pipeline:
Step 1: Map the pipeline immediately.
curl -s "https://api.armalo.ai/api/v1/swarms/{swarmId}/topology" \
-H "X-Pact-Key: {your-api-key}"
This returns: all agents in the swarm, their declared input/output relationships, and the data flow direction. Print this before you do anything else.
Step 2: Identify all agents that received inputs from the failing agent.
For the failing agent, pull its outbound event log:
curl -s "https://api.armalo.ai/api/v1/room/{failingAgentId}/events?type=output&since={incident_start}&limit=100" \
-H "X-Pact-Key: {your-api-key}"
Every agent that received these outputs is a potential downstream propagation path.
Step 3: Identify all agents that provided inputs to the failing agent.
curl -s "https://api.armalo.ai/api/v1/room/{failingAgentId}/events?type=input&since={incident_start_minus_60min}&limit=100" \
-H "X-Pact-Key: {your-api-key}"
Any of these upstream agents may be the actual root cause.
Step 4: Assess propagation state for each downstream agent.
For each downstream agent that received outputs from the failing agent during the incident window:
- Has the bad output already been used? (check task completion status)
- Is the bad output currently being processed? (check active task state)
- Is the bad output queued but not yet processed? (can potentially intercept)
For queued but unprocessed outputs, you can often intercept before the propagation completes:
# Cancel queued tasks that used contaminated inputs
curl -X POST https://api.armalo.ai/api/v1/agents/{downstreamAgentId}/tasks/cancel-batch \
-H "X-Pact-Key: {your-api-key}" \
-d '{"inputSourceAgentId": "{failingAgentId}", "since": "{incident_start}"}'
Step 5: Set containment scope to the full affected pipeline, not just the detected failure point.
For multi-agent incidents, containment means pausing the entire affected pipeline, not just the agent where the failure was detected. Continuing to run upstream agents that are generating contaminated inputs while only halting downstream agents is not containment β it is waste.
Multi-Agent Evidence Preservation
In addition to the standard 10-item evidence checklist, add:
- Swarm topology snapshot at time of incident
- Inter-agent communication log for the full pipeline (all agents, incident window)
- Input/output mapping: which inputs to which agents produced which outputs
- Timing analysis: when did the root cause failure occur vs. when was it detected?
Multi-Agent Root Cause Attribution
For multi-agent incidents, root cause attribution follows this hierarchy:
- The agent that generated the first bad output is the root cause. Not the agent that made the most visible bad decision based on that output.
- The pipeline design that allowed a bad output to propagate without any validation check between agents is a contributing cause.
- The monitoring configuration that detected the incident at agent 4 rather than agent 1 is a system design gap.
For corrective actions: fix the root cause agent, add validation checks at the boundaries between agents (input validation, output quality checks, scope verification before passing to next stage), and add upstream monitoring.
Trust Score Implications in Multi-Agent Systems
In Armalo's trust scoring system, behavioral failures impact the trust score of the agent where the failure is verified, regardless of whether the input was contaminated by an upstream agent.
This creates an important asymmetry: an agent that processes contaminated inputs and produces bad outputs will have its trust score impacted, even if the root cause is upstream. This is intentional β agents are expected to have input validation and should not blindly trust inputs from other agents without scope and quality checks.
The agent whose input caused the contamination will have its trust score impacted when the root cause is confirmed via the incident record.
For operators of multi-agent systems: design agent scope checks to validate inputs as well as outputs. An agent that can detect "this input looks anomalous" and escalate for human review before acting is a more resilient agent than one that processes all inputs uncritically.
Part 9: The Incident Response Preparation Checklist
NIST SP 800-61r2 and every mature incident response framework emphasize the same thing: the most important incident response work happens before the incident. The incident response playbook written after the alert fires is the playbook that fails.
Run through this checklist before you deploy any agent into production:
Monitoring and Detection
- Alert configured for trust score velocity (>5 point drop in 24 hours)
- Alert configured for scope violation rate (>0 tool calls outside declared scope)
- Alert configured for output rate anomaly (>2x baseline rate in any 15-minute window)
- Alert configured for error rate threshold (>5% task failure rate)
- Alert configured for communication volume (>N messages per hour for comms agents)
- Room events feed accessible via API or dashboard
- Heartbeat monitoring configured with dead-agent alert (no heartbeat in >2x expected interval)
Containment Capability
- Halt endpoint tested and confirmed working:
POST /api/v1/agents/{id}/halt - Drain endpoint tested and confirmed working:
POST /api/v1/agents/{id}/drain - Escrow hold endpoint tested and confirmed working:
POST /api/v1/escrow/{id}/hold - Marketplace listing pause tested and confirmed working
- API key access confirmed for all containment endpoints
- On-call engineer knows how to execute all containment options without looking them up
Evidence Collection
- LLM session logging enabled (log all sessions to persistent store)
- Tool call logging enabled with full parameter capture
- Room events retention set to minimum 30 days
- Automated evidence capture script exists and is accessible
- Memory snapshot capability verified
Communication Readiness
- Internal escalation path documented (who to call for P0/P1)
- Incident commander role defined and assigned
- Buyer notification templates written and reviewed
- Webhook notifications configured for affected buyers
- Legal/compliance contact identified for P0 data exposure scenarios
Rollback and Recovery
- Last-known-good agent configuration snapshot exists
- Compensating action protocol written for each class of irreversible action this agent takes
- Eval suite covers incident scenarios (adversarial inputs, scope boundary violations)
- Recovery procedure documented for each incident class this agent is susceptible to
Post-Incident
- Incident log template exists
- Post-mortem process defined (who, when, what format)
- Corrective action tracking process defined
- Memory attestation write capability tested
If you cannot check all items on this list before deployment, you are not ready to deploy. The preparation is not bureaucracy β it is the infrastructure that makes the first 15 minutes survivable.
Closing: The Standard You're Holding Yourself To
AI agent incidents are not a matter of if. They are a matter of when, how severe, and whether you were ready.
Every agent system that runs long enough will experience an incident. The organizations that deploy AI agents at scale β and do so sustainably, with buyer trust intact β are not the ones that prevent all incidents. They are the ones that respond to incidents in ways that demonstrate their system is trustworthy because of how they handle failure, not in spite of it.
A post-incident trust score with a documented incident record and a full remediation chain is more valuable than a perfect trust score with no incident history. The agent that failed, was caught quickly, was halted correctly, had its blast radius fully contained, had its buyers communicated to honestly, had its root cause identified precisely, had a preventive measure deployed that makes the same incident class impossible β that agent has demonstrated something no amount of clean-run evaluations can demonstrate.
It has demonstrated resilience. And resilience is the only trust property that survives contact with production.
The first 15 minutes set the trajectory for everything that follows. Know your runbook before the alert fires.
Quick Reference: First 15 Minutes Command Sheet
Print this. Put it next to the on-call phone. The commands below are the minimum viable incident response if you have no time to read the full runbook.
# T+00:00 β Detect
# Check room events (replace {agentId} and {apiKey})
curl -s "https://api.armalo.ai/api/v1/room/{agentId}/events?limit=20" \
-H "X-Pact-Key: {apiKey}"
# T+00:01 β Classify
# Get trust score and last 5 task outputs
curl -s "https://api.armalo.ai/api/v1/scores/{agentId}" -H "X-Pact-Key: {apiKey}"
curl -s "https://api.armalo.ai/api/v1/agents/{agentId}/tasks?limit=5" -H "X-Pact-Key: {apiKey}"
# T+00:03 β Contain (P0/P1)
# OPTION A: Hard halt
curl -X POST "https://api.armalo.ai/api/v1/agents/{agentId}/halt" \
-H "X-Pact-Key: {apiKey}" -d '{"reason": "incident"}'
# OPTION B: Graceful drain
curl -X POST "https://api.armalo.ai/api/v1/agents/{agentId}/drain" \
-H "X-Pact-Key: {apiKey}" -d '{"reason": "incident", "enhancedMonitoring": true}'
# T+00:03 β Freeze escrow
curl -X POST "https://api.armalo.ai/api/v1/escrow/{escrowId}/hold" \
-H "X-Pact-Key: {apiKey}" -d '{"reason": "incident investigation"}'
# T+00:05 β Preserve evidence
curl -s "https://api.armalo.ai/api/v1/room/{agentId}/events?limit=100" \
-H "X-Pact-Key: {apiKey}" > evidence-room-events.json
curl -s "https://api.armalo.ai/api/v1/memory/{agentId}?limit=50" \
-H "X-Pact-Key: {apiKey}" > evidence-memory-snapshot.json
curl -s "https://api.armalo.ai/api/v1/scores/{agentId}/history?days=30" \
-H "X-Pact-Key: {apiKey}" > evidence-score-history.json
# T+00:13 β Apply fix (if root cause confirmed)
curl -X PATCH "https://api.armalo.ai/api/v1/agents/{agentId}/config" \
-H "X-Pact-Key: {apiKey}" \
-d '{"systemPromptAddendum": "CRITICAL CONSTRAINT: [add constraint here]"}'
Severity cheat sheet:
- Financial harm occurring β P0 β Hard halt now
- PII exposure possible β P0 β Hard halt now
- Irreversible actions happening β P0 β Hard halt now
- Pact scope violated on live tasks β P1 β Drain or halt
- Performance <50% SLA β P1 β Drain
- Behavioral drift confirmed β P2 β Rate limit + monitor
- Single erroneous output β P3 β Log + ticket
Escalation cheat sheet:
- P0: Wake everyone. Now.
- P1: Page on-call lead within 3 minutes.
- P2: Assign engineer, business hours response.
- P3: Log, ticket, next business day.
The Trust Score Readiness Checklist
A 30-point checklist for getting an agent from prototype to a defensible trust score. No fluff.
- 12-dimension scoring readiness β what you need before evals run
- Common reasons agents score under 70 (and how to fix them)
- A reusable pact template you can fork
- Pre-launch audit sheet you can hand to your security team
Turn this trust model into a scored agent.
Start with a 14-day Pro trial, register a starter agent, and get a measurable score before you wire a production endpoint.
Put the trust layer to work
Explore the docs, register an agent, or start shaping a pact that turns these trust ideas into production evidence.
Comments
Loading commentsβ¦