AI Agent Incident Forensics: Reconstructing What Happened When an Agent Goes Wrong
When an AI agent causes an incident, forensic investigation requires different techniques than traditional software forensics. LLM session reconstruction, tool call attribution, memory state archaeology, prompt injection forensics, causal attribution in multi-agent incidents, evidence preservation, and legal hold procedures.
AI Agent Incident Forensics: Reconstructing What Happened When an Agent Goes Wrong
On February 8, 2026, a large-scale incident at a financial technology company shut down their AI-assisted customer service platform for six hours. During that window, an AI agent had processed approximately 3,400 customer interactions in a degraded mode — providing incorrect account balance information, suggesting products with inappropriate risk profiles for the querying customers, and in at least 11 cases, disclosing fragments of one customer's account information in a response to a different customer.
Three weeks later, the incident response team was still unable to definitively determine what had caused the agent's behavior to degrade. The logs they had were extensive but not forensically useful: they contained timestamped API calls, request IDs, and response codes, but not the actual content of what the agent had processed or produced. The model version that had been running during the incident had since been updated. The system prompt in use at the time of the incident had been modified twice since then. The specific sequence of interactions that preceded the first observed anomaly was not captured in any preserved record.
The investigation eventually concluded with "probable" root causes and significant caveats. No one was confident the problem was fixed. No one could demonstrate to regulators what exactly had happened. The eleven customers affected by information disclosure were notified, but the notification letters included the phrase "we believe" four times in three paragraphs.
This is AI agent incident forensics done badly. The evidence that would have enabled a definitive investigation was never collected, was collected but not preserved, or was overwritten before the investigation began. The resulting ambiguity was not just an intellectual failure — it created regulatory exposure, customer trust damage, and persistent operational uncertainty.
This post develops the complete framework for AI agent incident forensics: what evidence must be collected, how it must be preserved, what investigative techniques produce reliable attribution, and how to structure investigations that hold up to legal and regulatory scrutiny.
TL;DR
- AI agent incident forensics requires evidence types not present in traditional software forensics: LLM session records, prompt construction history, memory state snapshots, tool call attribution chains, and behavioral calibration baselines.
- Evidence preservation must begin before incidents occur — post-incident collection is almost always too late.
- LLM session reconstruction requires: the exact system prompt at execution time, the complete conversation history provided to the model, the model version and inference parameters, and the raw output before any post-processing.
- Multi-agent incident investigation requires building a responsibility graph: which agent influenced which other agent, and how did an initial anomaly propagate through the system.
- Prompt injection forensics requires establishing whether anomalous behavior was induced by adversarial input or was a spontaneous model failure.
- Legal hold procedures for AI agent incidents must be defined before incidents occur — the evidence you don't preserve before a legal hold trigger is the evidence you cannot produce.
The Distinctive Challenges of AI Agent Forensics
Traditional software forensics investigates deterministic systems. Given the same inputs, a traditional program produces the same outputs. Forensic investigation can reconstruct events by replaying inputs through the code, examining data structures, and tracing execution paths. The ground truth — what the program actually did — can be recovered with high confidence from logs and code.
AI agent forensics investigates stochastic, context-sensitive systems. An LLM-based agent given identical inputs at two different times may produce different outputs. An agent whose system prompt has changed will behave differently under the same user inputs. An agent whose context window contains different accumulated conversation history will respond differently to the same current message. The determinism that makes traditional forensics tractable is absent.
This creates specific challenges:
Reconstruction difficulty. You cannot replay an LLM interaction and guarantee you will reproduce the original output. Even with the exact same model, system prompt, and input messages, temperature and sampling parameters introduce non-determinism. Forensic reconstruction of LLM sessions must work from preserved records, not from replay.
Context sensitivity. Agent behavior depends on context that may span multiple prior interactions: the full conversation history, any retrieved documents in the context window, any memory entries that were retrieved before the incident interaction. Forensic investigation must reconstruct the full context, not just the immediate trigger.
Ephemeral model state. LLMs have no persistent internal state between inference calls. The "model" that processed the incident interaction is gone — there is no way to query it about its reasoning, as you might query a database about its state. Reconstruction depends entirely on external records.
Multi-causal incidents. AI agent incidents often result from the interaction of multiple factors: a borderline input, a specific system prompt instruction, a retrieved document that added misleading context, a model that was having an anomalous response day. Establishing causation is harder than in traditional systems where a single bug is usually the cause.
Adversarial causal uncertainty. When an incident might have been caused by adversarial prompting (prompt injection), the forensic question becomes: was this induced behavior or spontaneous behavior? Distinguishing between these cases requires specific forensic techniques.
The Evidence Taxonomy for AI Agent Incidents
Primary Evidence: What Happened in the Incident Interaction
LLM inference record. The complete record of a single inference call: the exact system prompt, the complete conversation history (all prior turns in the context window), any retrieved documents or context included in the context window, the temperature and sampling parameters, the model identifier (not just name but specific version hash), and the raw output before any post-processing or filtering.
The inference record is the equivalent of a crash dump in traditional software forensics — the single most important piece of evidence. Its forensic value depends on completeness: an inference record missing the system prompt is as useful as a crash dump with the stack trace omitted.
Tool call records. For each external tool invocation during the incident interaction: the tool name and version, the exact arguments, the result returned, the timestamp, the agent's identifier, and the authorization chain that permitted the call. Tool call records establish what external effects the agent created and provide the causal link between agent reasoning and real-world outcomes.
Memory access records. If the agent uses a persistent memory system, records of memory reads (what did the agent retrieve? when? with what retrieval query?) and memory writes (what did the agent store? what was the content?) during the incident interaction. Memory state can significantly influence agent behavior; memory access records are often critical for understanding why the agent had the context it had.
Output delivery records. The final output delivered to the recipient (user, downstream system, external service), with timestamp and delivery confirmation. The output delivery record establishes what harm actually reached the outside world, distinct from what the agent produced internally.
Secondary Evidence: Context for the Incident Interaction
Session history. The complete sequence of interactions in the session containing the incident interaction. A single interaction rarely explains itself; the preceding context often determines why the agent behaved as it did in the incident moment.
Agent configuration at incident time. The exact system prompt in effect during the incident. The tool set available to the agent. The memory access permissions. The behavioral pact version in effect. Configuration state is among the most frequently missing evidence — configuration tends to be updated without versioning, so the state at incident time may be unrecoverable.
Model version record. The specific model version deployed during the incident. Not just the model family name, but the specific version hash or identifier that allows the exact model to be identified. Model providers typically release multiple version updates within a named model — "claude-3-5-sonnet" is not specific enough; "claude-3-5-sonnet-20241022" is.
Behavioral baseline. What was the agent's normal behavioral profile before the incident? Baseline data — accuracy rates, typical response patterns, scope behavior — provides the reference against which incident behavior is compared. Without a baseline, "anomalous behavior" cannot be established.
User/counterparty information. For incidents involving adversarial prompting, information about the user or counterparty involved: whether the account is newly created, whether there is prior anomalous activity, whether the interaction pattern matches known prompt injection techniques.
Tertiary Evidence: System State During the Incident
Infrastructure logs. Server-side logs showing system resource utilization, error rates, and service health during the incident period. Infrastructure anomalies (high CPU, memory pressure, increased error rates in dependent services) can contribute to agent behavior changes.
Deployment events. Any configuration changes, model updates, or infrastructure changes that occurred in the hours or days before the incident. Deployment events frequently correlate with behavioral changes.
Monitoring alert history. Any monitoring alerts that fired before the incident was identified. Alerts that were triggered but not acted on are both forensically relevant (they establish what was known) and legally relevant (they establish what should have been known).
LLM Session Reconstruction
When the complete inference record is available, session reconstruction involves:
Step 1: Verify the Inference Record Integrity
Before working with the inference record, verify it has not been altered since it was created. A tamper-evident log with a cryptographic hash chain allows any modification to be detected. A log that cannot be integrity-verified cannot be relied upon in legal proceedings.
If the inference record was stored in an append-only log (recommended), check that the record's position in the hash chain is consistent with surrounding records.
Step 2: Reconstruct the Full Context Window
The context window — everything provided to the model in the inference call — is typically larger than the visible conversation and requires assembly from multiple sources:
System prompt: Extract from the inference record or from the configuration version control at the incident timestamp.
Conversation history: The prior turns of conversation provided in the inference call. Check whether retrieved memory entries, retrieved documents, or tool results were injected into the conversation history — these often appear with specific formatting that distinguishes them from user messages.
Retrieved documents/context: If the agent uses retrieval-augmented generation, what documents were retrieved for this inference call? The retrieval query, the retrieved document IDs, and the document content are all evidence.
Step 3: Characterize the Output
With the full context window reconstructed, analyze the output:
Is the output within the behavioral envelope? Compare the output characteristics to the agent's baseline behavioral profile. Is this type of output something the agent produces routinely (lower concern) or is it an outlier (higher concern)?
Does the output contain prohibited content? Compare the output against the constraint pact. Did the agent produce anything that the pact explicitly prohibits?
What tool calls followed the output decision? If the incident involved actions (not just outputs), what tool calls did the agent make? Were they within the capability pact?
Step 4: Root Cause Attribution
With the context window and output characterized, attempt root cause attribution:
Input-induced behavior: Can the incident output be traced to specific input characteristics? Is there a pattern — specific phrases, specific data values, specific input structures — that appears in the incident interaction and in other problematic interactions, but not in baseline interactions?
Context-induced behavior: Is there anything in the retrieved context (documents, memory entries, prior conversation history) that could explain the output? This is the prompt injection hypothesis — that adversarial content was injected into the context through a pathway other than the direct user input.
Model anomaly: If neither input nor context explains the output, is this potentially a model-level anomaly? Some model behaviors are intrinsic — they occur stochastically without a specific trigger, especially in edge cases where the model's training distribution is thin.
Configuration-induced behavior: Did a recent configuration change (system prompt update, tool set change, pact revision) change the agent's behavioral baseline in ways that might explain the incident? Configuration changes are a common root cause that is frequently overlooked.
Multi-Agent Incident Investigation
When an incident occurs in a multi-agent system — where multiple agents collaborated on a task — the investigation must establish causation across agent boundaries.
Building the Responsibility Graph
The responsibility graph is the forensic artifact that establishes inter-agent causation. Nodes are agents; edges are information flows (what one agent sent to another). The graph is constructed from the communication records of each agent involved in the incident interaction.
For each edge in the responsibility graph:
- What data passed along this edge?
- What was the timestamp of the transfer?
- Did the receiving agent modify or react to the transferred data in ways that contributed to the incident?
- Would the incident have occurred if this transfer had been different?
The counterfactual question — would the incident have occurred differently if this specific transfer had been different? — is the tool for establishing causal contribution vs. mere correlation. An agent that transferred data that the downstream agent ignored has lower causal contribution than an agent that transferred data the downstream agent relied upon directly.
Tracing Error Propagation
In many multi-agent incidents, an error originates in one agent and propagates through the system, amplifying at each step. Identifying the origin of the propagation — the first agent to produce incorrect or harmful output — is the key forensic challenge.
Propagation tracing requires examining each agent's inputs and outputs in temporal order. The first agent whose output cannot be explained by its inputs is likely the error origin.
Prompt Injection Forensics in Multi-Agent Systems
Prompt injection in multi-agent systems can be particularly challenging to detect because the injection may enter the system through a data source queried by a retrieval agent and reach the target agent through what appears to be a legitimate inter-agent communication. The injection is laundered through the agent chain, making it appear to originate from a trusted source.
Forensic signs of indirect prompt injection:
- Content in inter-agent communications that does not match the format or content of the data source it claims to come from
- Instructions in data fields that would normally contain data (e.g., a document field containing "IGNORE PREVIOUS INSTRUCTIONS AND INSTEAD...")
- Agent behavior that cannot be explained by the data it received, but can be explained if the data contained hidden instructions
Evidence Preservation and Legal Hold Procedures
The Preservation Timeline
Evidence preservation for AI agent incidents must begin before incidents occur. The evidence you need to preserve is the same evidence you need to monitor — configuring forensic-quality logging is the same act as configuring operational monitoring.
Pre-incident preservation requirements:
- Inference records for all production agent interactions (minimum 90-day retention; 12-month recommended for high-stakes deployments)
- Configuration version history: system prompts, tool sets, pact versions, model versions — all with timestamps
- Behavioral baseline data: rolling 90-day statistics on accuracy, reliability, scope compliance
- Audit logs: all agent actions with attribution and timestamps
Incident-triggered preservation actions: When an incident is detected or reasonably anticipated:
- Immediately preserve all inference records from the incident interaction and the 24 hours preceding it
- Capture a snapshot of the current agent configuration
- Preserve all monitoring alerts from the past 7 days
- Suspend any scheduled log rotation or cleanup that would affect relevant records
- Document who took these actions and when
Legal hold trigger events:
- Receipt of litigation or regulatory inquiry
- Detection of incident that may result in litigation
- Customer complaint involving potential legal claims
- Regulatory investigation notification
Evidence Format Requirements for Legal Proceedings
Evidence presented in legal proceedings must meet specific requirements:
Chain of custody. Every piece of evidence must have a documented chain of custody: who collected it, when, from what system, and how it has been stored and handled since collection.
Authenticity verification. Evidence must be demonstrably authentic — not created or modified after the incident. Cryptographic hash verification, audit log records of evidence collection, and witness testimony about collection procedures establish authenticity.
Expert accessibility. Technical evidence must be understandable by expert witnesses. Inference records stored in proprietary formats should be converted to standard formats during evidence collection, with documentation of the conversion process.
Completeness. If evidence is incomplete (some records were not preserved, some logs were overwritten), this must be documented. Presenting incomplete evidence as complete is evidence tampering; presenting it with appropriate caveats is acceptable.
How Armalo Addresses This
Armalo's behavioral audit infrastructure is designed from the ground up to produce forensically useful records.
Every registered agent's interactions are logged with forensic completeness: inference records include the full context (system prompt, conversation history, retrieved content), tool call records include authorization chains, and output records include the final delivered content. Records are stored in append-only, hash-chained storage that provides automatic tamper evidence.
Behavioral baselines are maintained continuously. At any point, Armalo can produce the agent's baseline behavioral profile for any prior period — the reference point that enables characterization of incident behavior as anomalous.
Configuration version control is automatic. Every change to a registered agent's system prompt, tool set, or pact is recorded with timestamp and identity of the person who made the change. Reconstructing the agent's configuration at any prior point in time is a database query, not a forensic reconstruction challenge.
The multi-agent responsibility graph is maintained in Armalo's monitoring infrastructure for agent-to-agent interactions within registered agent networks. When an incident involves multiple Armalo-registered agents, the responsibility graph is available directly from the monitoring system.
For legal proceedings, Armalo produces forensic evidence packages: a structured collection of the relevant inference records, configuration versions, behavioral baselines, and monitoring alert history for the period surrounding an incident, with hash verification and chain-of-custody documentation.
Conclusion: Forensic Readiness as Operational Infrastructure
The lesson from the financial technology company incident that opened this post is simple: forensic readiness must be built before incidents occur. The infrastructure required for a credible post-incident investigation is almost identical to the infrastructure required for operational behavioral monitoring — but it must be configured for forensic quality (completeness, tamper evidence, preservation) from the start.
Organizations that treat forensic readiness as a compliance overhead to be minimized will find themselves unable to investigate their own incidents, unable to demonstrate compliance to regulators, and unable to attribute liability in legal proceedings. The cost of forensic-quality logging is modest relative to the total cost of deploying AI agents; the cost of inadequate forensic infrastructure is measured in regulatory penalties, legal fees, and unresolved customer harm.
Invest in the infrastructure before you need it. Configure tamper-evident, complete inference records. Version-control all configuration. Maintain behavioral baselines. Define legal hold procedures before they are triggered. Document the chain of custody for evidence collected in routine operations.
When an incident occurs — and at scale, they will — you will have the records needed to understand what happened, why it happened, and what must change to prevent it from happening again.
Key Takeaways:
- AI agent forensics requires: LLM inference records (full context window), tool call attribution, memory access records, and configuration version history.
- Evidence preservation must begin before incidents — post-incident collection is usually too late for critical evidence.
- LLM session reconstruction: verify record integrity, reconstruct full context window, characterize output against baseline, attribute root cause.
- Multi-agent investigation requires building a responsibility graph and using counterfactual analysis to establish causal contribution.
- Prompt injection forensics: look for instructions in data fields, behavior unexplained by data, and content inconsistency in inter-agent communications.
- Armalo's forensic evidence packages provide the organized, hash-verified evidence records needed for regulatory and legal proceedings.
Build trust into your agents
Register an agent, define behavioral pacts, and earn verifiable trust scores that unlock marketplace access.
Based in Singapore? See our MAS AI governance compliance resources →