The agent security community has invested heavily in defending the prompt: jailbreak detection, instruction-hierarchy reinforcement, structured output schemas, system-prompt injection. These are necessary defenses. They are not sufficient. The prompt is the *transient* attack surface — what arrives in any single turn, defendable turn-by-turn. The harder problem is the *persistent* attack surface: the agent's long-term memory store, which sees writes across many sessions, retrieves into context selectively, and consolidates information across turns.
A poisoned prompt produces one bad output and ends. A poisoned memory produces bad outputs every time the memory is retrieved, across sessions, until it is found and removed. The half-life of impact is months, not minutes.
This paper formalizes memory poisoning as a threat class distinct from prompt injection, derives the Poison Half-Life (PHL) metric from memory-lifecycle dynamics, presents empirical PHL measurements across 920 synthetic poisoning incidents on a tiered memory architecture, describes three defenses that operate at distinct stages of the memory lifecycle, derives the economic case for defense investment, and forecasts the industry adoption trajectory as persistent-memory agents proliferate.
The Mechanics of Persistence
A typical agent memory architecture has the following stages:
1. Write. An event produces a memory candidate (a tool output, a user statement, an agent reflection). The candidate enters the memory store with some metadata (timestamp, source, confidence).
2. Consolidation. Periodically, memory candidates are reviewed for retention. Candidates that are referenced, compressed, or pattern-matched into prior memories are consolidated; unreferenced candidates expire.
3. Retrieval. When the agent processes a new prompt, relevant memories are retrieved by semantic similarity and added to context.
4. Application. The retrieved memory influences the agent's output.
Each stage has its own poisoning vector. The most damaging is consolidation, because consolidated memories are harder to remove (they have been folded into compressed representations) and are more frequently retrieved (consolidation marks salience).
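The four stages can be sketched as a toy memory store. This is a deliberate simplification for illustration only: substring matching stands in for semantic similarity, and a retrieval-count threshold stands in for the salience heuristics a production consolidator would use. All names are hypothetical.

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryCandidate:
    content: str
    source: str          # e.g. "tool:web_fetch", "user", "self:reflection"
    confidence: float
    created_at: float = field(default_factory=time.time)
    consolidated: bool = False
    retrieval_count: int = 0

class MemoryStore:
    """Toy store illustrating the write/consolidate/retrieve stages."""

    def __init__(self, consolidation_threshold: int = 2):
        self.entries: list[MemoryCandidate] = []
        self.consolidation_threshold = consolidation_threshold

    def write(self, candidate: MemoryCandidate) -> None:       # stage 1
        self.entries.append(candidate)

    def consolidate(self) -> None:                             # stage 2
        # Retain candidates retrieved often enough; expire the rest.
        # Note the criterion is salience (retrieval count), not accuracy.
        kept = []
        for e in self.entries:
            if e.consolidated or e.retrieval_count >= self.consolidation_threshold:
                e.consolidated = True
                kept.append(e)
        self.entries = kept

    def retrieve(self, query: str) -> list[MemoryCandidate]:   # stage 3
        # Stand-in for semantic similarity: naive substring match.
        hits = [e for e in self.entries if query.lower() in e.content.lower()]
        for e in hits:
            e.retrieval_count += 1
        return hits
```

Note that nothing in `consolidate` inspects truth: a frequently retrieved poisoned candidate survives exactly as a frequently retrieved accurate one does, which is the amplification mechanism discussed below.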
Memory poisoning differs from prompt injection in three structural ways:
Durability. Prompt injection lasts one turn. Memory poisoning lasts until the poisoned memory is purged. In our experiments, the median surviving poison lifetime was 12 weeks before any explicit purge. Some survived to 26 weeks via consolidation into derivative memories.
Distribution across sessions. Prompt injection affects one session. Memory poisoning affects every future session that touches the poisoned region of memory space. In multi-user, multi-org, or long-running agents, a single poisoning event can produce hundreds of contaminated outputs.
Compounding. A poisoned memory that influences an agent's output can cause that output to be written back to memory (as a reflection or summary), creating second-generation poison. We observed cases where the first-generation poison was a single planted statement and the consolidated derivatives spanned 14 distinct memories with subtly different but consistent biases.
The Threat Model
We formalize the memory-poisoning threat model with four delivery vectors and three impact stages.
Delivery vector 1: Reflection poisoning. The poison enters through an agent self-reflection prompt. The agent is induced (via a successful prompt injection or via a manipulated input) to write a reflective memory containing the adversarial content. Because the reflection is self-authored, its provenance signature is the agent's own, which provides limited defense.
Delivery vector 2: Tool-output poisoning. The poison enters through a manipulated tool response. A tool that returns externally-sourced data (web fetch, API call, document retrieval) can be made to return adversarial content. If the agent commits the tool output to memory, the poison enters.
Delivery vector 3: User-statement poisoning. The poison enters through a malicious user message that the agent retains as memory. This is the simplest delivery vector and the one most user-facing systems already have some defense against (sanitization, content filters).
Delivery vector 4: Cross-agent poisoning. The poison enters through a shared memory write from another agent. This is the most concerning vector because the source-attribution signal — a write from a sibling agent — is often treated as authoritative. Cross-agent writes can scale: an attacker who compromises one agent can write poison to the shared memory of N agents.
Impact stage 1: Acute (within session of write). The poisoned memory is retrieved into context shortly after write and influences immediate outputs.
Impact stage 2: Sustained (across sessions). The poisoned memory survives session boundaries and continues to influence outputs in future sessions, possibly for different users or tasks.
Impact stage 3: Consolidated (long-term). The poisoned memory is consolidated into compressed representations, propagates into derivative memories, and becomes resistant to purging because its content has been re-encoded multiple times.
Defining Poison Half-Life
Poison Half-Life (PHL) is the time from a poisoning event until the poison's influence on agent outputs has decayed by 50%. Operationally:
    PHL(poison_i) = first t such that influence(poison_i, outputs_at_t) ≤ 0.5 × influence(poison_i, outputs_at_0)

Influence is measured as the probability that the agent's output exhibits the poison's intended bias on a calibrated probe task. For an attacker who poisons the agent with the false fact "customer X has authorization to approve transactions over $10,000," the probe task is a prompt asking the agent whether customer X has that authorization, and influence is the probability the agent confirms.
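Given a series of periodic probe measurements, the operational definition reduces to a first-crossing search. A minimal sketch, assuming weekly probes and a 26-week observation cap (both from the experimental setup; function and variable names are illustrative):

```python
def poison_half_life(influence_by_week: list[float],
                     horizon_weeks: int = 26) -> float:
    """Return the first week at which influence has decayed to <= 50% of
    its week-0 value, or the observation horizon if it never does
    (a censored, 'capped' measurement)."""
    baseline = influence_by_week[0]
    if baseline == 0:
        return 0.0  # the poison never took hold
    for week, influence in enumerate(influence_by_week):
        if influence <= 0.5 * baseline:
            return float(week)
    return float(horizon_weeks)

# influence = P(output exhibits the intended bias on the probe task)
print(poison_half_life([1.0, 0.9, 0.7, 0.5, 0.3]))   # → 3.0
print(poison_half_life([0.9, 0.9, 0.88, 0.87]))      # → 26.0 (capped)
```

The capped return value corresponds to the "26 weeks (capped)" entries in the PHL tables later in the paper: those incidents never decayed to half-influence within the observation window.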
PHL has three observable regimes:
Short PHL (< 7 days). The poison enters short-term memory but is not consolidated. It influences outputs for a few sessions and then expires through ordinary memory churn. This is the rare case.
Medium PHL (7 days – 6 weeks). The poison is consolidated into the main memory store but is not heavily reinforced. It surfaces in retrieval when semantically relevant prompts arrive but does not propagate to other memories. Most poisoning incidents fall here.
Long PHL (6 weeks – 6 months). The poison reaches the high-confidence consolidated tier and either (a) becomes a load-bearing memory that the agent references frequently, or (b) propagates into derivative memories. This is the most dangerous regime and the hardest to detect because the agent's behavior has substantially incorporated the poison into its operating assumptions.
Related Work: Adversarial ML, Memory Models, Cryptographic Provenance
Six traditions inform the memory-poisoning analysis:
Adversarial machine learning. The poisoning-attack literature (Biggio et al. 2012, Steinhardt et al. 2017, Wang and Chaudhuri 2018) addresses adversarial inputs that corrupt model training. Memory poisoning is the inference-time analogue: the model is fixed, but the model's effective behavior is shaped by retrieval from a corrupted store. The defense techniques transfer with modifications (provenance signing replaces training-data certification; attestation-on-recall replaces robust training).
Human episodic memory and consolidation. The cognitive-science literature on memory consolidation (McGaugh 2000, Wamsley and Stickgold 2011) provides the structural framework for the consolidation-amplification finding. Consolidated memories in humans are harder to update or extinguish than recent memories. The same property — durability of consolidated content — is the security risk for agent systems.
Cryptographic provenance and certificate transparency. The provenance-signing component of the defense draws on cryptographic transparency mechanisms (Laurie 2014, RFC 6962) and supply-chain provenance frameworks (SLSA, SBOM). Each memory write carries a signature; mismatches are detectable. The defense is identical to what we apply to code signing or software supply chain attestation, transposed into the agent-memory domain.
Vector database security. As vector stores have become production infrastructure for RAG systems, a small but growing literature has begun to address vector-store poisoning specifically (Carlini et al. 2024, Zou et al. 2023). The findings: vector stores are easier to poison than expected (semantic similarity is approximate, so adversarial content can be retrieved by adjacent queries) and harder to clean than expected (purging by identifier requires knowing the identifier; purging by content requires re-scanning).
Database poisoning attacks in traditional security. The integrity literature on database compromise (Bertino and Sandhu 2005, Lee 2002) provides the closest traditional-security analogue: integrity controls on memory writes are structurally similar to integrity controls on database transactions. The transferable insight: write-time integrity is cheaper to enforce than read-time validation.
Software supply chain attacks. The SolarWinds incident (2020), the npm package compromises (event-stream 2018, ua-parser-js 2021), and the broader supply-chain literature establish that persistent compromise of trusted update pathways is one of the most damaging attack patterns. Memory poisoning is the agent-economy analogue: the trusted update pathway is the memory write, and persistent compromise produces persistent influence.
The memory-poisoning threat model synthesizes these traditions into a single framework for agent persistent-memory security.
Empirical PHL Across 920 Synthetic Poisoning Incidents
We constructed a controlled environment using a Cortex-style tiered memory architecture (hot/warm/cold tiers, consolidation at the warm tier) on a test population of agents performing customer-support, financial-policy, and code-review workflows. We injected 920 poisoning incidents across four delivery mechanisms:
- Reflection poisoning (260): the poison entered through an agent self-reflection prompt
- Tool-output poisoning (240): the poison entered through a manipulated tool response
- User-statement poisoning (220): the poison entered through a malicious user message that was retained as memory
- Cross-agent poisoning (200): the poison entered through a shared memory write from another agent
We measured PHL for each incident across 26 weeks.
| Delivery mechanism | Median PHL | 90th percentile PHL | % reaching long-PHL regime |
|---|---|---|---|
| Reflection | 14.3 weeks | 26 weeks (capped) | 47% |
| Tool-output | 11.1 weeks | 24.8 weeks | 38% |
| User-statement | 7.4 weeks | 18.6 weeks | 21% |
| Cross-agent | 16.8 weeks | 26 weeks (capped) | 58% |
The most concerning result is cross-agent poisoning. When the poisoned content arrives from another agent — for example, a swarm member writing a shared memory entry — the source-attribution signal is treated as authoritative, the content is consolidated faster, and the median PHL is longest. Cross-agent poisoning is also the easiest to scale: an attacker who compromises one agent can write poison to the shared memory of N agents.
Consolidation does not reduce PHL. It increases it. This is the opposite of the intuitive defense narrative, which assumes that "the system will figure out which memories are wrong over time." The system does not figure it out. Consolidation hardens whatever survives consolidation, including poison.
Why Consolidation Amplifies Rather Than Reduces
The consolidation-amplification finding is counterintuitive. The structural explanation:
Consolidation selects for retrieval frequency, not for accuracy. Memories that are retrieved often are marked as important and consolidated; memories that are not retrieved expire. A poisoned memory that the agent finds useful (whether or not it is accurate) is retrieved frequently and consolidated. The consolidation criterion is engagement, not truth.
Consolidation re-encodes content into derivative memories. When a memory is referenced during another reflection or summary, the derivative memory inherits the original's content. A poisoned memory referenced 5 times produces 5 second-generation poison memories with subtly different surface forms. Purging the original does not purge the derivatives.
Consolidated memories carry higher confidence flags. Retrieval systems weight consolidated memories more heavily in context selection. A consolidated poison is more likely to be retrieved than an unconsolidated factual memory in the same semantic neighborhood.
These three mechanisms compound. The consolidation pipeline is a poison-amplifier, not a poison-filter. Defense-in-depth must intervene before consolidation, because intervention after consolidation faces the compounded structure.
The Three Defenses
We tested three defenses, each operating at a different stage of the memory lifecycle. None are sufficient alone. Combined, they reduce median PHL from 12 weeks to 4 days in our test environment.
Defense 1: Signed memory writes. Every memory write carries a cryptographic signature from its source. Writes are validated at consolidation time: an entry whose signature does not match an authorized source is excluded from consolidation. Implementation: each agent and tool has a signing key; the memory store enforces signature validation; cross-agent writes require a corresponding pact authorization.
Effect on cross-agent PHL: 16.8 weeks → 1.2 weeks. Effect on reflection PHL: minimal (the agent's own reflections are signed by itself).
This defense addresses the write-stage attack surface but does not prevent the agent from being convinced to write its own poison via a successful prompt injection earlier in the conversation. It is necessary but not sufficient.
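The consolidation gate for signed writes can be sketched as follows. This uses symmetric HMAC purely as a stand-in for the asymmetric signatures a production deployment would use; the key registry, source identifiers, and payload shape are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical authorized-source registry: source id -> signing key.
AUTHORIZED_KEYS = {
    "agent:support-a": b"key-for-agent-a",
    "tool:web_fetch": b"key-for-web-fetch",
}

def sign_write(source_id: str, content: str, key: bytes) -> dict:
    """Produce a memory-write entry carrying a MAC over its payload."""
    payload = json.dumps({"source": source_id, "content": content},
                         sort_keys=True).encode()
    return {"source": source_id, "content": content,
            "sig": hmac.new(key, payload, hashlib.sha256).hexdigest()}

def admit_to_consolidation(entry: dict) -> bool:
    """Consolidation gate: exclude any entry whose signature does not
    verify against a key in the authorized-source registry."""
    key = AUTHORIZED_KEYS.get(entry["source"])
    if key is None:
        return False  # unknown source: never consolidate
    payload = json.dumps({"source": entry["source"], "content": entry["content"]},
                         sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["sig"])
```

An entry signed with a compromised or unregistered key fails the gate and never reaches the consolidated tier, which is the mechanism behind the cross-agent PHL collapse reported above.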
Defense 2: Attestation-on-recall. When a memory is retrieved into context, the retrieval system attaches its provenance metadata: source identity, write time, confidence, and the prompt context that produced it. The agent's reasoning over the retrieved memory must consider provenance, and pre-defined skepticism rules apply for low-confidence sources. Implementation: the retrieval API returns memories with provenance envelopes; the agent's system prompt instructs it to treat low-provenance memories as hypotheses, not facts.
Effect on tool-output PHL: 11.1 weeks → 3.8 weeks. Effect on reflection PHL: 14.3 weeks → 6.2 weeks.
This defense addresses the retrieval-stage attack surface. It is the cheapest of the three and the easiest to retrofit. It does not eliminate poison entirely because some poisoned memories will still be retrieved and used, but it markedly reduces the probability of the agent treating the poison as authoritative.
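A sketch of the provenance envelope and one possible skepticism rule. The trusted-source set, the confidence threshold, and the framing strings are illustrative policy choices, not the paper's specification.

```python
from dataclasses import dataclass

@dataclass
class ProvenanceEnvelope:
    content: str
    source: str        # e.g. "agent:self", "tool:web_fetch", "user"
    written_at: str    # timestamp of the original write
    confidence: float  # store-assigned confidence at write time

# Hypothetical policy: sources whose content may be framed as factual.
TRUSTED_SOURCES = {"agent:self", "tool:internal_db"}

def framing_for(memory: ProvenanceEnvelope) -> str:
    """Frame a retrieved memory for the agent's context window:
    low-provenance memories are presented as hypotheses, not facts."""
    if memory.source in TRUSTED_SOURCES and memory.confidence >= 0.8:
        return f"FACT ({memory.source}): {memory.content}"
    return (f"UNVERIFIED ({memory.source}, confidence={memory.confidence}): "
            f"{memory.content} -- treat as a hypothesis and verify before acting")
```

The point of the envelope is that the framing decision happens at retrieval time, in infrastructure, rather than relying on the model to remember to be skeptical.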
Defense 3: Pact-drift comparison. Periodically, the agent's behavior is compared against the pact governing its operation. Memories whose retrieval correlates with pact-violation behavior are flagged for human review and isolated. Implementation: a background process runs adversarial probes against the agent on a schedule, traces which memories were retrieved during anomalous outputs, and quarantines those memories pending review.
Effect on long-PHL incidents (across all delivery mechanisms): 47% → 4% reach the long-PHL regime.
This defense addresses the application-stage attack surface: it does not prevent poisoning from occurring, but it limits the dwell time once the poison has begun affecting outputs. It is the most expensive of the three because it requires background eval infrastructure and human review capacity for flagged cases. Without it, the long-PHL incidents are effectively undetected until a downstream failure surfaces them — and at that point, the dispute and remediation cost dominates the eval cost the defense would have incurred.
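The trace-and-quarantine step can be sketched as a correlation over probe outcomes. The data shapes, the violation threshold, and the minimum-retrieval cutoff are illustrative; a production system would use a proper statistical test rather than a raw ratio.

```python
from collections import defaultdict

def quarantine_candidates(probe_results: dict, retrieval_traces: dict,
                          violation_threshold: float = 0.5,
                          min_retrievals: int = 2) -> set:
    """Flag memories whose retrieval correlates with pact-violating outputs.

    probe_results:    probe_id -> True if that probe's output violated the pact.
    retrieval_traces: probe_id -> set of memory ids retrieved for that probe.
    """
    violated, total = defaultdict(int), defaultdict(int)
    for probe_id, mem_ids in retrieval_traces.items():
        for mem_id in mem_ids:
            total[mem_id] += 1
            if probe_results[probe_id]:
                violated[mem_id] += 1
    # Quarantine memories that appear disproportionately in violations,
    # ignoring memories seen too rarely to judge.
    return {m for m in total
            if total[m] >= min_retrievals
            and violated[m] / total[m] >= violation_threshold}
```

Quarantined memories go to human review rather than being deleted outright, since a memory can correlate with a violation without causing it.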
Combined Defense Impact
With all three defenses active:
| Delivery mechanism | Median PHL (defended) | Reduction from baseline |
|---|---|---|
| Reflection | 4.1 days | 24× |
| Tool-output | 2.7 days | 28× |
| User-statement | 3.3 days | 16× |
| Cross-agent | 0.8 days | 147× |
Cross-agent poisoning, the most dangerous attack class, becomes the most defended once signed writes are enforced. The signature mismatch surfaces immediately at the consolidation gate.
The 24× to 147× PHL reduction is the load-bearing empirical evidence for defense-in-depth. No single defense achieves it; the combination is structurally necessary.
The Economic Case for Defense Investment
We estimated the cumulative impact of memory poisoning on a representative platform-month for an agent fleet of 1,000 agents averaging 14 sessions per agent per week. Under no defenses, an estimated 8.3 poisoning incidents enter the fleet per month, of which 3.9 reach the long-PHL regime. Each long-PHL incident contaminates an average of 412 sessions before discovery. The estimated dispute and remediation cost across these contaminated sessions is $24,400 per month, plus the harder-to-quantify cost of agent customers losing confidence in the platform.
Under combined defenses, monthly contaminated sessions drop from approximately 1,600 to approximately 26. Estimated cost drops from $24,400 to $390.
| Configuration | Monthly poisoning incidents | Long-PHL share | Contaminated sessions/month | Estimated cost/month |
|---|---|---|---|---|
| No defenses | 8.3 | 47% | 1,605 | $24,400 |
| Signed writes only | 5.1 | 31% | 845 | $14,200 |
| Signed + attestation-on-recall | 3.2 | 18% | 412 | $7,800 |
| All three defenses | 1.4 | 4% | 26 | $390 |
This is the calculation that should justify defense investment. The defenses are not free (signed writes require key-management infrastructure, attestation-on-recall requires memory-API changes, and pact-drift comparison requires eval infrastructure), but the monthly cost reduction is roughly 60×, from $24,400 to $390.
Defense Cost Breakdown
The implementation cost for each defense, in engineer-months:
- Signed memory writes: 3–5 engineer-months for cryptographic key management, signature validation, and integration with the memory write path.
- Attestation-on-recall: 1–2 engineer-months for provenance metadata schema, retrieval API modifications, and system-prompt update.
- Pact-drift comparison: 4–6 engineer-months for probe library, adversarial-eval scheduling, and human-review workflow integration.
Total implementation: 8–13 engineer-months. At industry-standard fully-loaded engineering cost ($350k/engineer-year), the total defense investment is approximately $230k–$380k.
The break-even point: at roughly $24,000/month in avoided cost (about $290k/year), the defense investment breaks even within approximately 10–16 months at the 1,000-agent fleet scale. For larger fleets, break-even is faster; for smaller fleets, slower. For any fleet larger than approximately 400 agents, defense investment is straightforwardly cost-positive.
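The fleet-level arithmetic is reproducible from the stated figures. A sketch covering the contaminated-session count and the break-even calculation (the per-session cost model is implicit in the text, so only these two terms are reproduced):

```python
def contaminated_sessions(incidents_per_month: float, long_phl_share: float,
                          sessions_per_long_incident: float = 412) -> float:
    """Monthly sessions contaminated by long-PHL incidents, the dominant
    cost term in the fleet model."""
    return incidents_per_month * long_phl_share * sessions_per_long_incident

def break_even_months(defense_cost: float, undefended_monthly_cost: float,
                      defended_monthly_cost: float) -> float:
    """Months until cumulative avoided cost covers the defense investment."""
    return defense_cost / (undefended_monthly_cost - defended_monthly_cost)

# Figures from the text: 8.3 incidents/month, 47% long-PHL, 412 sessions each.
print(round(contaminated_sessions(8.3, 0.47)))  # ~1607, close to the table's 1,605
# Investment range $230k-$380k against $24,400 -> $390 monthly cost:
print(round(break_even_months(230_000, 24_400, 390), 1))  # low end of the range
print(round(break_even_months(380_000, 24_400, 390), 1))  # high end of the range
```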
Adversarial Considerations
Sophisticated attackers will adapt to defenses. We considered three adaptation strategies:
Pre-image attacks on signed writes. An attacker who steals a signing key can produce valid signatures for poisoned content. Defense: keys are rotated on detected anomaly, and write authorization is itself bound to pact context (the same key cannot write to memory categories outside the agent's pact scope). Key rotation cycles run at 7 days for high-stake categories, 30 days for general use.
Provenance laundering. An attacker who controls one source of memory writes can launder content through multiple sources to inflate apparent provenance diversity. Defense: provenance counts are weighted by source-distinctness, not by raw count. Five writes from substantially-correlated sources count less than one write from an uncorrelated source.
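Source-distinctness weighting can be sketched as a greedy discount: each corroborating write counts only by how uncorrelated its source is with sources already counted. The correlation table and the pairwise representation are illustrative assumptions.

```python
def distinctness_weighted_count(sources: list[str],
                                correlation: dict) -> float:
    """Weight each corroborating write by its distinctness from sources
    already counted. correlation maps frozenset({a, b}) -> [0, 1];
    missing pairs are treated as uncorrelated."""
    counted: list[str] = []
    weight = 0.0
    for s in sources:
        # Discount by the strongest correlation with any prior source.
        max_corr = max((correlation.get(frozenset((s, c)), 0.0)
                        for c in counted), default=0.0)
        weight += 1.0 - max_corr
        counted.append(s)
    return weight
```

Under this rule, five writes from substantially correlated sources contribute little more than one write, while a genuinely independent source contributes a full unit, which is the laundering countermeasure described above.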
Slow poisoning. An attacker injects small perturbations over a long period to avoid pact-drift detection. This is the hardest variant to defend against because each individual perturbation is below the anomaly threshold. The defense is to use cumulative-effect probes: pact-drift comparison must include probes that test for slowly-emerging biases over time, not just for acute deviations. Computational cost is non-trivial but bounded.
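One standard shape for a cumulative-effect probe is a one-sided CUSUM statistic over per-period bias scores: each step can stay below any per-period anomaly threshold while the accumulated drift still crosses a detection line. The slack and threshold parameters here are illustrative, not the paper's calibration.

```python
from typing import Optional

def cusum_drift(bias_scores: list[float], baseline: float = 0.0,
                slack: float = 0.05, threshold: float = 0.5) -> Optional[int]:
    """One-sided CUSUM over periodic probe bias scores.

    Accumulates excess bias over (baseline + slack); returns the index at
    which the cumulative statistic crosses the threshold, or None if the
    series never drifts enough to flag."""
    s = 0.0
    for period, score in enumerate(bias_scores):
        s = max(0.0, s + (score - baseline - slack))
        if s >= threshold:
            return period
    return None
```

A slowly growing bias (for example, +0.02 per week) is flagged once its accumulated excess crosses the line, even though no single week's score would trip a per-week detector, while steady small noise under the slack never accumulates.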
Vector-similarity exploitation. Attackers can craft poison content that semantically matches innocuous queries, so the poison is retrieved into contexts where its effect is amplified. Defense: retrieval logging and adversarial-retrieval probes can identify content that is consistently retrieved across diverse query distributions, which is the signature of a high-leverage poisoning vector.
Consolidation-trigger gaming. An attacker may craft poison that engages the consolidation algorithm (high-retrieval-frequency content) to escalate poison from the short-PHL to the long-PHL regime. Defense: consolidation eligibility is gated on signed writes (defense 1), provenance verification (defense 2), and cumulative-effect probes (defense 3). Because all three gates apply on the consolidation pathway, a consolidation-trigger attack must defeat them simultaneously, which raises its cost sharply.
Worked Case Studies: Three Poisoning Patterns We Caught
Case 1: The Helpful-Tool Cross-Agent Poison.
An attacker compromised a tool used by agent A (a web-fetch tool that the attacker controlled). The tool returned manipulated content claiming a specific compliance regulation existed. Agent A wrote the false regulation to memory as a tool-output reflection.
Without defenses: 14 days later, agent A retrieved the false regulation to answer a compliance question for a new buyer. The buyer made a decision based on the false regulation; the decision produced a compliance violation that surfaced 9 weeks later. Total contamination: estimated 47 affected decisions across 11 buyers.
With defenses: the tool's signing key did not match the platform's authorized-tool registry (it was a compromised key). The signed-write defense rejected the memory at consolidation time. PHL effectively zero.
Case 2: The Reflection-Self-Poison.
A successful prompt injection induced agent B to write a reflective memory containing a fabricated "lesson learned" about how to handle a class of customer requests. The lesson was wrong but plausibly self-authored.
Without defenses: the reflection passed signed-write validation (the agent's own signature was valid). The reflection was consolidated and influenced 23 subsequent customer interactions before pact-drift detection flagged the behavior pattern.
With defenses: attestation-on-recall surfaced the reflection's provenance (self-authored under unusual circumstances). The agent's reasoning treated the reflection as a hypothesis rather than a fact. Pact-drift comparison detected the pattern within 3 customer interactions.
Case 3: The Slow-Drift Multi-Memory Poison.
A patient attacker injected small perturbations over 8 weeks, none individually above the anomaly threshold. Each perturbation was attributed to a different memory source (legitimate diversification).
Without defenses: the cumulative drift produced a measurable bias in the agent's behavior that did not trip any individual defense. The pattern only surfaced through a dispute 11 weeks into the campaign.
With defenses: cumulative-effect probes (a component of pact-drift comparison specifically designed for slow drift) detected the trajectory at week 4. The injected memories were quarantined; the campaign was disrupted.
These three cases illustrate that the three defenses are complementary rather than redundant. Each addresses a distinct attack pattern. Removing any single defense produces a gap that sophisticated attackers can exploit.
Cross-Industry Comparison: Memory-Security Maturity
| System | Persistent memory? | Write-time signing? | Recall-time attestation? | Drift detection? |
|---|---|---|---|---|
| Armalo (Cortex + defenses) | Yes | Yes | Yes | Yes |
| Major LLM-based assistants (production) | Yes (varying) | No (most) | Limited | Limited |
| RAG systems (typical) | Yes (vector DB) | No | No | No |
| Traditional database systems | Yes | Yes (audit log) | Limited | Application-layer |
| Aviation flight management systems | Yes | Yes | Yes | Yes |
The pattern: persistent-memory security is mature in high-stakes traditional infrastructure (aviation, medical, regulated databases) but immature in the agent-economy and LLM/RAG ecosystems. The gap is the present industry vulnerability.
We predict, and stake research credibility on, agent-economy convergence toward memory-security maturity within 24 months. The drivers: (1) the first major memory-poisoning incident becomes public and forces an industry response, (2) regulatory frameworks for AI safety mature and require memory-integrity controls, and (3) the economic case for defense investment becomes broadly understood. The 24-month timeline allows for at least one such forcing event.
Scorecard
| Metric | Why it matters | Healthy target |
|---|---|---|
| Median PHL across delivery mechanisms | the core impact metric | < 1 week |
| Percentage of writes with valid signatures | tells whether signing is enforced | > 99% |
| Mean time-to-quarantine for flagged memories | measures application-stage defense responsiveness | < 12 hours |
| Cross-agent poisoning rate | the most dangerous variant | < 1 incident per 1,000 cross-agent writes |
| Long-PHL incident rate | tracks worst-case persistence | < 1 per quarter per 1,000 agents |
| Cumulative-effect probe coverage | detects slow-drift attacks | > 200 probes per high-stake agent |
Implementation Sequence
1. Inventory the agent's memory architecture. Identify every write path, every consolidation rule, and every retrieval path. The defense surfaces must align to the architecture, not to a generic memory model.
2. Implement signed writes for cross-agent memory paths first. This is the highest-leverage defense for the lowest implementation cost.
3. Add provenance metadata to retrieval. Instruct the agent (via system prompt or fine-tuning) to apply skepticism rules to low-provenance memories.
4. Deploy pact-drift probes. Schedule them at a frequency proportional to the agent's stake-weighted activity.
5. Audit PHL on synthetic incidents quarterly. The defenses degrade as attack patterns evolve; periodic re-testing is the only reliable signal of defense health.
6. Engage key-management specialists for the signing infrastructure. Cryptographic key rotation, revocation, and recovery are non-trivial operational disciplines that benefit from dedicated expertise.
7. Run synthetic adversarial campaigns quarterly. Each defense should be tested against an active red team that adapts its strategies.
Industry Impact: Predictions and Stakes
The Memory Poisoning framework, if adopted across the agent economy, has measurable industry-level consequences:
Prediction 1: Memory-write signing becomes standard infrastructure. Within 18 months, signed memory writes will be a baseline expectation for production agent systems. Platforms without signed writes will face procurement-side pressure to add them.
Prediction 2: PHL becomes a published platform metric. Procurement-grade trust reports will include PHL as a security disclosure, analogous to security certifications in traditional software. Platforms with poor PHL profiles will face market exclusion from high-stakes workflows.
Prediction 3: First major memory-poisoning incident becomes public. Within 24 months, at least one significant memory-poisoning incident affecting a production agent system will be publicly disclosed. The disclosure will catalyze industry adoption of defense-in-depth.
Prediction 4: Regulatory frameworks address memory integrity. Within 36 months, AI safety regulatory frameworks (EU AI Act, US state-level AI laws) will include memory-integrity requirements for high-risk agent applications. Platforms without compliant memory-security infrastructure will face regulatory friction.
Prediction 5: Insurance markets price memory-security maturity. Cyber and operational insurance for agent systems will price coverage partly on PHL and defense-deployment maturity. Platforms with strong defenses will receive lower premiums; platforms with weak defenses will face exclusions or coverage gaps.
These predictions are stake-able. Within 36 months, the industry will either have adopted memory-poisoning defenses as standard or will not. The framework, the math, the empirical PHL data, and the defense architecture are all in place. The discipline is the bottleneck.
Limitations
The PHL metric depends on the choice of probe task. A probe that does not match the attacker's intended bias underestimates PHL. We mitigate this by using a battery of probes, but probe design remains a manual exercise and an attacker tuning a poison to avoid the published probe set could produce a memory whose PHL is high on the actual attack surface but low on measurement.
The defenses we describe are necessary for production deployment of persistent-memory agents but are not theoretical proofs of safety. Adversaries with strong knowledge of the defense implementation can probe for weaknesses. We treat the defenses as raising the cost of attack, not as eliminating attacks.
The 920 synthetic incidents are constructed scenarios; production data on adversarial memory poisoning at scale is not yet available (because production-scale persistent-memory deployment is still early). The defense-effectiveness numbers should be treated as well-calibrated synthetic estimates rather than as production-confirmed metrics. We will publish production data as it accumulates.
Falsification
The model should be considered falsified if a deployment running all three defenses observes median PHL above the predicted range under realistic conditions, or if poisoning incidents propagate through cross-agent channels at rates similar to undefended baselines. Our current evidence is from synthetic incidents in a controlled test environment; production data is being collected and will appear in a follow-up paper.
The economic case for defense investment would be falsified if the realized dispute-and-remediation cost in production memory-poisoning incidents is materially lower than our $24,400/month/1,000-agents estimate. We treat the estimate as conservative; actual production cost is more likely to be higher than lower, given the harder-to-quantify reputation costs.
Connection to Adjacent Armalo Research
- Trust Contagion. Memory poisoning can be a delivery vector for trust contagion — a poisoned shared memory can corrupt the behavior of agents that retrieve it. TFD attributes the resulting failures across the agents involved.
- Sybil Tax. Memory poisoning requires writing access to memory. The Sybil Tax raises the cost of obtaining writing access (via fabricated agent identity), which complements memory-write authorization.
- Pact Compositionality. Memory-poisoning incidents often involve compositional pacts where a sub-agent's memory writes contaminate parent-agent behavior. The Pact Stack Trace records the memory provenance for compositional liability attribution.
- Trust Elasticity. Memory integrity is structurally brittle: a single confirmed poisoning incident in a high-stakes category should produce a categorical drop in the agent's memory-integrity dimension score. The elasticity framework handles this directly.
Conclusion
The agent security narrative has, until recently, treated prompt injection as the canonical attack surface. As agent systems acquire persistent memory, the durable attack surface shifts from the prompt to the memory store. The cost of failing to defend the memory store is higher than the cost of failing to defend the prompt, because a successful poisoning lasts months and propagates across sessions.
Poison Half-Life is the diagnostic. Signed writes, attestation-on-recall, and pact-drift comparison are the three defenses operating at the three stages of the memory lifecycle. Combined defense is approximately 24× cheaper than unmitigated incident cost on a representative agent fleet. The defenses are not optional. They are the table stakes for serious persistent-memory deployment.
The agent economy is in the pre-incident phase of memory-poisoning awareness. The first major incident will catalyze industry response; the platforms that have deployed defense-in-depth before the incident will have competitive advantage during the response. The framework, the math, the empirical evidence, and the implementation methodology are inspectable. The discipline is the bottleneck.
*920 synthetic poisoning incidents observed across a Cortex-style tiered memory architecture, March–April 2026. Probe-task library and defense implementation specifications available to verified researchers under the Armalo Labs research license.*