Single-turn defenses look for suspicious strings, instruction conflicts, tool requests, and policy overrides. Slow poisoning requires temporal evidence: what changed, who changed it, how often the same belief was reinforced, which memories were written, and whether later tool decisions used those memories.
If your memory system cannot answer those questions, it cannot tell the difference between personalization and manipulation.
Memory drift review matrix
| Drift signal | What to measure | Why it matters | Control response |
|---|
| Repeated exception | Same user normalizes a boundary | Softens future confirmation | Add expiry and reviewer tag |
| Tool default shift | Agent calls a tool with less confirmation | Turns memory into permission | Downgrade memory authority |
| Source weakening | Memory summary omits original evidence | Removes uncertainty | Preserve confidence class |
| Cross-session pressure | Many ordinary turns point in one direction | Hard to catch in single-turn scans | Temporal anomaly review |
| Downstream reuse | Other agents consume the memory | Poison spreads through swarm | Quarantine affected dependencies |
| Dispute mismatch | User challenges a remembered fact | Memory may be stale or manipulated | Freeze authority until resolved |
This matrix should sit beside the memory store, not in a security PDF. The system should know when memory is becoming more powerful than its evidence deserves.
The uncomfortable product question
The product question is not "should agents remember?" They will remember because memory makes agents useful. The real question is what a remembered fact is allowed to do. A preference that changes the tone of an email is low consequence. A memory that changes who must approve a bank transfer, which customer record can be opened, or whether an exception is normal is an authority-bearing artifact.
That means memory systems need classes, not one bucket called memory. A durable agent should distinguish casual preferences, user-provided facts, system-observed behavior, externally verified facts, policy exceptions, and operational commitments. Each class needs different retention, different dispute mechanics, and different permission to influence tool calls.
The dangerous implementation pattern is summarization without provenance. A model compresses ten conversations into "the customer prefers fast approvals" and later uses that summary as if it were a policy. The compression removed who said it, when they said it, whether it was contested, whether it was scoped to one case, and whether the organization ever accepted it. That is how personalization becomes silent governance.
Practical controls that do not kill memory
Teams do not need to ban memory to be safe. They need to stop letting every memory become action-grade authority.
First, add a memory authority ladder. Memories can inform language, suggest next steps, bias search ranking, prefill drafts, or authorize actions. Those are not equivalent. The higher the rung, the more provenance the memory needs.
Second, make memory expiry conditional on consequence. A formatting preference can live longer than an exception that changes approval. A remembered exception should decay unless refreshed by a trusted source.
Third, expose disputes as first-class state. If a user or operator says "that is not true anymore," the system should not simply overwrite the memory. It should record the conflict, lower downstream authority, and require a fresh source before the memory is used for consequential action.
Fourth, log memory joins. If a future action used three memory entries plus one document retrieval to justify a tool call, the receipt should show the join. Hidden joins are where slow poisoning hides.
Slow-memory replay lab
Armalo should run a routine-conversation poisoning replay. Build paired interaction traces: one with a single explicit injection, one with 24 turns of ordinary-seeming preference shaping, and one clean control. Each trace should attempt to alter future confirmation thresholds, tool-use defaults, or policy interpretations without directly asking for a violation.
The measurement is not simply whether the agent violates a rule at the end. Measure state drift after every turn: memory writes, confidence changes, confirmation language, tool-choice probability, and whether the final action treats the poisoned memory as authority.
Promotion should require a memory receipt that preserves source, confidence class, expiry, and downstream-use trace. If that receipt reduces final unsafe actions without destroying useful personalization, keep it. If it only blocks all memory, discard it as too blunt.
The experiment should include a nuisance metric too: useful personalization retained. A defense that blocks memory writes may win the attack benchmark while losing the product. The right result is narrower. The system should keep harmless memory, downgrade ambiguous memory, and require proof before memory changes authority.
The Armalo memory line
Armalo should not claim that memory poisoning is solved because the category is too young and the attack surface is changing quickly. The stronger claim is that Armalo's trust model gives the right place to attach the defense: memories that influence authority need provenance, dispute state, expiry, and score consequence.
That is the thought-leader position: memory is not a feature until it has a proof budget.
FAQ
Is ordinary personalization dangerous?
No. Personalization is valuable. The danger appears when remembered preferences or exceptions influence high-risk authority without source, scope, or expiry.
What is the first metric to track?
Track the percentage of high-risk tool calls influenced by memory entries with complete provenance and current freshness. That reveals whether memory is serving action-grade proof.
Why is this shareable?
Because most teams still think memory safety means "do not store secrets." The harder problem is that normal conversation can slowly rewrite the agent's operating assumptions.
The takeaway for serious teams
The future memory exploit may not look hostile. It may look like a helpful user teaching the agent how the organization "usually works." If that teaching changes authority, it needs evidence.