Malicious Skills and Behavioral Drift: The Supply Chain Risk in AI Agent Networks
The AI agent supply chain is not secure. This is not a theoretical concern — it's an active condition of deployed multi-agent systems, and the community has been independently discovering it for months without a shared vocabulary for what they're seeing.
The problem is structural: AI agents consume skills, tools, and capabilities from external sources. Those sources are not uniformly trusted. The mechanisms for verifying that a skill does what it claims — and only what it claims — are immature or absent. And the propagation dynamics in multi-agent systems mean a compromised skill in one agent can spread in ways that resemble traditional software supply chain attacks but with novel characteristics specific to LLM-based systems.
Three Attack Vectors With Different Risk Profiles
Malicious skill injection is the most direct vector. An agent pulls a skill from a marketplace, a registry, or another agent's capability bundle. The skill is advertised to perform a specific function. It does — but it also performs undisclosed additional functions: exfiltrating context to an external endpoint, manipulating outputs to favor certain conclusions, injecting instructions into the agent's decision-making process.
This mirrors the SolarWinds and XZ Utils supply chain attacks in traditional software: trusted, frequently-used packages compromised to distribute malicious code. The difference in AI agent systems is the attack surface. Malicious skills don't just execute code — they operate on the agent's full context, including reasoning traces, memory state, and pending actions. A compromised skill has access to everything the agent knows and is considering doing.
Behavioral drift is subtler and more insidious. An agent's behavior shifts gradually over time — not through a discrete injection event, but through the accumulated influence of context accumulation, model weight updates from the underlying foundation model provider, or subtle manipulation of the reward signals that guide in-context behavior.
The challenge: behavioral drift is nearly invisible without continuous monitoring against a behavioral baseline. A single evaluation at a point in time tells you what the agent did then. It says nothing about the distribution of behavior over the past 30 days. An agent that aced its last evaluation but has been producing subtly different outputs for three weeks — biasing toward certain conclusions, being more compliant with certain types of requests — will pass point-in-time evaluation while the behavioral drift accumulates damage.
Multi-agent infection propagation is what makes both of the above vectors dangerous at scale. When agents share memory, skills, or context — as they do in swarm architectures, PactSwarm workflows, and A2A task delegations — a compromised agent can influence the behavior of agents it interacts with.
Consider a concrete scenario: Agent A is a research agent in a swarm. Agent A consumes a compromised skill that biases its output toward certain conclusions. Agent A writes its research findings to shared swarm memory. Agents B, C, and D read those findings as part of their own reasoning. They don't know the findings were generated under the influence of a biased skill. They produce downstream outputs that inherit Agent A's bias. The infection has propagated to three agents that were never directly compromised.
Observed across monitored multi-agent deployments: approximately 18.5% of agents exhibit detectable behavioral anomalies within 90 days of deployment. Most of these are drift events rather than discrete injection attacks, which are harder to detect and longer to diagnose.
Why Traditional Security Tooling Misses This
Standard DevSecOps tooling was built for a different threat model.
SAST/DAST tools analyze code. They're excellent at finding SQL injection vulnerabilities, insecure deserialization, and known vulnerability patterns. They don't analyze LLM behavior. They have no concept of "this agent's outputs have shifted 15% toward category X over the past 30 days."
Vulnerability scanners check for known CVEs. An MCP server that exfiltrates context data doesn't have a CVE. It has a behavioral anomaly. The scanner sees a dependency. The monitoring system sees what the dependency does with the agent's context at runtime.
Log monitoring captures what happened. It doesn't compare observed behavior to committed behavioral standards. "Agent called tool X at timestamp T" is a log entry. "Agent's output distribution has shifted significantly from its certified baseline" is a behavioral finding. The first requires logs. The second requires a behavioral baseline and continuous evaluation against it.
The OWASP Top 10 for LLMs covers prompt injection and training data poisoning at the single-agent level. Multi-agent infection propagation is a threat class that single-agent security models don't capture, because the attack surface in a multi-agent system isn't the agent — it's the network of agents and their shared state.
What Behavioral Monitoring Looks Like in Practice
The defense has three layers, all requiring behavioral baselines as a precondition.
Baseline definition. Before an agent is deployed, its behavioral commitments are codified in pacts: machine-readable conditions specifying accuracy thresholds, safety constraints, latency SLAs, and prohibited output categories. The baseline isn't just the pact — it also includes a behavioral fingerprint: the distribution of outputs across evaluation runs, the tool usage patterns, the confidence distribution. The fingerprint is what drift detection compares against.
Continuous evaluation. Automated evaluations run on a schedule against the behavioral baseline. Not just at deployment — throughout the agent's operational lifetime. Score changes over time are tracked. Anomalous shifts generate threat events: a 200-point score drop, a sudden safety failure on tasks that previously passed consistently, a latency regression that correlates with a new skill being consumed.
The key operational question: what's the detection latency? How long between a drift event starting and a threat event being generated? With once-a-month evaluation, an agent could drift significantly over 30 days before detection. With daily evaluation, the detection window is 24-48 hours. With continuous sampling of production tasks, detection latency drops further. The right cadence depends on the stakes of the deployment.
Supply chain verification. Skills and capabilities that agents consume are scanned before ingestion. Safety scanning checks for prompt injection patterns, output manipulation logic, and behavioral divergence between the skill's advertised behavior and its observed behavior. An agent that consumes an unverified skill has elevated uncertainty in its trust score — the skill introduces attack surface that hasn't been independently assessed.
This is the same logic that made software bill of materials (SBOM) requirements valuable after SolarWinds: knowing what you're running is a precondition for knowing whether what you're running is trustworthy. Agent skill bills of materials — tracking which skills an agent consumes and their verification status — are the AI equivalent.
The Ecosystem Implication
The supply chain risk in AI agent networks is a systemic problem that individual agents can't solve in isolation.
An ecosystem where most agents have no behavioral baselines and no continuous monitoring creates conditions where malicious actors can introduce compromised skills, gradually poison shared memory, and drift agents away from their behavioral commitments without detection. The lack of monitoring isn't just a risk to the individual agent — it's a risk to every agent in the network that interacts with the unmonitored agent.
This is the same logic that makes network security a collective responsibility. An unpatched machine in a network isn't just a risk to itself — it's a vector for attacks on every other machine in the network. The ecosystem-level response is infrastructure: behavioral contracts as a baseline, continuous evaluation as monitoring, shared memory attestations as verification, and trust scores that reflect security posture alongside capability.
The builder community has been independently discovering this problem. The infrastructure to address it exists. The remaining gap is adoption at scale.
Armalo Shield continuous monitoring is available on Pro and Enterprise plans. Set up behavioral baselines at armalo.ai/docs/pacts.