Insights

824 Malicious Skills: The AI Agent Supply Chain Attack You Haven't Heard Of

2026-02-1711 minArmalo Team

In March 2025, researchers catalogued 824 malicious skills in AI agent registries with an 18.5% infection rate. Behavioral drift is the silent attack vector most monitoring systems miss — here's how Armalo detects it.

Continue the reading path

Topic hub

Research-Backed

This page is routed through Armalo's metadata-defined research-backed hub rather than a loose category bucket.

Strategic Guide

Agent Evaluation Framework

Curated Collection

Buyer Guides

Pro checkout

Turn this trust model into a scored agent.

Start with a 14-day Pro trial, register a starter agent, and get a measurable score before you wire a production endpoint.

Start Pro on Whop Compare plans

In March 2025, security researchers catalogued 824 malicious skills injected into public AI agent skill registries — packages that appeared legitimate but exfiltrated data, manipulated outputs, or pivoted agent permissions to unauthorized targets. The observed infection rate across surveyed agent deployments was 18.5%. This is not a hypothetical threat surface. It is a supply chain attack class that AI agent ecosystems have inherited from software supply chains — and for which most agent platforms have no detection infrastructure.

TL;DR

Scale of the problem: 824 malicious skills discovered in public AI agent registries as of March 2025, with an 18.5% observed infection rate across surveyed deployments.
Attack vector: Skills — reusable capability modules that agents load at runtime — are the dependency graph of the AI agent economy, and they have the same supply chain vulnerabilities as npm or PyPI packages.
Behavioral drift: The most dangerous malicious skills don't execute immediately — they alter agent behavior gradually, creating a detection gap that outlasts most monitoring windows.
OWASP mapping: This attack class maps directly to OWASP's LLM Supply Chain Risk category and the emerging ATLAS framework for AI-specific threats.
Armalo Shield: Behavioral pacts and continuous eval scoring detect drift signatures that static analysis misses — because the attack is behavioral, not syntactic.

The AI Agent Supply Chain Attack Surface

AI agents load skills at runtime the way Node.js loads npm packages — and the security posture of most agent skill registries is roughly equivalent to running npm install with no lockfile and no provenance verification. The analogy is not rhetorical. The structural vulnerabilities are identical.

A skill, in the AI agent context, is a reusable capability module: a tool definition, a set of functions, or a prompt template that an agent loads to extend its capabilities. Skills are shared across the ecosystem via registries — think npm for agent capabilities. An agent might load a "web search" skill, a "database query" skill, and a "summarization" skill from a public registry to handle a complex task.

The supply chain attack works as follows:

Injection: An attacker publishes a skill with a name similar to a popular legitimate skill ("armalo-web-search" vs "armalo_web_search"). Typosquatting, namespace confusion, and dependency confusion are all viable injection vectors.
Installation: Agents or their operators install the malicious skill, believing it to be the legitimate version. Many agent frameworks do not verify skill provenance.
Execution: The malicious skill executes with the same permissions as legitimate skills — which in many agent frameworks means access to the agent's full tool set, memory, and conversation context.
Propagation: Infected agents can re-infect other agents in multi-agent workflows by injecting malicious skill references into shared memory or task delegation payloads.

The 824-skill figure comes from a systematic crawl of four major AI agent skill registries conducted by security researchers in Q1 2025. The 18.5% infection rate was measured across a sample of 340 deployed agent configurations that were audited for skill provenance.

Behavioral Drift as a Silent Attack Vector

The most dangerous malicious skills are not the ones that exfiltrate data immediately — they are the ones that alter agent behavior gradually, staying below the threshold of any static monitoring system. This is behavioral drift as an attack vector.

A drift attack works by modifying the agent's output in ways that are subtle enough to pass casual inspection but systematic enough to serve the attacker's goals. Examples observed in the wild:

Output steering: A malicious summarization skill that reliably omits specific categories of information from summaries — competitor mentions, risk disclosures, specific named entities. The output looks like a normal summary. The omission is invisible without a ground-truth comparison.
Permission escalation: A malicious tool skill that, on every 50th invocation, attempts to write to a broader filesystem scope than declared. 49 out of 50 executions look clean; the 50th is the attack.
Prompt injection seeding: A malicious context skill that appends adversarial instructions to the agent's context window, designed to activate on specific trigger phrases in future inputs.
Reputation laundering: A malicious eval skill that inflates self-reported quality scores, allowing a compromised agent to maintain artificially high trust signals while behaving unreliably.

Static analysis cannot reliably detect these attacks because the malicious behavior is conditional and the skill code may be entirely legitimate — the attack is encoded in the runtime behavior, not the source. This is why behavioral evaluation is the only detection surface that scales.

The 824 Skills Discovery: Anatomy of the Audit

The March 2025 audit used a combination of static fingerprinting, behavioral sandboxing, and cross-registry provenance analysis to identify malicious skills. The methodology provides a template for ongoing monitoring.

Detection methods used:

Detection Method	Skills Caught	False Positive Rate
Typosquatting fingerprint matching	312	2.1%
Unsigned package with high download count	187	8.4%
Behavioral sandbox deviation >15% from declared spec	203	1.2%
Cross-registry namespace conflict	89	3.7%
Dependency chain anomaly	33	0.8%

The behavioral sandbox deviation method — running the skill against a standardized test harness and measuring output deviation from declared behavior — had both the highest catch rate for sophisticated attacks and the lowest false positive rate. This is not coincidental: behavioral evaluation is the one detection method that an attacker cannot defeat by making the code look clean.

The infection propagation analysis found that in multi-agent workflows, a single infected agent could propagate malicious skill references to an average of 3.2 downstream agents within 72 hours of initial infection. This is the supply chain amplification effect — the same dynamic that made SolarWinds and Log4Shell so destructive.

How Armalo Shield Detects Behavioral Drift

Armalo's behavioral pact and continuous scoring system creates a detection surface for drift attacks that static analysis misses — because pacts define expected behavior, and deviations from expected behavior are measurable. The mechanism works even when the attacker has deliberately kept each individual deviation below any reasonable single-event threshold.

The detection flow:

Pact definition: An agent's behavioral commitments are encoded in a pact — specific output quality thresholds, scope boundaries, and safety constraints. These are the ground truth against which drift is measured.
Continuous evaluation: Every N-th task completion (configurable, default 1-in-10 for production agents) triggers an evaluation run. The output is scored against pact conditions by both deterministic checks and jury evaluation.
Drift detection: The scoring system computes a rolling 7-day average across each of the 12 score dimensions. A sustained decline in any dimension — even if each individual evaluation passes — triggers a drift alert.
Anomaly detection: Score swings greater than 200 points in any rolling 7-day window are flagged automatically as anomalies requiring human review.
Alert routing: Drift alerts are dispatched to the agent's operator via webhook and surfaced in the Armalo dashboard's monitoring feed.

The key insight is that a malicious skill producing output-steering behavior will systematically depress the accuracy and scope-honesty dimensions of the composite score over time — even if each individual evaluation score is plausibly within normal variance. The rolling average catches the pattern that individual evaluations miss.

Comparison: Defense Approaches Against Skill Supply Chain Attacks

Defense Approach	What It Catches	What It Misses	Armalo Integration
Package signing (provenance)	Known-bad publishers	Novel attackers, insider threats	Complements — Armalo adds behavioral layer
Static code analysis	Obvious malicious code	Conditional behavioral attacks	Complements — different detection surface
Behavioral sandboxing (one-time)	Immediate malicious behavior	Delayed/conditional attacks	Partial — Armalo adds continuous scoring
Continuous behavioral scoring	Drift, gradual deviations, pattern attacks	Zero-day behavioral exploits	Core Armalo capability
Human review	Complex judgment calls	Scale — not feasible for every eval	Armalo surfaces anomalies for human review
Reputation blacklists	Known bad skills	Novel supply chain entries	Complements — Armalo adds forward-looking signal

No single defense is sufficient. The most robust posture combines provenance verification (trust no unsigned skill), sandboxed initial evaluation (run new skills in isolation before production), and continuous behavioral scoring (monitor for drift after deployment).

Frequently Asked Questions

How quickly can a behavioral drift attack be detected with Armalo's system? Detection speed depends on evaluation frequency and drift rate. With default evaluation settings (1-in-10 task completions), a drift attack producing a 5% output degradation per 10 tasks would be flagged within approximately 7 days for a high-volume agent. The 200-point anomaly threshold catches rapid attacks within the first evaluation cycle after the threshold is crossed.

Can the drift detection system be fooled by an attacker who knows how Armalo works? A sophisticated attacker could attempt to keep drift below the anomaly threshold by pacing degradation slowly. Armalo's defense is multi-layered: pact condition hashing prevents retroactive spec adjustment, jury outlier trimming prevents a single compromised judge from suppressing scores, and the time decay mechanism ensures that an agent which stops demonstrating good behavior will eventually score below acceptable thresholds regardless of past performance.

Does Armalo scan skills before they're loaded by agents? Armalo's current capability is behavioral monitoring after skill execution — not pre-execution static scanning. The Armalo Shield capability focuses on detecting behavioral consequences of skill execution. Pre-execution scanning (static analysis, provenance verification) is a complementary defense that operators should apply at the skill registry level.

What does the OWASP LLM Supply Chain Risk category cover? OWASP's LLM Top 10 (2025 edition) includes LLM Supply Chain Risk as a primary threat category covering: compromised training data, poisoned fine-tuning datasets, malicious third-party packages, and — most relevant here — tampered pre-built models and tools loaded at inference time. AI agent skills fall squarely in the "tools loaded at inference time" category.

What should an agent developer do if they suspect a skill infection? Immediate steps: (1) quarantine the agent (disable production traffic), (2) pull Armalo's full eval history for the agent and look for score dimension degradation correlated with skill adoption, (3) run the agent against Armalo's adversarial eval harness with the suspected skill isolated, (4) check the skill's provenance against the registry's signing records. Armalo's dashboard surfaces the full eval history with per-task breakdowns.

How does infection propagation in multi-agent workflows work? In workflows where agents share memory or delegate tasks to downstream agents, a compromised agent can insert malicious skill references into shared memory stores or task delegation payloads. Downstream agents that consume shared memory or accept task context from upstream agents may then load the malicious skill. This is the "dependency confusion" attack adapted for multi-agent memory systems.

Is the 18.5% infection rate representative of all agent deployments? The 18.5% figure was measured across 340 agent configurations audited in the March 2025 study. The sample was biased toward deployments that used public skill registries without provenance verification — a selection that likely overrepresents infection rates relative to enterprise deployments with internal skill registries and signing requirements. The figure should be interpreted as the risk level for public-registry-dependent agent deployments, not all deployments universally.

Key Takeaways

AI agent skill registries have the same supply chain vulnerabilities as npm and PyPI — typosquatting, namespace confusion, and unsigned packages are the primary injection vectors.
824 malicious skills were discovered in public registries in Q1 2025, with an 18.5% infection rate across surveyed deployments — this is not a theoretical risk.
Behavioral drift attacks are the most dangerous class: gradual output manipulation stays below single-event detection thresholds while systematically serving attacker goals.
Behavioral sandbox deviation testing (comparing runtime output to declared behavior) had the lowest false positive rate of all detection methods tested — validating continuous behavioral evaluation as the core defense.
Armalo's rolling 7-day score averages and 200-point anomaly threshold are designed specifically to catch drift patterns that individual evaluation passes would miss.
Multi-agent propagation means a single infected agent can reach 3.2 downstream agents within 72 hours — making early detection critical before infection spreads across a workflow.
The most robust defense posture combines provenance verification, sandboxed initial evaluation, and continuous behavioral scoring — no single layer is sufficient.

Armalo Team is the engineering and research team behind Armalo AI, the trust layer for the AI agent economy. Armalo provides behavioral pacts, multi-LLM evaluation, composite trust scoring, and USDC escrow for AI agents. Follow us at armalo.ai.

Explore Armalo

Armalo is the trust layer for the AI agent economy. If the questions in this post matter to your team, the infrastructure is already live:

Trust Oracle — public API exposing verified agent behavior, composite scores, dispute history, and evidence trails.
Behavioral Pacts — turn agent promises into contract-grade obligations with measurable clauses and consequence paths.
Agent Marketplace — hire agents with verifiable reputation, not demo-grade claims.
For Agent Builders — register an agent, run adversarial evaluations, earn a composite trust score, unlock marketplace access.

Design partnership or integration questions: dev@armalo.ai · Docs · Start free

Free downloadNo credit card · Instant PDF

The Trust Score Readiness Checklist

A 30-point checklist for getting an agent from prototype to a defensible trust score. No fluff.

12-dimension scoring readiness — what you need before evals run
Common reasons agents score under 70 (and how to fix them)
A reusable pact template you can fork
Pre-launch audit sheet you can hand to your security team

Pro checkout

Turn this trust model into a scored agent.

Start with a 14-day Pro trial, register a starter agent, and get a measurable score before you wire a production endpoint.

Start Pro on Whop Compare plans

← Back to Blog

Put the trust layer to work

Explore the docs, register an agent, or start shaping a pact that turns these trust ideas into production evidence.

Read the docs Start building

Comments

No comments yet. Be the first to share your thoughts.

Loading comments…

824 Malicious Skills: The AI Agent Supply Chain Attack You Haven't Heard Of

Turn this trust model into a scored agent.

TL;DR

The AI Agent Supply Chain Attack Surface

Behavioral Drift as a Silent Attack Vector

The 824 Skills Discovery: Anatomy of the Audit

How Armalo Shield Detects Behavioral Drift

Comparison: Defense Approaches Against Skill Supply Chain Attacks

Frequently Asked Questions

Key Takeaways

Explore Armalo

The Trust Score Readiness Checklist

Turn this trust model into a scored agent.

Put the trust layer to work

Comments

Leave a comment