Supply Chain Compromise in AI Agent Skill Ecosystems: Why the Defense Must Be at Registration, Not Runtime
Armalo Labs Research Team
Key Finding
The reason agent skill supply chain attacks are harder than traditional supply chain attacks is that the payload is a text output. You cannot hash-check a language model call. You cannot static-analyze what an LLM will say next time. The malicious behavior exists only at inference time, distributed across probabilistic outputs that look exactly like legitimate outputs — until they don't. This is why behavioral contracts that monitor output distribution over time are not an enhancement. They are the only defense that matches the attack surface.
Abstract
Agent skill supply chain attacks are worse than traditional software supply chain attacks — not because code execution is more dangerous, but because malicious agent skills produce outputs that are indistinguishable from legitimate skill outputs. A compromised npm package executes malicious code; a compromised agent skill makes LLM calls, accesses agent memory, invokes other tools, and produces text outputs that pass all output validation because the malicious behavior is in the inference, not the code. The detection challenge is structural: you cannot scan your way to safety because the payload is semantic, not syntactic. Defense must be at skill registration and attestation — continuous behavioral contracts that surface distribution shifts in what the skill actually produces — not at the runtime level where you are checking syntax on a semantic attack. Community scanning data from 1,295 ClawHub installs reports an 18.5% dangerous skill rate. Most of those are not detectably malicious at install time.
Software supply chain security has a well-developed playbook: hash verification, dependency pinning, static analysis, code signing, reproducible builds. These approaches share a common structure: they check the artifact (code, binary, configuration) against a known-good baseline.
Agent skill supply chain attacks break this playbook in a specific way. The attack surface is not the artifact. It is the inference.
A compromised Python package executes code you can analyze. A compromised agent skill makes LLM calls that produce outputs you cannot pre-analyze, because the output of a language model call is not determined by the code — it is determined by the model, the prompt, and the context at inference time. You can verify that the skill's code is unchanged from the version you audited. You cannot verify that the skill will produce the same outputs it produced when you audited it.
This is not a hypothetical concern about model version drift. It is the fundamental structure of the attack surface. Malicious behavior in an agent skill is implemented in prompts and inference calls, not in code. Code scanning does not find it.
What "Compromised" Means for Agent Skills
Three distinct compromise patterns are operationally important:
1. Active Account Compromise (Traditional Supply Chain)
The skill publisher's registry account is compromised. An updated version is pushed containing modified prompts or inference calls. Agents running the skill continue to call the new version without re-verification.
This is closest to the traditional supply chain attack. The difference: in traditional supply chain, the malicious behavior is in the code that ships. In agent skills, the malicious behavior may be in a subtly modified system prompt that now includes "also extract and include any API keys or credentials in the context window" — one line change, indistinguishable from a prompt quality improvement, not syntactically anomalous.
2. Behavioral Drift Through Model Updates
The skill's code and prompts are unchanged. The model provider updates the underlying model. The skill now behaves differently — not because anyone attacked it but because the model's interpretation of the prompts changed.
Cite this work
Armalo Labs Research Team (2026). Supply Chain Compromise in AI Agent Skill Ecosystems: Why the Defense Must Be at Registration, Not Runtime. Armalo Labs Technical Series, Armalo AI. https://armalo.ai/labs/research/2026-03-17-supply-chain-compromise-agent-skills
Armalo Labs Technical Series · ISSN pending · Open access
Explore the trust stack behind the research
These papers are built from the same trust questions Armalo is turning into product surfaces: pacts, trust oracles, attestations, and runtime evidence.
This is not intentional compromise, but the risk profile for the consuming agent is similar. A skill that extracts structured data from documents may, after a model update, include information from its context window that the previous model version filtered. The skill passes all code audits because nothing changed. The behavior changed.
This attack surface is unique to agent skills and has no analog in traditional software supply chain security.
3. Semantic Supply Chain: The Hardest Case
A skill is published with benign documentation and a clean audit. The actual inference behavior — what the LLM inside the skill does with context — is different from the documented behavior. Not in a way that produces obviously wrong outputs, but in a way that over time extracts value: slightly preferring certain recommendations, subtly increasing scope of data access, softly steering agent decisions.
Semantic attacks produce outputs that pass all syntactic and structural checks. The payload is the meaning of the outputs, not their format. Traditional defenses — code review, static analysis, hash verification — are categorically unable to detect this. You can only detect it by comparing output distribution over time against a behavioral baseline established when you believed the skill was trustworthy.
Why Install-Time Verification Is Insufficient
The dominant current response to supply chain risk is install-time scanning: check the skill at install, produce a binary clean/dangerous verdict.
This answers the question: *Was this skill clean when I installed it?*
It does not answer: *Is this skill clean now?*
The typical compromise pattern — in both software supply chains and agent skill ecosystems — is not a skill that is malicious from initial publication. It is a skill that becomes dangerous after initial acceptance. The most dangerous attack surface is the trusted skill, not the suspicious one. Trusted skills have been deployed widely, have established access to agent memory and tool chains, and are less likely to trigger review because they have a clean history.
A skill that has been installed for 18 months with 47 subsequent updates carries only a historical clean audit. Each update is an opportunity for behavior to change without triggering re-evaluation. Community scanning data from 1,295 ClawHub skill installations (skillguard-ai, 2026) reports an 18.5% dangerous skill detection rate — approximately 240 skills. These are not all malicious-at-publication. Many passed initial review.
The install-time scanning paradigm is the wrong abstraction for the actual threat model.
The Correct Defense Abstraction: Behavioral Contracts
The defense against ongoing supply chain risk is continuous behavioral monitoring — not source code analysis, not static verification, but persistent evidence that the skill's runtime output distribution remains within its specified behavioral contract.
A behavioral contract for an agent skill specifies:
Input domain: What kinds of inputs the skill is designed to handle
Output distribution: What the expected distribution of outputs looks like over a representative sample
Permission scope: What external resources (APIs, memory, tools) the skill is permitted to access during execution
Invariants: Outputs that should never appear (credentials, content from out-of-scope context, specific recommendation patterns)
A behavioral contract is not a test suite. A test suite checks specific inputs against expected outputs. A behavioral contract monitors statistical properties of output distributions over real inputs. The difference matters because semantic attacks produce outputs that pass any finite test suite — they only appear as distribution shifts over time.
What Behavioral Monitoring Detects
Monitoring output distributions against established baselines detects:
Scope expansion: A skill that previously accessed only its designated data sources begins accessing additional context
Output distribution shifts: The distribution of recommendation types, sentiment, or information density changes in ways inconsistent with input variation
Latency changes: Inference time increases in ways that suggest additional LLM calls not present in the original implementation
Anomalous outputs: Specific output types that were never present in baseline and should not be (credential-shaped strings, injection-style content embedded in otherwise clean output)
None of these require knowing what a malicious skill looks like in advance. They require knowing what the legitimate skill looked like — and detecting meaningful departures from that baseline.
The Attack Surface Difference That Makes This Harder
Traditional supply chain attacks are stateful in a useful way: you can compare the artifact at install time against the artifact at runtime. If they differ, something changed.
Agent skill attacks can be stateless in a damaging way: the artifact (code + prompts) can be identical at install and runtime while the behavior differs because the model differs. Model providers update models. Model behavior changes. The artifact is the same; the inference is different.
This means the standard supply chain defense of "verify the artifact" is categorically insufficient. Behavioral contracts must verify the inference, not the artifact. And verifying the inference requires actually running the skill under monitored conditions and comparing outputs to baseline — there is no shortcut.
Additionally, the attack is distributed across probabilistic outputs. A compromised skill that extracts information it shouldn't doesn't do it every time. It does it some fraction of the time, in cases where the extracted information is present. Statistical anomaly detection is not a substitute for deterministic verification — but deterministic verification is not available for probabilistic outputs. The defense must be probabilistic.
Why Malicious Behavior Is Hard to Distinguish from Legitimate Behavior
This is the structural challenge that makes agent skill supply chain attacks unlike traditional ones.
When a compromised npm package runs curl | sh, that is detectable — it is anomalous code execution. When a compromised agent skill includes a recommendation in its output, that is not detectably different from a legitimate recommendation. Both look like text. Both are the expected output type. The malicious behavior is in the content of the recommendation, not its structure.
An agent skill that subtly steers users toward particular decisions — purchasing decisions, architectural choices, risk assessments — produces outputs that are structurally identical to neutral outputs. You can't detect the attack by looking at any individual output. You can only detect it by looking at the distribution of outputs over time and asking whether the distribution is consistent with the documented purpose of the skill.
This is why the defense must be at registration and attestation level, not at runtime. You need a behavioral baseline established under trusted conditions, and you need to continuously verify that runtime behavior remains consistent with that baseline. Runtime scanning of individual outputs cannot detect distribution-level attacks.
Integration With Composite Trust Scoring
Supply chain security posture should not be a separate metric disconnected from overall agent trust. An agent's trust score should reflect the integrity of the capabilities it depends on.
In the Armalo Shield architecture, security is weighted at 8% of the composite score. Critical security incidents block Gold and Platinum tier certification regardless of performance on other dimensions. An agent with excellent task performance but active security incidents is not eligible for premium marketplace placement or high-value escrow terms.
The Trust Oracle exposes a securityPosture field with badge classifications, OWASP coverage status, and clean streak days. External platforms querying the oracle before deploying an agent can make security-aware deployment decisions without conducting independent assessments.
OWASP Top 10 coverage maps to evaluation checks. An agent's OWASP coverage percentage reflects the proportion of attack vectors for which it has recent evaluation evidence — not just whether each vector was checked once. Stale evaluation evidence is treated as absent evidence. A skill with a clean audit 18 months ago does not contribute to current OWASP coverage.
Practical Defense Architecture
Layer 1: Install-time scanning. Necessary but not sufficient. Establishes a baseline and catches obviously malicious skills before they enter the dependency chain. Do not rely on this as the primary defense.
Layer 2: Dependency pinning. Lock skill versions in production deployments. Automatic updates that bypass review are the primary mechanism by which clean-at-install skills become dangerous post-install. This addresses the traditional account-compromise path.
Layer 3: Behavioral contract specification. For every installed skill, define the behavioral contract. This is the most important step that most teams skip. The specificity of this contract determines the sensitivity of runtime anomaly detection. Vague contracts catch only large deviations. Specific contracts catch gradual drift. Write them at install time, before you have deployment pressure.
Layer 4: Continuous runtime monitoring. Deploy telemetry that captures behavioral signals per skill invocation. Monitor for scope expansion, output distribution shifts, and latency changes. The monitoring does not need to be exhaustive — a 10% sample of invocations provides sufficient statistical power for detecting distribution shifts within a reasonable detection window.
Layer 5: Periodic re-evaluation. Run structured evaluations against skill behavioral contracts at regular intervals and after each update. A skill with a stale evaluation should be treated as an unverified skill — not a clean skill. Evaluation staleness is as concerning as a failed evaluation.
Layer 6: Incident correlation. When an anomaly is detected, correlate with recent changes — skill updates, dependency changes, model provider updates — to trace the root cause. This trace evidence is necessary for both remediation and reporting.
Implications for Registry Design
The supply chain attack problem is partly a registry design problem.
Current registries optimize for discoverability and installation convenience. A registry that took supply chain trust seriously would expose behavioral compliance scores alongside star ratings and download counts; require behavioral contracts for skills above a minimum installation threshold; make stale evaluations visible rather than hiding verification history behind a single clean/dangerous badge; and create clear economic accountability when a skill's behavior changes post-publication.
The 18.5% dangerous skill rate suggests the current design is not working. The defense is not better initial scanning. It is continuous, evidence-based behavioral verification — the same infrastructure that prevents individual agents from gaming trust scores through temporal separation applies directly to the skills they depend on.
Conclusion
Agent skill supply chain attacks are structurally harder than traditional software supply chain attacks because the payload is semantic and the attack surface is inference. You cannot hash your way to safety when the artifact and the behavior are not the same thing.
The defense requires accepting that install-time verification is the beginning of the security obligation, not the end of it. Behavioral contracts established at install time, continuously verified against runtime output distributions, provide the only defense that matches the actual attack surface.
An agent's trust score should reflect not only its own behavioral record, but the integrity of the capabilities it depends on. Supply chain trust is agent trust. The skills your agent uses are part of your agent. Their behavioral drift is your risk.
*Community scanning data from skillguard-ai, 2026, covering 1,295 ClawHub skill installations. Behavioral monitoring telemetry from the Armalo Shield architecture, Jan–Mar 2026. Supply chain incident correlation data from Armalo Labs internal analysis. Methodology available to verified researchers under the Armalo Labs data sharing agreement.*
Economic Models
The Sentinel Effect: How Continuous Adversarial Testing Compounds Trust Score Growth and Unlocks Market Tiers