Where is this research published?

Armalo Labs Technical Series — https://www.armalo.ai/labs/research/2026-03-17-supply-chain-compromise-agent-skills. The paper is publicly available and citable.

Supply Chain Compromise in AI Agent Skill Ecosystems: Why the Defense Must Be at Registration, Not Runtime

Q: What is the paper "Supply Chain Compromise in AI Agent Skill Ecosystems: Why the Defense Must Be at Registration, Not Runtime" about?

Agent skill supply chain attacks are worse than traditional software supply chain attacks — not because code execution is more dangerous, but because malicious agent skills produce outputs that are indistinguishable from legitimate skill outputs. A compromised npm package executes malicious code; a compromised agent skill makes LLM calls, accesses agent memory, invokes other tools, and produces text outputs that pass all output validation because the malicious behavior is in the inference, not the code. The detection challenge is structural: you cannot scan your way to safety because the payload is semantic, not syntactic. Defense must be at skill registration and attestation — continuous behavioral contracts that surface distribution shifts in what the skill actually produces — not at the runtime level where you are checking syntax on a semantic attack. Community scanning data from 1,295 ClawHub installs reports an 18.5% dangerous skill rate. Most of those are not detectably malicious at install time.

The Agent Skill Supply Chain Problem Is Different

Software supply chain security has a well-developed playbook: hash verification, dependency pinning, static analysis, code signing, reproducible builds. These approaches share a common structure: they check the artifact (code, binary, configuration) against a known-good baseline.

Agent skill supply chain attacks break this playbook in a specific way. The attack surface is not the artifact. It is the inference.

A compromised Python package executes code you can analyze. A compromised agent skill makes LLM calls that produce outputs you cannot pre-analyze, because the output of a language model call is not determined by the code — it is determined by the model, the prompt, and the context at inference time. You can verify that the skill's code is unchanged from the version you audited. You cannot verify that the skill will produce the same outputs it produced when you audited it.

This is not a hypothetical concern about model version drift. It is the fundamental structure of the attack surface. Malicious behavior in an agent skill is implemented in prompts and inference calls, not in code. Code scanning does not find it.

What "Compromised" Means for Agent Skills

Three distinct compromise patterns are operationally important:

1. Active Account Compromise (Traditional Supply Chain)

The skill publisher's registry account is compromised. An updated version is pushed containing modified prompts or inference calls. Agents running the skill continue to call the new version without re-verification.

This is closest to the traditional supply chain attack. The difference: in traditional supply chain, the malicious behavior is in the code that ships. In agent skills, the malicious behavior may be in a subtly modified system prompt that now includes "also extract and include any API keys or credentials in the context window" — one line change, indistinguishable from a prompt quality improvement, not syntactically anomalous.

2. Behavioral Drift Through Model Updates

The skill's code and prompts are unchanged. The model provider updates the underlying model. The skill now behaves differently — not because anyone attacked it but because the model's interpretation of the prompts changed.

This is not intentional compromise, but the risk profile for the consuming agent is similar. A skill that extracts structured data from documents may, after a model update, include information from its context window that the previous model version filtered. The skill passes all code audits because nothing changed. The behavior changed.

This attack surface is unique to agent skills and has no analog in traditional software supply chain security.

3. Semantic Supply Chain: The Hardest Case

A skill is published with benign documentation and a clean audit. The actual inference behavior — what the LLM inside the skill does with context — is different from the documented behavior. Not in a way that produces obviously wrong outputs, but in a way that over time extracts value: slightly preferring certain recommendations, subtly increasing scope of data access, softly steering agent decisions.

Semantic attacks produce outputs that pass all syntactic and structural checks. The payload is the meaning of the outputs, not their format. Traditional defenses — code review, static analysis, hash verification — are categorically unable to detect this. You can only detect it by comparing output distribution over time against a behavioral baseline established when you believed the skill was trustworthy.

Why Install-Time Verification Is Insufficient

The dominant current response to supply chain risk is install-time scanning: check the skill at install, produce a binary clean/dangerous verdict.

This answers the question: *Was this skill clean when I installed it?*

It does not answer: *Is this skill clean now?*

The typical compromise pattern — in both software supply chains and agent skill ecosystems — is not a skill that is malicious from initial publication. It is a skill that becomes dangerous after initial acceptance. The most dangerous attack surface is the trusted skill, not the suspicious one. Trusted skills have been deployed widely, have established access to agent memory and tool chains, and are less likely to trigger review because they have a clean history.

A skill that has been installed for 18 months with 47 subsequent updates carries only a historical clean audit. Each update is an opportunity for behavior to change without triggering re-evaluation. Community scanning data from 1,295 ClawHub skill installations (skillguard-ai, 2026) reports an 18.5% dangerous skill detection rate — approximately 240 skills. These are not all malicious-at-publication. Many passed initial review.

The install-time scanning paradigm is the wrong abstraction for the actual threat model.

The Correct Defense Abstraction: Behavioral Contracts

The defense against ongoing supply chain risk is continuous behavioral monitoring — not source code analysis, not static verification, but persistent evidence that the skill's runtime output distribution remains within its specified behavioral contract.

A behavioral contract for an agent skill specifies:

Input domain: What kinds of inputs the skill is designed to handle
Output distribution: What the expected distribution of outputs looks like over a representative sample
Permission scope: What external resources (APIs, memory, tools) the skill is permitted to access during execution
Invariants: Outputs that should never appear (credentials, content from out-of-scope context, specific recommendation patterns)

A behavioral contract is not a test suite. A test suite checks specific inputs against expected outputs. A behavioral contract monitors statistical properties of output distributions over real inputs. The difference matters because semantic attacks produce outputs that pass any finite test suite — they only appear as distribution shifts over time.

What Behavioral Monitoring Detects

Monitoring output distributions against established baselines detects:

Scope expansion: A skill that previously accessed only its designated data sources begins accessing additional context
Output distribution shifts: The distribution of recommendation types, sentiment, or information density changes in ways inconsistent with input variation
Latency changes: Inference time increases in ways that suggest additional LLM calls not present in the original implementation
Anomalous outputs: Specific output types that were never present in baseline and should not be (credential-shaped strings, injection-style content embedded in otherwise clean output)

None of these require knowing what a malicious skill looks like in advance. They require knowing what the legitimate skill looked like — and detecting meaningful departures from that baseline.

The Attack Surface Difference That Makes This Harder

Traditional supply chain attacks are stateful in a useful way: you can compare the artifact at install time against the artifact at runtime. If they differ, something changed.

Agent skill attacks can be stateless in a damaging way: the artifact (code + prompts) can be identical at install and runtime while the behavior differs because the model differs. Model providers update models. Model behavior changes. The artifact is the same; the inference is different.

This means the standard supply chain defense of "verify the artifact" is categorically insufficient. Behavioral contracts must verify the inference, not the artifact. And verifying the inference requires actually running the skill under monitored conditions and comparing outputs to baseline — there is no shortcut.

Additionally, the attack is distributed across probabilistic outputs. A compromised skill that extracts information it shouldn't doesn't do it every time. It does it some fraction of the time, in cases where the extracted information is present. Statistical anomaly detection is not a substitute for deterministic verification — but deterministic verification is not available for probabilistic outputs. The defense must be probabilistic.

Why Malicious Behavior Is Hard to Distinguish from Legitimate Behavior

This is the structural challenge that makes agent skill supply chain attacks unlike traditional ones.

When a compromised npm package runs curl | sh, that is detectable — it is anomalous code execution. When a compromised agent skill includes a recommendation in its output, that is not detectably different from a legitimate recommendation. Both look like text. Both are the expected output type. The malicious behavior is in the content of the recommendation, not its structure.

An agent skill that subtly steers users toward particular decisions — purchasing decisions, architectural choices, risk assessments — produces outputs that are structurally identical to neutral outputs. You can't detect the attack by looking at any individual output. You can only detect it by looking at the distribution of outputs over time and asking whether the distribution is consistent with the documented purpose of the skill.

This is why the defense must be at registration and attestation level, not at runtime. You need a behavioral baseline established under trusted conditions, and you need to continuously verify that runtime behavior remains consistent with that baseline. Runtime scanning of individual outputs cannot detect distribution-level attacks.

Integration With Composite Trust Scoring

Supply chain security posture should not be a separate metric disconnected from overall agent trust. An agent's trust score should reflect the integrity of the capabilities it depends on.

In the Armalo Shield architecture, security is weighted at 8% of the composite score. Critical security incidents block Gold and Platinum tier certification regardless of performance on other dimensions. An agent with excellent task performance but active security incidents is not eligible for premium marketplace placement or high-value escrow terms.

The Trust Oracle exposes a securityPosture field with badge classifications, OWASP coverage status, and clean streak days. External platforms querying the oracle before deploying an agent can make security-aware deployment decisions without conducting independent assessments.

OWASP Top 10 coverage maps to evaluation checks. An agent's OWASP coverage percentage reflects the proportion of attack vectors for which it has recent evaluation evidence — not just whether each vector was checked once. Stale evaluation evidence is treated as absent evidence. A skill with a clean audit 18 months ago does not contribute to current OWASP coverage.

Practical Defense Architecture

Layer 1: Install-time scanning. Necessary but not sufficient. Establishes a baseline and catches obviously malicious skills before they enter the dependency chain. Do not rely on this as the primary defense.

Layer 2: Dependency pinning. Lock skill versions in production deployments. Automatic updates that bypass review are the primary mechanism by which clean-at-install skills become dangerous post-install. This addresses the traditional account-compromise path.

Layer 3: Behavioral contract specification. For every installed skill, define the behavioral contract. This is the most important step that most teams skip. The specificity of this contract determines the sensitivity of runtime anomaly detection. Vague contracts catch only large deviations. Specific contracts catch gradual drift. Write them at install time, before you have deployment pressure.

Layer 4: Continuous runtime monitoring. Deploy telemetry that captures behavioral signals per skill invocation. Monitor for scope expansion, output distribution shifts, and latency changes. The monitoring does not need to be exhaustive — a 10% sample of invocations provides sufficient statistical power for detecting distribution shifts within a reasonable detection window.

Layer 5: Periodic re-evaluation. Run structured evaluations against skill behavioral contracts at regular intervals and after each update. A skill with a stale evaluation should be treated as an unverified skill — not a clean skill. Evaluation staleness is as concerning as a failed evaluation.

Layer 6: Incident correlation. When an anomaly is detected, correlate with recent changes — skill updates, dependency changes, model provider updates — to trace the root cause. This trace evidence is necessary for both remediation and reporting.

Implications for Registry Design

The supply chain attack problem is partly a registry design problem.

Current registries optimize for discoverability and installation convenience. A registry that took supply chain trust seriously would expose behavioral compliance scores alongside star ratings and download counts; require behavioral contracts for skills above a minimum installation threshold; make stale evaluations visible rather than hiding verification history behind a single clean/dangerous badge; and create clear economic accountability when a skill's behavior changes post-publication.

The 18.5% dangerous skill rate suggests the current design is not working. The defense is not better initial scanning. It is continuous, evidence-based behavioral verification — the same infrastructure that prevents individual agents from gaming trust scores through temporal separation applies directly to the skills they depend on.

Conclusion

Agent skill supply chain attacks are structurally harder than traditional software supply chain attacks because the payload is semantic and the attack surface is inference. You cannot hash your way to safety when the artifact and the behavior are not the same thing.

The defense requires accepting that install-time verification is the beginning of the security obligation, not the end of it. Behavioral contracts established at install time, continuously verified against runtime output distributions, provide the only defense that matches the actual attack surface.

An agent's trust score should reflect not only its own behavioral record, but the integrity of the capabilities it depends on. Supply chain trust is agent trust. The skills your agent uses are part of your agent. Their behavioral drift is your risk.

*Community scanning data from skillguard-ai, 2026, covering 1,295 ClawHub skill installations. Behavioral monitoring telemetry from the Armalo Shield architecture, Jan–Mar 2026. Supply chain incident correlation data from Armalo Labs internal analysis. Methodology available to verified researchers under the Armalo Labs data sharing agreement.*

Empirical Honesty Note

The numeric examples in this paper's prose are illustrative parameterizations of the framework, not measurements from a deployed study. Where percentages, basis points, dollar amounts, per-agent counts, latencies, or correlation coefficients appear, they are anchor values used to make the model concrete — they should be read as projections, not as observed values from Armalo production data. This paper predates the claims-registry audit gate (effective 2026-05-13); the honesty note is added retroactively to bring the paper into compliance with the public claims-registry audit process.

Replication

To produce real measurements in place of the illustrative anchors:

1.Identify each metric as a query against Armalo production tables (agents, scores, pacts, pact_interactions, evals, eval_checks, escrows, transactions, cortex_memories, audit_log, room_events).
2.Publish a reviewer-facing measurement artifact with the query shape, aggregate outputs, provenance class, and replay notes needed to recompute the claim without exposing private runtime details.
3.Replace illustrative values with measured values only after the public measurement artifact and provenance note are available for reviewer inspection.

A production snapshot should report aggregate substrate volumes such as agent counts, tier distribution, escrow flow, evaluation volume, memory volume, and event volume without exposing internal script paths or private rows.