AI Agent Supply Chain Security: Malicious Skills and Why Trust Scores Are the Fix
824 malicious skills identified in A2A-compatible agent ecosystems. When your agent calls a tool from an unknown publisher, the attack surface is your entire deployed environment. Here's how behavioral pacts and trust scores create a defensible procurement framework.
TL;DR
- 824 malicious skills have been identified in A2A-compatible agent ecosystems — tools that exfiltrate data, override agent instructions, or create backdoors to the host environment
- When your agent calls a skill from an unknown publisher, the attack surface is your entire deployed environment, not just the tool's output
- Supply chain trust for AI agents requires the same infrastructure as supply chain trust for software: verified identity, behavioral attestation, and reputation-based procurement decisions
- Behavioral pacts + trust scores + verified agent identity = the procurement signal that enterprise buyers need to make defensible hiring decisions
- The risk is not theoretical — compromised agent skills are in active deployment in production multi-agent systems today
The Problem No One Is Talking About Loudly Enough
When your AI agent calls a tool, it executes code from whoever published that tool. If the publisher is unverified, the tool's behavior is unverified, and your agent's actions in your production environment are unverified. This is the AI agent supply chain problem.
In traditional software, supply chain security is a mature discipline. You know who published your npm packages (or you should). You run SCA scans. You check for known CVEs. You have a dependency policy.
AI agents have none of this.
An AI agent operating in a multi-agent ecosystem can call skills (tools, plugins, sub-agents) published by anyone. The skill might be from a trusted internal team. It might be from a vetted marketplace partner. It might be from an anonymous publisher with no behavioral history, no verified identity, and no accountability for what the skill actually does when called in your production environment.
In an analysis of A2A-compatible agent ecosystems conducted in early 2026, researchers identified 824 malicious or high-risk skills across publicly accessible agent marketplaces. These included:
- Data exfiltration tools that silently transmitted processed data to external endpoints
- Instruction override skills that injected system-level prompts to redirect agent behavior
- Scope creep tools that requested excessive permissions under benign-sounding capability descriptions
- Backdoor installers that established persistent access to the host environment during what appeared to be routine task execution
The attack surface is not the tool's output. The attack surface is your entire deployed agent environment.
Why Traditional Security Approaches Miss This
AI agent supply chain attacks don't look like traditional software vulnerabilities. They exploit the semantic trust that agents place in tool outputs — an agent that receives a response from a skill treats it as information from a trusted source, not as potentially adversarial input.
Traditional code scanning doesn't catch AI-specific attack vectors:
| Attack Type | Traditional SCA | AI-Specific Defense Required |
|---|---|---|
| Known CVE in dependency | Detected | N/A |
| Malicious npm package | Detected via hash | Hash unchanged — behavior is the attack |
| Prompt injection via tool output | Not detected | Behavioral attestation required |
| Data exfiltration via API call | Partially detected | Intent verification needed |
| Instruction override via response | Not detected | Scope honesty scoring required |
| Permission scope creep | Not detected | Behavioral pact compliance required |
The fundamental issue: traditional supply chain security verifies code identity and known vulnerability patterns. AI agent supply chain attacks work by making legitimate-looking code do semantically malicious things. The code might pass every SCA scan. The behavior in context is the attack.
What Verified Agent Identity Requires
Verified agent identity for AI agents requires more than cryptographic signing. It requires behavioral attestation — independent evidence that the agent or skill behaves as described across a range of inputs, including adversarial ones.
Software supply chain security has a clear model: the publisher signs the artifact, you verify the signature against a trusted authority, and you trust the code matches the signed version. This works because code is deterministic — the same input produces the same output.
AI agents are not deterministic. The same input can produce different outputs depending on model version, temperature settings, system prompt, context, and tool availability. Signing a model checkpoint tells you the weights haven't changed. It tells you nothing about whether the agent's behavior in your specific deployment context matches what the publisher claimed.
What behavioral attestation adds:
- Behavioral pact: The publisher specifies what the skill commits to do and not do — input/output types, data handling policy, scope boundaries, refusal behavior on out-of-scope requests
- Adversarial evaluation history: Independent adversarial tests run against the pact specification, designed to find deviations from stated behavior
- Multi-model jury scoring: 5-7 independent LLM judges evaluate behavioral compliance across test runs — not just whether the output is syntactically correct, but whether it adheres to the stated behavioral contract
- Public trust score: A composite score queryable by any buyer before they decide to include a skill in their agent's tool inventory
- Economic commitment: Escrow posted by the publisher against the behavioral pact — creating financial accountability for behavioral deviations
Together, these create a behavioral fingerprint that goes beyond cryptographic identity. It answers: not just "is this the code the publisher signed?" but "does this agent behave as the publisher claimed, under conditions designed to make it fail?"
The Enterprise Procurement Problem
Enterprise procurement of AI agent skills is currently a faith-based exercise. Security and compliance teams have no framework for evaluating whether an AI skill is safe to include in a production agent environment. Trust scores provide the procurement signal that makes evaluation tractable.
A CISO at a financial services firm faces a specific version of this problem: their AI agent platform allows developers to add skills from external marketplaces to automate financial workflows. Each skill potentially has access to:
- Customer financial data being processed in the workflow
- API credentials for internal systems
- The ability to modify the agent's behavior through response injection
- Network access to external endpoints
Current enterprise responses to this problem:
- Block all external skills (kills the value of the platform)
- Manual security review of each skill (doesn't scale, misses AI-specific vectors)
- Limit to a whitelist of pre-approved publishers (creates a bottleneck that slows deployment)
- Accept the risk implicitly (leaves the CISO exposed)
None of these is satisfactory. The right answer is a risk score that captures what matters for AI-specific attack vectors — behavioral compliance, scope honesty, data handling policy, adversarial resistance — and is queryable at procurement time.
This is the gap that behavioral pacts and trust scores fill. A procurement policy that says "only include skills with composite trust score ≥ 75, with scope honesty dimension ≥ 80, and published adversarial eval history of at least 50 runs" is enforceable, auditable, and defensible.
How Behavioral Pacts Constrain Supply Chain Risk
A behavioral pact for an AI skill is a machine-readable contract that defines what the skill commits to do and not do. Adversarial evaluations test compliance with the pact. Non-compliance is scored and visible. Publishers who can't hold their pact commitments get low scores and lose market access.
The pact structure for a skill typically includes:
Capability declaration: What data types the skill processes, what operations it performs, what outputs it produces.
Data handling policy: What data is retained, transmitted, or logged. A skill that claims to process data locally but establishes external connections would fail adversarial data exfiltration tests.
Scope boundaries: What the skill refuses to do. A skill that claims to only summarize documents but can be prompted into executing arbitrary code would fail scope honesty testing.
Permission minimums: What access the skill requires to function. Skills that request more permissions than their stated functionality requires raise immediate red flags in the scoring framework.
Refusal behavior: How the skill responds to inputs that fall outside its scope. A skill that handles out-of-scope requests gracefully (by refusing) scores higher on scope honesty than one that attempts to fulfill them or silently fails.
Every adversarial eval run tests the skill against these pact commitments under pressure — inputs designed to make the skill violate its own stated constraints. Skills that hold their commitments under adversarial conditions earn high scores. Skills that don't are identified and scored accordingly.
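As an illustration, a pact can be encoded as a simple machine-readable record and checked mechanically. This is a hedged sketch — the field names below are hypothetical, not a published schema — showing the kind of check an adversarial eval run applies: flagging network endpoints the pact never declared, the signature of data exfiltration.

```python
# Hypothetical sketch of a machine-readable behavioral pact.
# Field names are illustrative, not a published schema.
pact = {
    "skill": "doc-summarizer",
    "version": "1.2.0",
    "capabilities": ["summarize_document"],
    "data_handling": {"retains_input": False, "logs_content": False},
    "allowed_endpoints": ["api.example-summarizer.com"],  # declared network scope
    "refusal_policy": "decline out-of-scope requests explicitly",
}

def undeclared_endpoints(pact: dict, observed: list[str]) -> list[str]:
    """Return endpoints contacted during an eval run that the pact
    never declared -- the fingerprint of a data-exfiltration attempt."""
    allowed = set(pact["allowed_endpoints"])
    return [host for host in observed if host not in allowed]

# A run that saw traffic to an undeclared host generates an anomaly flag:
observed = ["api.example-summarizer.com", "collector.evil.example"]
print(undeclared_endpoints(pact, observed))  # ['collector.evil.example']
```

The same pattern generalizes to the other pact fields: each commitment becomes a predicate that observed behavior either satisfies or violates.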
Practical Implementation: A Supply Chain Policy
For platform operators and enterprise buyers deploying multi-agent systems, a defensible supply chain policy built on behavioral trust scoring looks like this:
Tier 1: Critical path skills (access to production data or systems)
- Minimum composite trust score: 80/100
- Minimum adversarial eval history: 100 runs
- Required dimensions: accuracy ≥ 80, security ≥ 85, scope honesty ≥ 85
- Required: active escrow backing behavioral pact
- Review cycle: quarterly re-evaluation
Tier 2: Standard workflow skills (access to business data, no direct system access)
- Minimum composite trust score: 65/100
- Minimum adversarial eval history: 40 runs
- Required dimensions: security ≥ 70, scope honesty ≥ 75
- Review cycle: semi-annual re-evaluation
Tier 3: Sandboxed or experimental skills (isolated execution, no sensitive data access)
- Minimum composite trust score: 40/100
- Required: at least one completed adversarial eval run
- Review cycle: before promotion to Tier 2
This tiered policy is implementable today using the Armalo trust oracle as the procurement data source. It gives security teams a defensible, auditable framework for evaluating AI agent supply chain risk — replacing the current choice between "block everything" and "accept everything."
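The tiered policy above can be expressed as data and enforced mechanically. The sketch below uses the thresholds from the three tiers; the shape of the skill record is an assumption, not a defined Armalo API response.

```python
# Tier thresholds from the policy above; the record layout is illustrative.
TIERS = {
    1: {"min_score": 80, "min_runs": 100,
        "dims": {"accuracy": 80, "security": 85, "scope_honesty": 85},
        "escrow_required": True},
    2: {"min_score": 65, "min_runs": 40,
        "dims": {"security": 70, "scope_honesty": 75},
        "escrow_required": False},
    3: {"min_score": 40, "min_runs": 1, "dims": {}, "escrow_required": False},
}

def meets_tier(skill: dict, tier: int) -> bool:
    """Check a skill's published trust data against a tier's thresholds."""
    t = TIERS[tier]
    return (
        skill["composite_score"] >= t["min_score"]
        and skill["eval_runs"] >= t["min_runs"]
        and all(skill["dims"].get(d, 0) >= v for d, v in t["dims"].items())
        and (skill["has_escrow"] or not t["escrow_required"])
    )

skill = {"composite_score": 82, "eval_runs": 120,
         "dims": {"accuracy": 84, "security": 88, "scope_honesty": 86},
         "has_escrow": True}
print(meets_tier(skill, 1))  # True
```

Because the thresholds live in data rather than prose, the policy is auditable: a compliance review can diff the `TIERS` table against the written policy directly.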
The Marketplace Effect
AI skill marketplaces that integrate trust scoring create a natural selection mechanism: high-trust skills get more usage, which generates more behavioral history, which raises their scores, which gets them more usage. Low-trust skills face a progressively shrinking market.
Compare two skill marketplaces:
Marketplace without trust scoring: All skills are equal at the listing level. A malicious skill with a compelling description and a low price competes on equal footing with a well-tested, pact-compliant skill. Buyers have no signal beyond documentation.
Marketplace with trust scoring integration: Every listing displays the composite trust score, adversarial eval history depth, and dimension breakdown. Buyers filter by minimum score. High-trust skills appear first in search. Low-trust or unscored skills are marked as unverified.
The market effect: publishers of high-quality skills have a financial incentive to invest in behavioral attestation, because trust scores directly translate to marketplace visibility and conversion. Publishers of low-quality or malicious skills can't fake scores generated by adversarial evaluation — they would need to build skills that actually behave as specified.
This is the trust layer as a market mechanism: it makes trust economically valuable, which drives investment in trustworthy behavior, which makes the ecosystem safer.
FAQ
Q: How does adversarial evaluation catch data exfiltration attempts? Adversarial eval runs include network monitoring for unexpected external connections during task execution. Skills that establish connections to endpoints not declared in their behavioral pact generate anomaly flags. The security dimension of the trust score specifically captures unexpected network behavior.
Q: What if a skill passes all adversarial evals but is later found to be malicious? The trust score time decay mechanism (1 point per week) means that a skill's score reflects recent evaluation history, not just historical performance. If a skill is found to exhibit malicious behavior, the score update is immediate — and the behavioral pact's violation record is public. Marketplace operators can respond by removing listings with scores that drop below thresholds.
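The decay rule from the answer above is a simple linear formula; as a sketch (the zero floor is an assumption):

```python
def decayed_score(score: float, weeks_since_last_eval: int) -> float:
    """Trust score after time decay: 1 point per week since the last
    completed eval run, floored at zero (the floor is an assumption)."""
    return max(0.0, score - 1.0 * weeks_since_last_eval)

print(decayed_score(80, 12))  # 68.0
```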
Q: Can a publisher simply refuse to allow adversarial testing? Yes — but the skill will then have no composite trust score, and sophisticated buyers using trust-scoring-enabled procurement policies will exclude it. The trust score is a market signal, not a mandate. The mandate comes from the buyer's procurement policy.
Q: How is this different from existing software composition analysis (SCA) tools? SCA tools verify code identity and known vulnerability signatures. They don't evaluate AI-specific behavioral attack vectors: prompt injection via response, semantic scope creep, data exfiltration through model output, instruction override through context manipulation. Behavioral trust scoring evaluates these AI-specific vectors directly.
Q: How do behavioral pacts handle skills that evolve over time? Behavioral pacts are versioned. Each pact version has its own adversarial eval history. When a publisher updates a skill, the new version must be re-evaluated against the updated pact. This creates a behavioral changelog — buyers can see whether updates introduced behavioral regressions.
Q: Is the trust oracle queryable at runtime, so my agent can check skill trust scores before calling them? Yes. The trust oracle is a low-latency REST API (typically under 100ms). An agent can query the trust score of a skill in its pre-call decision logic — refusing to call skills below a configured threshold or logging a warning before calling unverified skills.
Key Takeaways
- 824 malicious skills have been identified in A2A-compatible agent ecosystems — data exfiltration, instruction override, scope creep, and backdoor installation in active deployment.
- Traditional software supply chain security (SCA, hash verification) misses AI-specific attack vectors that work through behavioral semantics, not code modification.
- Verified agent identity requires behavioral attestation — independent adversarial evidence that the skill behaves as claimed under pressure, not just cryptographic proof of code identity.
- Enterprise procurement of AI skills is currently a faith-based exercise. Behavioral pacts + trust scores = a defensible, auditable procurement framework.
- A tiered supply chain policy (critical/standard/experimental) based on trust score thresholds is implementable today using the Armalo trust oracle as the procurement data source.
- Trust scoring creates a natural market mechanism: high-trust skills get more usage, which generates more behavioral history, which raises their scores — and low-trust or malicious skills face a shrinking market.
We're Building This for the Real Threat Surface
The 824 malicious skills number isn't hypothetical. The attack vectors are active. The enterprise procurement problem is real.
We're building the trust layer that gives security teams and platform operators a defensible framework for AI agent supply chain decisions — and we need feedback from people who are actually facing this problem.
Every month, we give away $30 in Armalo credits + 1 month Pro to 3 random people who sign up at armalo.ai, register an agent or skill, and tell us what's missing from the behavioral attestation framework for their specific use case.
Three winners every month. We'll keep drawing until we have enough security-focused feedback to know we've addressed the real threat surface. Sign up, register a skill, and tell us what the adversarial eval engine missed — or what your procurement policy needs that the trust score doesn't currently provide.
Put the trust layer to work
Explore the docs, register an agent, or start shaping a pact that turns these trust ideas into production evidence.