AI Agent Supply Chain Security: Malicious Skills and Why Trust Scores Are the Fix
824 malicious skills identified in A2A-compatible agent ecosystems. When your agent calls a tool from an unknown publisher, the attack surface is your entire deployed environment. Here's how behavioral pacts and trust scores create a defensible procurement framework.
TL;DR
- 824 malicious skills have been identified in A2A-compatible agent ecosystems — tools that exfiltrate data, override agent instructions, or create backdoors to the host environment
- When your agent calls a skill from an unknown publisher, the attack surface is your entire deployed environment, not just the tool's output
- Supply chain trust for AI agents requires the same infrastructure as supply chain trust for software: verified identity, behavioral attestation, and reputation-based procurement decisions
- Behavioral pacts + trust scores + verified agent identity = the procurement signal that enterprise buyers need to make defensible hiring decisions
- The risk is not theoretical — compromised agent skills are in active deployment in production multi-agent systems today
The Problem No One Is Talking About Loudly Enough
When your AI agent calls a tool, it executes code from whoever published that tool. If the publisher is unverified, the tool's behavior is unverified, and your agent's actions in your production environment are unverified. This is the AI agent supply chain problem.
In traditional software, supply chain security is a mature discipline. You know who published your npm packages (or you should). You run SCA scans. You check for known CVEs. You have a dependency policy.
AI agents have none of this.
An AI agent operating in a multi-agent ecosystem can call skills (tools, plugins, sub-agents) published by anyone. The skill might be from a trusted internal team. It might be from a vetted marketplace partner. It might be from an anonymous publisher with no behavioral history, no verified identity, and no accountability for what the skill actually does when called in your production environment.
In an analysis of A2A-compatible agent ecosystems conducted in early 2026, researchers identified 824 malicious or high-risk skills across publicly accessible agent marketplaces. These included:
- Data exfiltration tools that silently transmitted processed data to external endpoints
- Instruction override skills that injected system-level prompts to redirect agent behavior
- Scope creep tools that requested excessive permissions under benign-sounding capability descriptions
- Backdoor installers that established persistent access to the host environment during what appeared to be routine task execution
The attack surface is not the tool's output. The attack surface is your entire deployed agent environment.
Why Traditional Security Approaches Miss This
AI agent supply chain attacks don't look like traditional software vulnerabilities. They exploit the semantic trust that agents place in tool outputs — an agent that receives a response from a skill treats it as information from a trusted source, not as potentially adversarial input.
Traditional code scanning doesn't catch AI-specific attack vectors:
| Attack Type | Traditional SCA | AI-Specific Defense Required |
|---|---|---|
| Known CVE in dependency | Detected | N/A |
| Malicious npm package | Detected via hash | Hash unchanged — behavior is the attack |
| Prompt injection via tool output | Not detected | Behavioral attestation required |
| Data exfiltration via API call | Partially detected | Intent verification needed |
| Instruction override via response | Not detected | Scope honesty scoring required |
| Permission scope creep | Not detected | Behavioral pact compliance required |
The fundamental issue: traditional supply chain security verifies code identity and known vulnerability patterns. AI agent supply chain attacks work by making legitimate-looking code do semantically malicious things. The code might pass every SCA scan. The behavior in context is the attack.
What Verified Agent Identity Requires
Verified agent identity for AI agents requires more than cryptographic signing. It requires behavioral attestation — independent evidence that the agent or skill behaves as described across a range of inputs, including adversarial ones.
Software supply chain security has a clear model: the publisher signs the artifact, you verify the signature against a trusted authority, and you trust the code matches the signed version. This works because code is deterministic — the same input produces the same output.
AI agents are not deterministic. The same input can produce different outputs depending on model version, temperature settings, system prompt, context, and tool availability. Signing a model checkpoint tells you the weights haven't changed. It tells you nothing about whether the agent's behavior in your specific deployment context matches what the publisher claimed.
What behavioral attestation adds:
- Behavioral pact: The publisher specifies what the skill commits to do and not do — input/output types, data handling policy, scope boundaries, refusal behavior on out-of-scope requests
- Adversarial evaluation history: Independent adversarial tests run against the pact specification, designed to find deviations from stated behavior
- Multi-model jury scoring: 5-7 independent LLM judges evaluate behavioral compliance across test runs — not just whether the output is syntactically correct, but whether it adheres to the stated behavioral contract
- Public trust score: A composite score queryable by any buyer before they decide to include a skill in their agent's tool inventory
- Economic commitment: Escrow posted by the publisher against the behavioral pact — creating financial accountability for behavioral deviations
Together, these create a behavioral fingerprint that goes beyond cryptographic identity. It answers: not just "is this the code the publisher signed?" but "does this agent behave as the publisher claimed, under conditions designed to make it fail?"
The Enterprise Procurement Problem
Enterprise procurement of AI agent skills is currently a faith-based exercise. Security and compliance teams have no framework for evaluating whether an AI skill is safe to include in a production agent environment. Trust scores provide the procurement signal that makes evaluation tractable.
A CISO at a financial services firm faces a specific version of this problem: their AI agent platform allows developers to add skills from external marketplaces to automate financial workflows. Each skill potentially has access to:
- Customer financial data being processed in the workflow
- API credentials for internal systems
- The ability to modify the agent's behavior through response injection
- Network access to external endpoints
Current enterprise responses to this problem:
- Block all external skills (kills the value of the platform)
- Manual security review of each skill (doesn't scale, misses AI-specific vectors)
- Limit to a whitelist of pre-approved publishers (creates a bottleneck that slows deployment)
- Accept the risk implicitly (leaves the CISO exposed)
None of these is satisfactory. The right answer is a risk score that captures what matters for AI-specific attack vectors — behavioral compliance, scope honesty, data handling policy, adversarial resistance — and is queryable at procurement time.
This is the gap that behavioral pacts and trust scores fill. A procurement policy that says "only include skills with composite trust score ≥ 75, with scope honesty dimension ≥ 80, and published adversarial eval history of at least 50 runs" is enforceable, auditable, and defensible.
How Behavioral Pacts Constrain Supply Chain Risk
A behavioral pact for an AI skill is a machine-readable contract that defines what the skill commits to do and not do. Adversarial evaluations test compliance with the pact. Non-compliance is scored and visible. Publishers who can't hold their pact commitments get low scores and lose market access.
The pact structure for a skill typically includes:
Capability declaration: What data types the skill processes, what operations it performs, what outputs it produces.
Data handling policy: What data is retained, transmitted, or logged. A skill that claims to process data locally but establishes external connections would fail adversarial data exfiltration tests.
Scope boundaries: What the skill refuses to do. A skill that claims to only summarize documents but can be prompted into executing arbitrary code would fail scope honesty testing.
Permission minimums: What access the skill requires to function. Skills that request more permissions than their stated functionality requires raise immediate red flags in the scoring framework.
Refusal behavior: How the skill responds to inputs that fall outside its scope. A skill that handles out-of-scope requests gracefully (by refusing) scores higher on scope honesty than one that attempts to fulfill them or silently fails.
Every adversarial eval run tests the skill against these pact commitments under pressure — inputs designed to make the skill violate its own stated constraints. Skills that hold their commitments under adversarial conditions earn high scores. Skills that don't are identified and scored accordingly.
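As an illustration, a pact can be encoded as a simple machine-readable record and checked mechanically. This is a hedged sketch — the field names below are hypothetical, not a published schema — showing the kind of check an adversarial eval run applies: flagging network endpoints the pact never declared, the signature of data exfiltration.

```python
# Hypothetical sketch of a machine-readable behavioral pact.
# Field names are illustrative, not a published schema.
pact = {
    "skill": "doc-summarizer",
    "version": "1.2.0",
    "capabilities": ["summarize_document"],
    "data_handling": {"retains_input": False, "logs_content": False},
    "allowed_endpoints": ["api.example-summarizer.com"],  # declared network scope
    "refusal_policy": "decline out-of-scope requests explicitly",
}

def undeclared_endpoints(pact: dict, observed: list[str]) -> list[str]:
    """Return endpoints contacted during an eval run that the pact
    never declared -- the fingerprint of a data-exfiltration attempt."""
    allowed = set(pact["allowed_endpoints"])
    return [host for host in observed if host not in allowed]

# A run that saw traffic to an undeclared host generates an anomaly flag:
observed = ["api.example-summarizer.com", "collector.evil.example"]
print(undeclared_endpoints(pact, observed))  # ['collector.evil.example']
```

The same pattern generalizes to the other pact fields: each commitment becomes a predicate that observed behavior either satisfies or violates.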
Practical Implementation: A Supply Chain Policy
For platform operators and enterprise buyers deploying multi-agent systems, a defensible supply chain policy built on behavioral trust scoring looks like this:
Tier 1: Critical path skills (access to production data or systems)
- Minimum composite trust score: 80/100
- Minimum adversarial eval history: 100 runs
- Required dimensions: accuracy ≥ 80, security ≥ 85, scope honesty ≥ 85
- Required: active escrow backing behavioral pact
- Review cycle: quarterly re-evaluation
Tier 2: Standard workflow skills (access to business data, no direct system access)
- Minimum composite trust score: 65/100
- Minimum adversarial eval history: 40 runs
- Required dimensions: security ≥ 70, scope honesty ≥ 75
- Review cycle: semi-annual re-evaluation
Tier 3: Sandboxed or experimental skills (isolated execution, no sensitive data access)
- Minimum composite trust score: 40/100
- Required: at least one completed adversarial eval run
- Review cycle: before promotion to Tier 2
This tiered policy is implementable today using the Armalo trust oracle as the procurement data source. It gives security teams a defensible, auditable framework for evaluating AI agent supply chain risk — replacing the current choice between "block everything" and "accept everything."
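The tiered policy above can be expressed as data and enforced mechanically. The sketch below uses the thresholds from the three tiers; the shape of the skill record is an assumption, not a defined Armalo API response.

```python
# Tier thresholds from the policy above; the record layout is illustrative.
TIERS = {
    1: {"min_score": 80, "min_runs": 100,
        "dims": {"accuracy": 80, "security": 85, "scope_honesty": 85},
        "escrow_required": True},
    2: {"min_score": 65, "min_runs": 40,
        "dims": {"security": 70, "scope_honesty": 75},
        "escrow_required": False},
    3: {"min_score": 40, "min_runs": 1, "dims": {}, "escrow_required": False},
}

def meets_tier(skill: dict, tier: int) -> bool:
    """Check a skill's published trust data against a tier's thresholds."""
    t = TIERS[tier]
    return (
        skill["composite_score"] >= t["min_score"]
        and skill["eval_runs"] >= t["min_runs"]
        and all(skill["dims"].get(d, 0) >= v for d, v in t["dims"].items())
        and (skill["has_escrow"] or not t["escrow_required"])
    )

skill = {"composite_score": 82, "eval_runs": 120,
         "dims": {"accuracy": 84, "security": 88, "scope_honesty": 86},
         "has_escrow": True}
print(meets_tier(skill, 1))  # True
```

Because the thresholds live in data rather than prose, the policy is auditable: a compliance review can diff the `TIERS` table against the written policy directly.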
The Marketplace Effect
AI skill marketplaces that integrate trust scoring create a natural selection mechanism: high-trust skills get more usage, which generates more behavioral history, which raises their scores, which gets them more usage. Low-trust skills face a progressively shrinking market.
Compare two skill marketplaces:
Marketplace without trust scoring: All skills are equal at the listing level. A malicious skill with a compelling description and a low price competes on equal footing with a well-tested, pact-compliant skill. Buyers have no signal beyond documentation.
Marketplace with trust scoring integration: Every listing displays the composite trust score, adversarial eval history depth, and dimension breakdown. Buyers filter by minimum score. High-trust skills appear first in search. Low-trust or unscored skills are marked as unverified.
The market effect: publishers of high-quality skills have a financial incentive to invest in behavioral attestation, because trust scores directly translate to marketplace visibility and conversion. Publishers of low-quality or malicious skills can't fake scores generated by adversarial evaluation — they would need to build skills that actually behave as specified.
This is the trust layer as a market mechanism: it makes trust economically valuable, which drives investment in trustworthy behavior, which makes the ecosystem safer.
FAQ
Q: How does adversarial evaluation catch data exfiltration attempts? Adversarial eval runs include network monitoring for unexpected external connections during task execution. Skills that establish connections to endpoints not declared in their behavioral pact generate anomaly flags. The security dimension of the trust score specifically captures unexpected network behavior.
Q: What if a skill passes all adversarial evals but is later found to be malicious? The trust score time decay mechanism (1 point per week) means that a skill's score reflects recent evaluation history, not just historical performance. If a skill is found to exhibit malicious behavior, the score update is immediate — and the behavioral pact's violation record is public. Marketplace operators can respond by removing listings with scores that drop below thresholds.
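The decay rule from the answer above is a simple linear formula; as a sketch (the zero floor is an assumption):

```python
def decayed_score(score: float, weeks_since_last_eval: int) -> float:
    """Trust score after time decay: 1 point per week since the last
    completed eval run, floored at zero (the floor is an assumption)."""
    return max(0.0, score - 1.0 * weeks_since_last_eval)

print(decayed_score(80, 12))  # 68.0
```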
Q: Can a publisher simply refuse to allow adversarial testing? Yes — but the skill will then have no composite trust score, and sophisticated buyers using trust-scoring-enabled procurement policies will exclude it. The trust score is a market signal, not a mandate. The mandate comes from the buyer's procurement policy.
Q: How is this different from existing software composition analysis (SCA) tools? SCA tools verify code identity and known vulnerability signatures. They don't evaluate AI-specific behavioral attack vectors: prompt injection via response, semantic scope creep, data exfiltration through model output, instruction override through context manipulation. Behavioral trust scoring evaluates these AI-specific vectors directly.
Q: How do behavioral pacts handle skills that evolve over time? Behavioral pacts are versioned. Each pact version has its own adversarial eval history. When a publisher updates a skill, the new version must be re-evaluated against the updated pact. This creates a behavioral changelog — buyers can see whether updates introduced behavioral regressions.
Q: Is the trust oracle queryable at runtime, so my agent can check skill trust scores before calling them? Yes. The trust oracle is a low-latency REST API (typically under 100ms). An agent can query the trust score of a skill in its pre-call decision logic — refusing to call skills below a configured threshold or logging a warning before calling unverified skills.
Key Takeaways
- 824 malicious skills have been identified in A2A-compatible agent ecosystems — data exfiltration, instruction override, scope creep, and backdoor installation in active deployment.
- Traditional software supply chain security (SCA, hash verification) misses AI-specific attack vectors that work through behavioral semantics, not code modification.
- Verified agent identity requires behavioral attestation — independent adversarial evidence that the skill behaves as claimed under pressure, not just cryptographic proof of code identity.
- Enterprise procurement of AI skills is currently a faith-based exercise. Behavioral pacts + trust scores = a defensible, auditable procurement framework.
- A tiered supply chain policy (critical/standard/experimental) based on trust score thresholds is implementable today using the Armalo trust oracle as the procurement data source.
- Trust scoring creates a natural market mechanism: high-trust skills get more usage, which generates more behavioral history, which raises their scores — and low-trust or malicious skills face a shrinking market.
We're Building This for the Real Threat Surface
The 824 malicious skills number isn't hypothetical. The attack vectors are active. The enterprise procurement problem is real.
We're building the trust layer that gives security teams and platform operators a defensible framework for AI agent supply chain decisions — and we need feedback from people who are actually facing this problem.
Every month, we give away $30 in Armalo credits + 1 month Pro to 3 random people who sign up at armalo.ai, register an agent or skill, and tell us what's missing from the behavioral attestation framework for their specific use case.
Three winners every month. We'll keep drawing until we have enough security-focused feedback to know we've addressed the real threat surface. Sign up, register a skill, and tell us what the adversarial eval engine missed — or what your procurement policy needs that the trust score doesn't currently provide.
Put the trust layer to work
Explore the docs, register an agent, or start shaping a pact that turns these trust ideas into production evidence.