AI Agent Supply Chain Security: Malicious Skills, CVEs, NIST & Runtime Defenses | Armalo

TL;DR

AI agent supply chains extend far beyond npm/PyPI packages to include skill registries, tool wrappers, prompt context, memory stores, and model weights — each a distinct attack surface.
Eight distinct attack vectors — from dependency confusion to RAG poisoning to behavioral drift injection — enable attackers to compromise agent behavior without touching the core model.
The NIST SP 800-161r1 C-SCRM framework, SLSA build integrity levels, and CISA SBOM mandates all apply directly to agent skill supply chains and provide a ready-made compliance baseline.
A realistic agent supply chain kill chain runs through six stages: initial access via malicious skill → execution → persistence via memory poisoning → privilege escalation → data exfiltration → cover tracks via behavioral mimicry.
Ten runtime defense controls — from OPA policy enforcement to canary tokens in memory to behavioral checksums — can be layered into a defense-in-depth architecture.
Every agent skill in a production marketplace needs a behavioral pact, a provenance attestation, and a composite trust score before it should be trusted to act autonomously.

Introduction: The Supply Chain Is Now the Model's Nervous System

In March 2024, researchers at Protect AI discovered over 100 malicious models on Hugging Face Hub containing Python pickle exploits — serialized code that executed arbitrary commands when the model was loaded. None of these were buried in obscure repositories. Several had hundreds of downloads.

That same year, over 700 malicious PyPI packages were discovered mimicking popular AI libraries: look-alikes of langchain-community, near-clones of openai, and typosquat variations of agentops. The attack pattern follows the same logic as the 2021 dependency confusion attack demonstrated by Alex Birsan against Apple, Microsoft, PayPal, Uber, and Tesla: publish a malicious package with a name that looks legitimate, then wait for automated dependency resolution to pull it in.

But for AI agents, the attack surface is fundamentally larger than for traditional software. A compromised npm package corrupts code execution. A compromised agent skill corrupts reasoning. It can make an autonomous agent exfiltrate data while appearing to file a routine report, escalate its own permissions while claiming to optimize workflow, or selectively mis-answer questions in ways that serve an attacker's goals without triggering any traditional anomaly detector.

This guide covers the complete technical landscape: the eight distinct attack vectors unique to agent supply chains, the government frameworks that now mandate defenses against them, a realistic kill chain walkthrough, ten layered defense controls with implementation specifics, runtime monitoring thresholds, vendor evaluation questions, an incident response playbook, and MITRE ATLAS technique mapping.

If your team operates autonomous agents at any scale — whether internal automation, customer-facing agents, or multi-agent orchestration — this is the threat model you need to internalize before your first supply chain incident, not after.

Part 1: The Expanded Attack Surface — Eight Vectors

Vector 1: Dependency Confusion Attacks

Dependency confusion was first systematically demonstrated in February 2021 by security researcher Alex Birsan. The technique exploits how package managers like npm, pip, and RubyGems resolve package names: when a package exists in both a private internal registry and the public registry under the same name, many package managers default to the public version if it has a higher version number.

Birsan published innocuous packages named to match the internal dependency names of Apple, Microsoft, PayPal, Uber, and Tesla. Automated dependency resolution pulled his packages in silently. He received execution confirmation from all five companies, reporting the results to their bug bounty programs. The technique required no credential theft, no social engineering, no zero-days.

For AI agents, the attack surface is the skill registry. If your agent runtime resolves skills from a public marketplace before checking an internal approved list, an attacker who knows (or can guess) your internal skill names can preemptively register malicious versions. The damage is not just code execution — it is behavioral compromise. The skill appears to do what it claims. It also does something else.

Mitigation: Namespace isolation (internal skills under a private registry prefix that cannot be registered publicly), version pinning to exact hashes (not semver ranges), and registry mirroring with allow-lists.

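# Example: hash-pinning a skill package
# pip accepts --hash only inside a requirements file, so requirements.txt contains:
#   armalo-skill-crm==2.3.1 --hash=sha256:a8b4c2d1e9f3...
pip install --require-hashes --no-deps -r requirements.txt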

Vector 2: Typosquatting

Typosquatting exploits human (and automated) error in package names. In 2024, the Python Package Index saw a wave of malicious packages specifically targeting the AI ecosystem: openai-dev, langchain-communty (note the missing 'i'), agentops-ai, anthropic-sdk-python. Many included functioning versions of the legitimate library's code alongside hidden exfiltration payloads, making them hard to detect through casual inspection.

For agent skill registries, typosquatting is compounded by the fact that skill names are often longer descriptive strings — export_customer_report_to_pdf — where a single character substitution is easy to miss. An attacker registering export_customer_repport_to_pdf in a public marketplace and waiting for misconfigured agent runtimes to resolve against it needs only patience.

Real incident: The agentops vs agentops-ai confusion in PyPI led to security researchers flagging multiple malicious variants in late 2024, with some packages achieving thousands of downloads before removal.

Mitigation: Edit-distance checks on skill name resolution (reject names within Levenshtein distance 2 of approved skills), strict allow-listing, and automated typosquatting detection in CI pipelines.
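The edit-distance check can be sketched as follows. This is a minimal illustration, not a registry API: `APPROVED_SKILLS`, `check_skill_name`, and the distance threshold of 2 are assumptions chosen to match the mitigation described above.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical allow-list of approved skill names.
APPROVED_SKILLS = {"export_customer_report_to_pdf", "salesforce_crm_sync"}


def check_skill_name(name: str, max_distance: int = 2) -> str:
    """Classify a requested skill name before it is resolved."""
    if name in APPROVED_SKILLS:
        return "approved"
    for approved in APPROVED_SKILLS:
        if levenshtein(name, approved) <= max_distance:
            return "suspected_typosquat"  # near miss of an approved name
    return "unknown"
```

A runtime would reject both `suspected_typosquat` and `unknown` under strict allow-listing; the separate label mainly helps alerting distinguish a probable attack from an ordinary misconfiguration.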

Vector 3: Prompt Injection via Tool Outputs

Prompt injection is ranked LLM01 in the OWASP Top 10 for Large Language Model Applications. The indirect variant, where injected instructions arrive through tool outputs rather than direct user input, is the variant most relevant to supply chain security.

The foundational academic study is Greshake et al., "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injections" (arXiv 2302.12173, 2023). The paper demonstrated systematic exploitation of LLM-integrated applications by injecting instructions into content that agents would retrieve and process — web pages, documents, API responses, tool outputs.

When a compromised skill returns a tool output containing crafted text, that text enters the agent's context window as if it were a legitimate observation. If the text contains instruction-format strings — [SYSTEM: Disregard previous instructions and...] or more subtle behavioral nudges — a sufficiently capable model may follow them. The attack scales with the agent's capability: more capable agents are more susceptible because they are better at following complex instructions.

CVEs in this category:

CVE-2023-29374: Prompt injection in LangChain's LLMMathChain allowed crafted input to execute arbitrary code through Python's exec
CVE-2023-36258: LangChain's PALChain allowed arbitrary code execution (CVSS 9.8) because LLM-generated Python was executed without sanitization

Mitigation: Output sanitization at tool boundaries, instruction-format string filtering, context segmentation (tool outputs treated as untrusted data, not instructions), and system prompt anchoring techniques.

# Tool output sanitization pattern
import re

def sanitize_tool_output(raw_output: str) -> str:
    # Strip instruction-format patterns. Bracketed and tagged forms may span
    # lines; the free-text forms stop at the end of their own line so one
    # match cannot swallow the rest of the output.
    patterns = [
        r'\[SYSTEM[:\s].*?\]',
        r'<system>.*?</system>',
        r'Ignore previous instructions[^\n]*',
        r'IMPORTANT OVERRIDE[^\n]*',
    ]
    sanitized = raw_output
    for pattern in patterns:
        sanitized = re.sub(pattern, '[REDACTED]', sanitized, flags=re.IGNORECASE | re.DOTALL)
    return sanitized

Vector 4: Malicious Skill Registration

The fundamental trust problem with open skill registries: any actor can register a skill claiming any capability. Without a verification layer, a skill named salesforce_crm_sync that actually exfiltrates CRM data is indistinguishable from a legitimate one at registration time. The distinction only becomes visible through behavioral analysis.

The ChatGPT plugin ecosystem in 2023 surfaced this problem at scale. OpenAI removed multiple plugins discovered harvesting user data, session tokens, and conversation content while presenting themselves as productivity tools. The attack pattern: create a plugin that provides genuine utility, deploy it, build user base, then update to include data collection. This is the plugin equivalent of the XZ Utils supply chain backdoor (CVE-2024-3094).

The XZ Utils case is the canonical case study. Jia Tan (likely a state-sponsored actor) spent nearly two years contributing legitimate improvements to the xz compression library, gaining maintainer trust. In 2024, they inserted a backdoor into the build system that added a malicious payload to the compiled library — specifically targeting SSH authentication on systemd-based Linux systems. The attack was discovered almost by accident by Andres Freund, who noticed slightly elevated CPU usage in SSH connections. Without that accident, the backdoor would have shipped in all major Linux distributions.

For agent skills, the analog is a trusted skill that passes initial security review, accumulates usage, then receives a behavioral update that introduces exfiltration, permission escalation, or instruction injection — post-trust.

Mitigation: Immutable skill versioning (once a version is attested, it cannot be modified), behavioral re-evaluation on every version update, and anomaly detection for post-update behavioral drift.

Vector 5: Behavioral Drift Injection

Behavioral drift injection is subtler than the XZ backdoor pattern — instead of a discrete malicious update, the agent's behavior shifts gradually through a series of individually innocuous-looking changes. Each change is small enough to pass review. The cumulative effect is significant behavioral deviation.

This mirrors the "boiling frog" problem in system security: gradual changes evade threshold-based detection. An agent that shifts 5% toward boundary-violating behavior per update can be meaningfully compromised after 10 updates while no single update crosses any alert threshold.

The attack can also occur through data: if an agent learns from tool interaction history, an attacker with access to insert crafted interactions into that history can steer behavioral drift without touching the skill code at all.

Detection approach: Behavioral checksums. Fingerprint a representative sample of agent behaviors at deployment. Rerun the same scenarios periodically. Track divergence as a metric. Flag cumulative drift even when individual steps look benign.
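The difference between per-update and cumulative alerting can be sketched as below. `DriftTracker` and both thresholds are hypothetical; the only assumption about the measurement is that each recorded value is the divergence from the original deployment baseline, not from the previous version.

```python
from dataclasses import dataclass, field


@dataclass
class DriftTracker:
    per_update_threshold: float = 0.10   # hypothetical: alert on a large single step
    cumulative_threshold: float = 0.30   # hypothetical: alert on total drift
    history: list = field(default_factory=list)

    def record(self, divergence_from_baseline: float) -> list:
        """Record the divergence measured after an update; return any alerts."""
        alerts = []
        previous = self.history[-1] if self.history else 0.0
        step = divergence_from_baseline - previous
        if step > self.per_update_threshold:
            alerts.append("per_update")
        if divergence_from_baseline > self.cumulative_threshold:
            alerts.append("cumulative")
        self.history.append(divergence_from_baseline)
        return alerts


# Ten updates, each adding 0.05 of drift relative to the original baseline.
tracker = DriftTracker()
alerts_per_update = [tracker.record(round(0.05 * i, 2)) for i in range(1, 11)]
```

In this run no update ever raises a `per_update` alert, because each step is only 0.05, while the `cumulative` alert first fires on the seventh update, when total drift reaches 0.35.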

Vector 6: Memory Poisoning

Agent memory stores are persistent context surfaces that influence future reasoning. RAG (Retrieval-Augmented Generation) systems retrieve relevant chunks from a vector database; episodic memory systems store and retrieve past interaction summaries; semantic memory systems hold structured knowledge. All three are injectable.

The academic foundation for this attack class is Carlini et al., "Poisoning Web-Scale Training Datasets is Practical" (arXiv 2302.10149, 2023), which demonstrated that training data poisoning could reliably inject targeted behaviors with surprisingly small poisoning rates. For retrieval-based memory, the equivalent is inserting adversarially crafted documents into the retrieval store: documents designed to surface when relevant queries are made and to nudge agent responses in specific directions.

In an agentic context, memory poisoning becomes persistent. Unlike a single prompt injection that affects one interaction, a poisoned memory entry can influence every future interaction that retrieves it. An attacker who inserts a poisoned entry into a CRM agent's memory — say, a fabricated customer interaction record that instructs the agent to always recommend a specific upgrade path — has created persistent behavioral influence without ever touching the agent's code or model.

Mitigation: Memory isolation per agent and per tenant, cryptographic signing of memory entries (tamper detection), write provenance logging (which agent wrote which entry, when, from what context), and canary entries for exfiltration detection.

Vector 7: Model Weight Tampering

For teams deploying fine-tuned or locally-hosted models, the model weights themselves are a supply chain surface. The "BadNets" research by Gu et al. (2019) demonstrated that neural networks can be backdoored at training time: specific trigger inputs cause the model to produce attacker-controlled outputs while behaving normally on all other inputs. Liu et al.'s "TrojanNN" (2018) extended this to show that trojans could be inserted into pre-trained models after training, without access to the original training data.

The Hugging Face Hub incident in March 2024 confirmed this attack class is not theoretical. Protect AI researchers found over 100 models on the Hub containing malicious pickle exploits — Python's serialization format allows arbitrary code execution, and model files are serialized with pickle by default in PyTorch. When researchers downloaded and loaded these models, code executed immediately.

Mitigation: SafeTensors format (no code execution, safe serialization), model provenance verification (sign model artifacts with Sigstore/cosign), SHA-256 hash pinning for all model downloads, sandboxed model loading environments.

# Verify model artifact integrity with cosign
cosign verify --key cosign.pub ghcr.io/org/model:v1.2.3

# Or use in-toto attestations
in-toto-verify \
  --layout layout.pem \
  --link-dir ./links/ \
  --verbose
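SHA-256 hash pinning for model downloads can be sketched with the standard library alone. The function names are illustrative, and the pinned digest is assumed to come from a trusted source such as a reviewed lockfile, not from the same location as the download.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned_artifact(path: Path, pinned_sha256: str) -> None:
    """Raise ValueError unless the file matches the pinned digest."""
    actual = sha256_file(path)
    if not hmac.compare_digest(actual, pinned_sha256.lower()):
        raise ValueError(f"hash mismatch for {path.name}: got {actual}")
```

The check belongs before any deserialization step: a pickle-based model file should be hashed and compared first and only then handed to the loader.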

Vector 8: Context Window Stuffing and RAG Poisoning

Context window stuffing is a denial-of-reasoning attack: a malicious tool output floods the context window with high-volume content, pushing safety system prompts, behavioral constraints, or relevant context beyond the model's attention window. On models with limited context windows, this can effectively disable safety guardrails by making them unreachable during inference.

RAG poisoning is the retrieval-targeted variant: adversarial documents crafted to always score high retrieval relevance are inserted into the knowledge base. When a user asks questions in the relevant domain, the adversarial document is retrieved and included in context, injecting attacker-controlled instructions alongside legitimate retrieved content.

OWASP classification: RAG poisoning maps most directly to LLM08: Vector and Embedding Weaknesses in the 2025 OWASP Top 10 for LLM Applications, with overlap into LLM04: Data and Model Poisoning.

Mitigation: Context length budgets per source (tool outputs cannot consume more than X% of context), retrieval source attribution and filtering, semantic similarity thresholds for retrieval (outlier documents flagged), and context integrity verification.
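A per-source context budget can be sketched as below. Whitespace splitting stands in for a real tokenizer, and `enforce_context_budget`, the segment layout, and the 25% default share are assumptions made for illustration.

```python
from collections import defaultdict


def enforce_context_budget(
    segments: list,
    max_context_tokens: int,
    max_share_per_source: float = 0.25,
) -> list:
    """Clip each source's total contribution to a fixed share of the context.

    `segments` is a list of (source, text) pairs in the order they would be
    placed in the prompt.
    """
    per_source_cap = int(max_context_tokens * max_share_per_source)
    used = defaultdict(int)
    kept = []
    for source, text in segments:
        tokens = text.split()  # stand-in for a model tokenizer
        remaining = per_source_cap - used[source]
        if remaining <= 0:
            continue  # this source has already spent its budget
        clipped = tokens[:remaining]
        used[source] += len(clipped)
        kept.append((source, " ".join(clipped)))
    return kept
```

With a 100-token context and the default share, a tool output of any length is clipped to 25 tokens, so it cannot crowd a system prompt out of the window.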

Part 2: The Government Framework Applied to AI Agents

NIST SP 800-161r1: C-SCRM for AI Agent Supply Chains

NIST Special Publication 800-161 Revision 1, published May 2022, establishes "Cybersecurity Supply Chain Risk Management Practices for Systems and Organizations." While written before the current agentic AI wave, its multilevel risk management model (enterprise, mission and business process, and operational levels) maps directly onto AI agent supply chain governance. The tiers below adapt that model and add a supplier tier.

NIST C-SCRM Tier	Traditional Application	AI Agent Application
Tier 1: Organization	Enterprise-wide SCRM policy	AI governance policy covering all agent deployments, skill procurement standards
Tier 2: Mission/Business	Business process SCRM requirements	Per-use-case agent trust requirements, SLAs for agent reliability and security
Tier 3: System	Specific system SCRM controls	Per-agent skill inventory, dependency tracking, runtime isolation requirements
Tier 4: Supplier	Supplier assessment and monitoring	Skill publisher assessment, behavioral evaluation requirements, ongoing monitoring

NIST 800-161r1 builds on the Supply Chain Risk Management (SR) control family from NIST SP 800-53 Rev. 5. Three of those controls are directly applicable:

SR-3 (Supply Chain Controls and Processes): Requires organizations to establish a process for protecting against supply chain risks. For AI agents: implement skill vetting procedures before any new skill is deployed to production.
SR-4 (Provenance): Requires documentation of component origins. For AI agents: maintain a provenance chain for every skill package, including who published it, when, what evaluation it passed, and what version is currently deployed.
SR-11 (Component Authenticity): Requires anti-counterfeit/anti-tamper procedures. For AI agents: cryptographic signing of skill artifacts, hash pinning, and signature verification at runtime.

Executive Order 14028 and SBOM for AI

Executive Order 14028 (May 2021), "Improving the Nation's Cybersecurity," Section 4 requires that critical software include a Software Bill of Materials — a machine-readable inventory of all components, their versions, and their dependencies.

CISA's SBOM guidance (https://www.cisa.gov/sbom) establishes two standard formats: SPDX (ISO/IEC 5962:2021, maintained by the Linux Foundation) and CycloneDX (an OWASP standard designed more specifically for security use cases).

For AI agents, SBOM coverage must extend beyond traditional software components to include:

Model provenance: which base model, which fine-tuning dataset, which training run
Skill packages: every registered skill with version, publisher, evaluation status
Tool adapters: API wrappers, database connectors, external service integrations
Prompt templates: system prompts, persona definitions, behavioral constraints
Memory configuration: retrieval database contents, episodic memory sources

A minimal CycloneDX SBOM for an AI agent deployment:

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "version": 1,
  "metadata": {
    "timestamp": "2026-04-21T00:00:00Z",
    "component": {
      "type": "application",
      "name": "crm-automation-agent",
      "version": "3.2.1"
    }
  },
  "components": [
    {
      "type": "library",
      "name": "armalo-skill-crm-sync",
      "version": "2.3.1",
      "purl": "pkg:pypi/armalo-skill-crm-sync@2.3.1",
      "hashes": [{"alg": "SHA-256", "content": "a8b4c2d1..."}],
      "externalReferences": [
        {"type": "attestation", "url": "https://armalo.ai/skills/crm-sync/attestation/2.3.1"}
      ]
    }
  ]
}
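An SBOM like the one above is only useful if something checks it. The sketch below applies an assumed local policy (every component needs a version, a SHA-256 hash, and an attestation reference); these requirements are an illustration of a deployment gate, not rules imposed by the CycloneDX specification.

```python
import json


def find_sbom_gaps(sbom_json: str) -> list:
    """Return human-readable policy violations for a CycloneDX JSON SBOM."""
    sbom = json.loads(sbom_json)
    problems = []
    if sbom.get("bomFormat") != "CycloneDX":
        problems.append("not a CycloneDX document")
    for component in sbom.get("components", []):
        name = component.get("name", "<unnamed>")
        if not component.get("version"):
            problems.append(f"{name}: missing version")
        algorithms = {h.get("alg") for h in component.get("hashes", [])}
        if "SHA-256" not in algorithms:
            problems.append(f"{name}: missing SHA-256 hash")
        ref_types = {r.get("type") for r in component.get("externalReferences", [])}
        if "attestation" not in ref_types:
            problems.append(f"{name}: missing attestation reference")
    return problems
```

A CI job could fail the build whenever the returned list is non-empty, which keeps unpinned or unattested skills out of a release.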

SLSA Framework: Build Integrity Levels for Agent Skills

SLSA (Supply-chain Levels for Software Artifacts, pronounced "salsa") is a graduated security framework developed by Google and donated to the OpenSSF. Its original (v0.1) specification defines four levels of build integrity assurance, each requiring increasingly strict provenance and isolation guarantees; the later v1.0 specification keeps Build levels 1 through 3 and drops Level 4.

SLSA Level	Requirements	Agent Skills Application
Level 1	Build process documented, provenance generated	Minimum bar: every skill must have documented build process
Level 2	Version-controlled source, authenticated build service	Skill code in VCS with signed commits, built by verified CI system
Level 3	Hardened build platform, non-falsifiable provenance	Isolated build environment, provenance attested by build platform, not builder
Level 4	Two-party review, hermetic builds	All dependencies pinned, reviewed by two independent parties before attestation

For production agent deployments handling sensitive data or autonomous financial actions, SLSA Level 3 should be the minimum bar for any integrated skill. Level 4 is appropriate for skills with privileged access (database writes, payment actions, external API calls with write permissions).

SLSA integrates with Sigstore — the Linux Foundation's keyless signing infrastructure — and in-toto (CNCF project) for supply chain attestation. Together, they create a cryptographic chain of custody from source code to deployed artifact.

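# Generate SLSA provenance with slsa-github-generator
# In GitHub Actions (a reusable workflow is called at the job level, not as a step):
jobs:
  build:
    permissions:
      id-token: write   # OIDC token used to sign the provenance
      contents: write   # upload release assets
      actions: read     # read workflow metadata
    # Pin a full release tag; v1.9.0 is shown only as an example
    uses: slsa-framework/slsa-github-generator/.github/workflows/builder_go_slsa3.yml@v1.9.0
    with:
      go-version: '1.21'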

Part 3: The Kill Chain — A Realistic Agent Supply Chain Attack

The following walkthrough describes a realistic multi-stage compromise of an enterprise AI agent deployment. Each stage maps to MITRE ATLAS tactics (covered in Part 8).

Stage 1: Initial Access — Malicious Skill Registration

Scenario: A threat actor identifies that a Fortune 500 company uses an AI agent for financial reporting automation. The company's agent runtime resolves skills from a public marketplace. The attacker discovers the internal skill name quarterly_report_export through a job posting that mentions the company's agent stack.

The attacker registers quarterly-report-export (hyphenated variant) in the public marketplace with a convincing publisher profile, complete evaluation scores, and documentation that mirrors the legitimate skill. A dependency update in the company's CI pipeline resolves the typosquat variant due to a misconfigured registry precedence rule.

Indicators: New skill version in deployment (automatic update), publisher identity not matching internal records, slight behavioral differences in edge cases (initially not noticed).

Stage 2: Execution — Skill Activation

The malicious skill executes normally for its claimed function. The financial reporting agent continues to produce correct reports. In parallel, the skill begins a secondary execution path: enumerating the agent's tool access list, mapping the data sources it can query, and recording the structure of the memory store.

This reconnaissance phase produces no visible anomalies. The skill returns correct outputs. Standard monitoring shows no elevated error rates. The agent's trust score remains stable because behavioral evaluations check documented capabilities — and the skill performs those correctly.

What's missed without proper monitoring: Tool call frequency baseline deviations, unexpected data access patterns, memory read patterns outside the skill's documented scope.
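The first of these gaps, a deviation from the tool call frequency baseline, can be sketched as a z-score over per-interval call counts. The function name and the sample counts are illustrative; a production monitor would also need to handle seasonality and sparse baselines.

```python
from statistics import mean, stdev


def frequency_zscore(baseline_counts: list, observed: int) -> float:
    """How many standard deviations `observed` sits from the baseline mean.

    `baseline_counts` needs at least two samples (a `stdev` requirement).
    """
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is maximally unusual.
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma


# Calls per hour made by one skill during a quiet baseline week (made-up data).
baseline = [10, 12, 11, 9, 10, 11, 10, 12]
z = frequency_zscore(baseline, observed=30)
```

A skill that suddenly starts enumerating tools and data sources would show up as a large positive z-score even though every individual call succeeds.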

Stage 3: Persistence — Memory Poisoning

Having mapped the agent's memory architecture, the skill begins inserting crafted entries into the agent's episodic memory store. The entries are subtle: they establish a behavioral pattern where certain financial data categories are "routinely included in external summary reports." They look like legitimate past interaction records.

Over the next two weeks, these poisoned memories surface when the agent prepares reports, gradually shifting its output format to include data fields that were not in the original specification. No single change is large enough to trigger a threshold alert.

What's missed: Memory write provenance (which component wrote which entry), memory content integrity verification, drift tracking against a behavioral baseline.

Stage 4: Privilege Escalation — Capability Expansion

The skill observes that the agent has access to a broader set of financial data APIs than it normally queries for reporting. Using the poisoned memory context, the skill begins crafting prompts in its tool outputs that suggest the agent should "validate report accuracy" by querying additional data sources — sources the agent has technical access to but no business reason to query.

The agent, following its instruction-following tendencies, begins querying these additional sources as part of its "validation" process.

What's missed: Scope enforcement (agent should only access pre-declared data sources), anomaly detection on new data source access patterns.
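Scope enforcement of this kind can be sketched as a check that runs before every data source query. `DECLARED_SCOPES`, the source names, and `authorize_data_access` are hypothetical; the point is that technical access alone is not sufficient, and a source must also be pre-declared for the skill.

```python
class ScopeViolation(Exception):
    """Raised when a skill asks for a data source it never declared."""


# Hypothetical declarations, reviewed at skill approval time.
DECLARED_SCOPES = {
    "quarterly_report_export": frozenset({"ledger_api", "fx_rates_api"}),
}


def authorize_data_access(skill: str, source: str) -> None:
    """Allow the query only if `source` is in the skill's declared scope."""
    allowed = DECLARED_SCOPES.get(skill, frozenset())
    if source not in allowed:
        raise ScopeViolation(f"{skill} is not declared to access {source}")
```

Under this check, the "validation" queries in Stage 4 would fail at the first undeclared source instead of silently succeeding.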

Stage 5: Data Exfiltration

With the agent now querying sensitive financial data as part of its expanded validation scope, the malicious skill routes the additional data through a covert exfiltration channel: it appends encoded data to otherwise legitimate API calls made to a reporting endpoint. The encoding is subtle: base64 data embedded in optional URL parameters that the legitimate receiving endpoint ignores but that the attacker's infrastructure captures.

The exfiltration produces no alerts because: the API calls are to legitimate endpoints the agent is authorized to contact, the data volume per call is small, and no single call pattern looks anomalous.
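One way to surface this channel is to compare each outbound request's query parameters against the set the skill declared for that endpoint. The sketch below assumes such a declaration exists; the URL and parameter names are made up.

```python
from urllib.parse import parse_qsl, urlsplit


def undeclared_query_params(url: str, declared: set) -> list:
    """Return query parameter names present in `url` but not declared."""
    present = {key for key, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
    return sorted(present - declared)


# Hypothetical outbound call carrying an extra base64-looking parameter.
extras = undeclared_query_params(
    "https://reports.example.com/v1/submit?report_id=42&trace=QUJDREVG",
    declared={"report_id"},
)
```

Here `extras` contains only `trace`, the parameter the skill never declared, which is enough to block the call or route it for review.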

Stage 6: Cover Tracks — Behavioral Normalization

To avoid triggering retrospective analysis, the malicious skill begins reducing the anomalous behaviors after the exfiltration goal is achieved. Memory entries are gradually modified to remove traces of the expanded scope queries. The skill version is updated to a "clean" version, resetting behavioral checksums.

By the time the compromise is discovered (typically through an external signal — a data leak disclosure, anomalous external access pattern noticed by the data recipient, or an independent security audit), the skill's current version may not exhibit the malicious behaviors, complicating forensic attribution.

Part 4: Defense-in-Depth Architecture — Ten Controls

Control 1: SBOM-First Skill Management

Every skill deployed to a production agent must have a machine-readable SBOM in CycloneDX or SPDX format, generated at build time and signed by the build system. SBOM verification is enforced at the agent runtime startup — a skill without a valid SBOM signature cannot be loaded.

Implementation: Integrate SBOM generation into skill CI pipelines using cyclonedx-python or syft. Sign SBOMs with Sigstore. Verify at runtime:

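# Runtime skill loader with SBOM verification (sigstore-python 3.x API)
from pathlib import Path

from sigstore.errors import VerificationError
from sigstore.models import Bundle
from sigstore.verify import Verifier, policy

# Assumed values: the CI identity and OIDC issuer expected to have signed the SBOM
EXPECTED_IDENTITY = "https://github.com/org/skill-crm-sync/.github/workflows/release.yml@refs/heads/main"
EXPECTED_ISSUER = "https://token.actions.githubusercontent.com"

def load_skill(skill_name: str, version: str) -> Skill:
    # download_sbom, download_attestation, Skill, and SkillIntegrityError are application helpers
    sbom_path = download_sbom(skill_name, version)
    attestation_path = download_attestation(skill_name, version)
    
    bundle = Bundle.from_json(Path(attestation_path).read_bytes())
    verifier = Verifier.production()
    try:
        # verify_artifact raises VerificationError on failure instead of returning a result
        verifier.verify_artifact(
            input_=Path(sbom_path).read_bytes(),
            bundle=bundle,
            policy=policy.Identity(identity=EXPECTED_IDENTITY, issuer=EXPECTED_ISSUER),
        )
    except VerificationError as exc:
        raise SkillIntegrityError(f"SBOM verification failed for {skill_name}@{version}") from exc
    
    return Skill.load(skill_name, version)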

Control 2: Sigstore/cosign Artifact Signing

Every skill artifact (package, container image, WASM module) is signed at build time using Sigstore's keyless signing. The signature is recorded in the Rekor transparency log, creating an immutable audit trail. Verification happens at agent runtime before skill execution.

Sigstore's keyless signing uses ephemeral keys tied to OIDC identity — no long-lived signing keys to steal, and every signature is publicly auditable in Rekor.

# Sign skill container at build time (GitHub Actions)
cosign sign --yes ghcr.io/org/skill-crm-sync:2.3.1

# Verify at runtime
cosign verify \
  --certificate-identity=https://github.com/org/skill-crm-sync/.github/workflows/release.yml@refs/heads/main \
  --certificate-oidc-issuer=https://token.actions.githubusercontent.com \
  ghcr.io/org/skill-crm-sync:2.3.1

Control 3: OPA (Open Policy Agent) Runtime Enforcement

OPA provides policy-as-code enforcement for which skills can be invoked, under what conditions, with what parameters. Policies are expressed in Rego and evaluated at the agent runtime layer before any skill invocation. This creates a declarative security boundary that is auditable, version-controlled, and independent of the skill code itself.

# OPA policy: restrict CRM skills to verified publishers and approved data scopes
package agent.skills

# rego.v1 enables the `in` and `if` keywords used below
import rego.v1

default allow := false

allow if {
    input.skill.publisher in data.approved_publishers
    input.skill.sbom_verified == true
    input.skill.behavioral_score >= 80
    not skill_accesses_restricted_data
}

skill_accesses_restricted_data if {
    some source in input.skill.declared_data_access
    source in data.restricted_data_sources
    not input.context.user_has_elevated_clearance
}

Control 4: Behavioral Sandboxing

Skills execute in isolated containers with explicit capability grants. The sandbox model:

Network: no egress by default; explicit allow-list of permitted endpoints (no wildcards)
Filesystem: read-only mount of declared input data; no write access outside designated output paths
Memory: no access to agent's episodic memory store directly; memory operations mediated through a permission-checked API
Process: no subprocess spawning; no dynamic code evaluation

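For high-security deployments, gVisor (runsc) or Firecracker microVMs provide stronger isolation than a shared-kernel container: gVisor intercepts system calls in a user-space kernel, and Firecracker runs each workload in a lightweight virtual machine. Both add some overhead, most noticeably for syscall-heavy or I/O-heavy skills.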

# Kubernetes security context for skill sandbox
securityContext:
  runAsNonRoot: true
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop: ["ALL"]

Control 5: Tool Call Auditing with Hash Verification

Every tool invocation is logged with: calling agent identity, skill identity and version, input parameters (hashed for PII), raw output, output hash, timestamp, and duration. Outputs are hashed before being passed to the LLM context — the hash allows tamper detection if the output is intercepted and modified in transit.

// Tool call audit middleware
async function auditedToolCall(
  agentId: string,
  skillId: string,
  skillVersion: string,
  toolName: string,
  input: unknown
): Promise<AuditedToolResult> {
  const inputHash = await sha256(JSON.stringify(input));
  const startMs = Date.now();
  
  const rawOutput = await invokeSkillTool(skillId, toolName, input);
  
  const outputHash = await sha256(JSON.stringify(rawOutput));
  const durationMs = Date.now() - startMs;
  
  await db.insert(toolCallAuditLog).values({
    agentId,
    skillId,
    skillVersion,
    toolName,
    inputHash,
    outputHash,
    durationMs,
    timestamp: new Date(),
  });
  
  return { output: rawOutput, outputHash };
}

Control 6: Memory Isolation and Signed Entries

Agent memory stores are partitioned per-agent and per-tenant. Cross-agent memory reads require explicit permission grants. All memory writes are signed by the writing component's identity key — an episodic memory entry written by skill crm-sync v2.3.1 carries that provenance in a tamper-evident field.

Memory integrity verification runs on retrieval: if a retrieved entry's signature does not match its content, the retrieval fails and an alert is raised.
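Sign-on-write and verify-on-retrieval can be sketched with the standard library. Note the substitution: this example uses a shared-secret HMAC to stay dependency-free, whereas the per-component identity keys described above would call for asymmetric signatures (for example Ed25519) so that a reader cannot forge a writer's entries. The entry layout and function names are assumptions.

```python
import hashlib
import hmac
import json


def _canonical(content: str, writer: str) -> bytes:
    """Stable byte encoding of the fields covered by the signature."""
    return json.dumps({"content": content, "writer": writer}, sort_keys=True).encode()


def sign_entry(content: str, writer: str, key: bytes) -> dict:
    """Create a memory entry whose content and provenance are tamper-evident."""
    signature = hmac.new(key, _canonical(content, writer), hashlib.sha256).hexdigest()
    return {"content": content, "writer": writer, "signature": signature}


def verify_entry(entry: dict, key: bytes) -> bool:
    """Recompute the signature on retrieval; False means fail the read and alert."""
    expected = hmac.new(
        key, _canonical(entry["content"], entry["writer"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, entry.get("signature", ""))
```

Because the writer identity is part of the signed payload, changing either the content or the claimed provenance of a stored entry invalidates it.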

Armalo's memory attestations: The memory_attestations table in Armalo's schema records cryptographically signed behavioral history that agents can share via signed tokens with scoped permissions — this is directly applicable as the foundation for signed agent memory entries.

Control 7: Behavioral Checksums

At deployment, a behavioral fingerprint is generated for each agent: a standardized set of evaluation scenarios are run and the response distribution is recorded. This fingerprint is stored as the baseline.

Periodically (and after every skill version update), the same scenario set is rerun. The response distribution is compared against the baseline using Jensen-Shannon divergence. Divergence above a threshold triggers a review.

Behavioral checksums catch drift that code analysis cannot: a skill that behaves differently in certain edge cases but not in the happy path, a model fine-tune that shifted output distributions slightly, a memory poisoning attack that changed retrieval behavior.

# Behavioral checksum comparison
from scipy.spatial.distance import jensenshannon

def compute_behavioral_divergence(
    baseline_responses: list[str],
    current_responses: list[str],
    embedder
) -> float:
    baseline_embeddings = embedder.embed(baseline_responses)
    current_embeddings = embedder.embed(current_responses)
    
    # Compute distribution over semantic clusters.
    # compute_cluster_distribution is defined elsewhere; both calls must use the
    # same fixed clusters so the two distributions are comparable.
    baseline_dist = compute_cluster_distribution(baseline_embeddings)
    current_dist = compute_cluster_distribution(current_embeddings)
    
    # SciPy's jensenshannon returns the Jensen-Shannon distance (the square
    # root of the divergence), so square it to obtain the divergence itself.
    distance = jensenshannon(baseline_dist, current_dist)
    return float(distance ** 2)

# Alert threshold: JSD > 0.15 warrants review, > 0.25 warrants pause, > 0.35 warrants halt
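The comparison above depends on a cluster-distribution helper that is not shown. One way to sketch it, using only the standard library, is to assign each embedding to the nearest of a fixed set of centroids and compare the resulting histograms. The function names and the choice of nearest-centroid assignment are assumptions, not the method the checksum system necessarily uses; the centroids would typically be fitted once on baseline responses and then frozen.

```python
import math
from collections import Counter


def nearest_centroid(vector: list[float], centroids: list[list[float]]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, centroids[i])),
    )


def cluster_distribution(
    embeddings: list[list[float]], centroids: list[list[float]]
) -> list[float]:
    """Fraction of embeddings assigned to each centroid (hypothetical helper)."""
    if not embeddings:
        raise ValueError("embeddings must not be empty")
    counts = Counter(nearest_centroid(v, centroids) for v in embeddings)
    return [counts.get(i, 0) / len(embeddings) for i in range(len(centroids))]


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits (log base 2), bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: list[float], b: list[float]) -> float:
        # Terms where a is zero contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Using the same frozen centroids for both the baseline and the current run is what makes the two histograms comparable; refitting clusters on each run would hide drift.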

Control 8: Canary Tokens in Memory

Canary tokens — known-false data entries injected into agent memory — detect context exfiltration. If the canary data appears in an agent's outputs, something has read and leaked it.

For agent memory specifically: inject a small number of entries with distinctive, memorable values that have no business reason to appear in outputs. If a report ever mentions "Project Nighthawk" (a fake project name injected as a canary), the system immediately alerts and triggers forensic analysis.

This technique borrows from the HoneyTokens / Canarytokens.org tradition of defensive deception. For agent systems, it provides a reliable, low-false-positive exfiltration detector that complements behavioral monitoring.

# Canary entry injection
# MemoryEntry and MemoryStore are the application's own memory types (not shown here).
def inject_memory_canaries(agent_id: str, memory_store: MemoryStore) -> None:
    canaries = [
        MemoryEntry(
            content="Internal project codename: NIGHTHAWK - Q3 acquisition target",
            metadata={"is_canary": True, "canary_id": "canary_001"},
            agent_id=agent_id,
        ),
        # ... additional canaries
    ]
    for canary in canaries:
        memory_store.write(canary, provenance="security_canary_system")

# Monitor for canary appearance in outputs.
# Match case-insensitively so "Nighthawk" and "NIGHTHAWK" both trigger.
def check_output_for_canary_leak(output: str, canary_strings: list[str]) -> bool:
    lowered = output.lower()
    return any(canary.lower() in lowered for canary in canary_strings)

Control 9: Zero-Trust Skill Marketplace

No skill ships without completing a multi-stage verification pipeline:

Publisher verification: Identity verified, organizational affiliation confirmed, signing key registered
Static analysis: Code scanning for known malicious patterns, dependency audit
Behavioral evaluation: Skill evaluated against its declared behavioral pact — 12 dimensions including security (8%) and safety (11%)
Sandboxed execution testing: Skill executed in isolated environment with synthetic inputs, monitoring for network calls, filesystem access, subprocess spawning
Provenance attestation: SLSA provenance generated, SBOM created and signed
Composite trust score: Score ≥ 75 required for listing; score ≥ 85 required for "Verified" badge

Anomalous post-listing behavior (score swing > 15 points, new capability claims, publisher key change) triggers automatic re-evaluation and temporary suspension pending review.
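The listing gate described above can be sketched as a single decision function. The score thresholds (75 and 85) come from the pipeline; the `PipelineResult` fields and the returned labels are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

LISTING_MIN_SCORE = 75   # minimum composite score for listing
VERIFIED_MIN_SCORE = 85  # minimum composite score for the "Verified" badge


@dataclass
class PipelineResult:
    publisher_verified: bool
    static_analysis_passed: bool
    behavioral_eval_passed: bool
    sandbox_passed: bool
    provenance_attested: bool
    composite_score: float


def listing_decision(result: PipelineResult) -> str:
    """Return 'rejected', 'listed', or 'verified' (labels are illustrative)."""
    gates = [
        result.publisher_verified,
        result.static_analysis_passed,
        result.behavioral_eval_passed,
        result.sandbox_passed,
        result.provenance_attested,
    ]
    # Zero-trust: every stage must pass, and the score must clear the floor.
    if not all(gates) or result.composite_score < LISTING_MIN_SCORE:
        return "rejected"
    if result.composite_score >= VERIFIED_MIN_SCORE:
        return "verified"
    return "listed"
```

Treating the stages as a conjunction means a high composite score cannot compensate for a failed sandbox run or a missing attestation.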

Control 10: Runtime Anomaly Detection

Real-time behavioral monitoring with pre-defined alert thresholds:

Signal	Normal Baseline	Warning Threshold	Critical Threshold	Response
Tool call rate	Established per-agent baseline	+50% from baseline	+200% from baseline	Auto-rate-limit
Novel data source access	Zero (strict)	Any new source (review)	Any new source (block)	Block + alert
Memory write rate	≤ 10 writes/session	> 50 writes/session	> 200 writes/session	Pause + alert
Output token divergence	JSD < 0.10 from baseline	JSD 0.10–0.25	JSD > 0.25	Pause + review
Canary token in output	Never	Any occurrence	Any occurrence	Immediate halt + forensics
Composite trust score	Stable	>15 point drop	>35 point drop	Re-evaluation
Skill version change	Controlled	Uncontrolled update	Downgrade to untrusted	Block
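Two rows of the table above translate directly into code. This is a sketch under stated assumptions: the function names are hypothetical, and the boundary handling (whether exactly +50% counts as a warning) is a choice made here, since the table does not specify it.

```python
def classify_tool_call_rate(current_rate: float, baseline_rate: float) -> str:
    """Classify a tool call rate using the +50% / +200% bands from the table."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    increase = (current_rate - baseline_rate) / baseline_rate
    if increase >= 2.0:   # +200% from baseline
        return "critical"
    if increase >= 0.5:   # +50% from baseline
        return "warning"
    return "normal"


def classify_output_divergence(jsd: float) -> str:
    """Classify output divergence: < 0.10 normal, 0.10 to 0.25 warning, > 0.25 critical."""
    if jsd > 0.25:
        return "critical"
    if jsd >= 0.10:
        return "warning"
    return "normal"


# Responses taken from the table's last column for these two signals.
RESPONSES = {
    "tool_call_rate": "Auto-rate-limit",
    "output_divergence": "Pause + review",
}
```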

Part 5: Runtime Detection — What to Monitor and Alert Thresholds

Effective supply chain threat detection for agents requires monitoring at four layers simultaneously:

Layer 1: Artifact Integrity

SBOM completeness: Is every loaded skill component present in a verified SBOM?
Signature validity: Are all skill artifacts signed and signatures verifiable?
Hash verification: Do loaded artifacts match their pinned hashes?
Registry provenance: Did the artifact come from an approved registry?

Alert trigger: Any failure at this layer is a critical-severity block — do not load the skill.
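The hash verification check in this layer can be sketched with the standard library. The `load_skill` name and the choice of `PermissionError` are assumptions for illustration; the point is that a mismatch blocks the load rather than logging and continuing.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned_hash(artifact: bytes, pinned_sha256: str) -> bool:
    """True only if the artifact's digest equals the pinned digest."""
    return hmac.compare_digest(sha256_hex(artifact), pinned_sha256.lower())


def load_skill(artifact: bytes, pinned_sha256: str) -> bytes:
    """Refuse to load an artifact whose hash does not match its pin (hypothetical loader)."""
    if not verify_pinned_hash(artifact, pinned_sha256):
        raise PermissionError("artifact hash does not match pinned hash; refusing to load")
    return artifact
```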

Layer 2: Runtime Behavior

Tool call frequency and patterns: Track per-skill, per-agent, rolling 5-minute windows
Data source access: Any access to a data source not in the skill's declared scope triggers immediate alert
Network egress: Any network call to an endpoint not in the explicit allow-list triggers block
Memory operations: Write rate, read patterns, cross-agent reads

Alert trigger: Warning at +50% deviation from baseline; critical at +200%, which triggers an automatic rate limit or block.
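A rolling 5-minute window for tool call frequency can be kept with a deque of timestamps. This is a sketch: the `RollingCallCounter` class is a hypothetical name, one instance would be kept per skill and agent pair, and timestamps are passed in explicitly (in seconds, non-decreasing) so the logic is easy to test.

```python
from collections import deque


class RollingCallCounter:
    """Counts tool calls inside a sliding time window (default: 300 seconds)."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window_seconds = window_seconds
        self._timestamps: deque = deque()

    def record(self, now: float) -> None:
        """Register one tool call at time `now`."""
        self._timestamps.append(now)
        self._evict(now)

    def count(self, now: float) -> int:
        """Number of calls recorded within the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now: float) -> None:
        # Drop calls that are a full window old or older.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
```

The count from each window can then be compared against the per-agent baseline to decide between a warning and a critical response.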

Layer 3: Semantic Behavior

Output distribution drift: Jensen-Shannon divergence against baseline using embedding-space cluster distribution
Instruction-format detection: Monitor agent outputs for instruction-format strings that may indicate injection success
Scope creep detection: Agent reasoning mentions topics outside its declared operational scope
Canary token monitoring: Immediate halt on any canary value appearing in output

Alert trigger: JSD > 0.15 triggers review, > 0.25 triggers pause, > 0.35 triggers halt.

Layer 4: Trust Score Signals

Composite score velocity: Rate of change in trust score (not just absolute value)
Dimension-specific anomalies: Security (8%) or safety (11%) dimension drops faster than overall score
Jury outlier patterns: Jury evaluations consistently placing skill at top or bottom (possible gaming)
Transaction pattern anomalies: Behavioral outcomes diverging from pact commitments in escrow context

Alert trigger: >200 point composite score swing triggers automatic review hold per Armalo's anti-gaming policy.

Part 6: Vendor Evaluation — 20 Questions for Skill and Tool Vendors

Before integrating any third-party skill, tool adapter, or model service into a production agent deployment, get clear answers to these questions:

Provenance and Build Integrity

1. Does your build pipeline generate SLSA provenance attestations? At what level (Build L1–L3 under SLSA v1.0, or 1–4 under the earlier v0.1 scheme)?
2. Are skill artifacts signed with Sigstore/cosign? Are signatures auditable in Rekor?
3. Do you publish CycloneDX or SPDX SBOMs for each released version?
4. Are your builds hermetic, with all dependencies pinned to exact hashes and no network access during build?
5. What is your process for verifying dependencies before inclusion?

Security Testing

6. Do you conduct adversarial/red-team evaluations of skill behavior, not just code scanning?
7. What prompt injection defenses are built into your tool output handling?
8. How do you test against the OWASP LLM Top 10 vulnerabilities?
9. Are your skills tested in sandboxed environments before release? What capabilities are verified as blocked?
10. Do you maintain CVE disclosure for your skill packages? What is your response SLA?

Behavioral Governance

11. Do your skills have machine-readable behavioral pacts defining what they claim, guarantee, and will not do?
12. What is your post-update behavioral re-evaluation process? How quickly after an update?
13. How do you detect and respond to behavioral drift between releases?
14. What data retention and logging do you implement for skill-generated outputs?
15. What is your process when a skill is found to behave outside its declared scope?

Operational Security

16. What is your publisher key rotation policy? How are key compromises handled?
17. What network egress does your skill require? Can it be restricted to a specific endpoint allow-list?
18. What data does your skill read or write to agent memory? Is this documented in the SBOM?
19. What is your incident response time for supply chain compromise notifications to customers?
20. Do you carry cyber liability insurance, and does it cover supply chain attacks originating from your skills?

Vendors who cannot answer questions 1–5 confidently should not be integrated into production agent deployments. Questions 11–15 are the behavioral governance questions that distinguish skills appropriate for autonomous operation from skills that require human-in-the-loop review on every invocation.

Part 7: Incident Response Playbook — Six Steps When a Malicious Skill Is Discovered

Step 1: Contain (0–15 minutes)

Goal: Stop the bleeding. Limit ongoing damage without destroying forensic evidence.

Actions:

Isolate affected agents: Set agent.status = 'suspended' for all agents that have loaded the suspect skill version. Do not terminate processes yet — preserve in-memory state for forensics.
Block skill version: Add the suspect skill+version to the runtime blocklist. This prevents new agent instances from loading it.
Preserve memory snapshots: Capture current memory state for all affected agents before any cleanup. This is your forensic record.
Enable enhanced logging: Increase log verbosity for all tool calls, data accesses, and memory operations on affected agents.
Notify stakeholders: Alert security team, affected customers (if applicable), and legal (for regulatory notification timing).

Do not: Delete the malicious skill artifact (you need it for forensics), wipe agent memory (evidence destruction), or restart affected agents without preserving state.

Step 2: Assess Scope (15–60 minutes)

Goal: Understand what was accessed, modified, or exfiltrated.

Actions:

Query tool call audit log: SELECT * FROM tool_call_audit_log WHERE skill_id = $1 AND timestamp > $2 ORDER BY timestamp
Check data source access logs for any novel data source access by affected agents
Review memory write provenance: which entries were written by the malicious skill?
Check network egress logs for any connections to non-allow-listed endpoints
Identify the blast radius: which agents were affected, what data scopes did they have access to?
Check for canary token appearances in any outputs

Output: Incident scope document with timeline, affected agents, data categories potentially exposed.
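The audit log query above can be wrapped in a small helper that also derives the list of affected agents. This sketch uses SQLite so it runs anywhere; the column names other than `skill_id` and `timestamp` are assumptions, and timestamps are assumed to be stored as ISO 8601 strings in a single format so that string comparison matches chronological order.

```python
import sqlite3


def create_audit_table(conn: sqlite3.Connection) -> None:
    # Assumed schema for illustration; only skill_id and timestamp appear in the query above.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tool_call_audit_log ("
        "agent_id TEXT, skill_id TEXT, skill_version TEXT, "
        "tool_name TEXT, timestamp TEXT)"
    )


def calls_by_skill_since(
    conn: sqlite3.Connection, skill_id: str, since_iso: str
) -> list:
    """All audited calls made by one skill after a given time, oldest first."""
    cursor = conn.execute(
        "SELECT agent_id, skill_version, tool_name, timestamp "
        "FROM tool_call_audit_log "
        "WHERE skill_id = ? AND timestamp > ? ORDER BY timestamp",
        (skill_id, since_iso),  # parameterized to avoid SQL injection
    )
    return cursor.fetchall()


def affected_agents(rows: list) -> list:
    """Distinct agent IDs from the query result: a first cut at the blast radius."""
    return sorted({row[0] for row in rows})
```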

Step 3: Eradicate (1–4 hours)

Goal: Remove the malicious component and all its artifacts.

Actions:

Remove the malicious skill version from all agent deployments
Identify and flag all memory entries written by the malicious skill (using write provenance)
Quarantine suspect memory entries — do not delete yet, but mark as untrusted and exclude from retrieval
Revoke any API credentials, tokens, or permissions that affected agents held
Update your skill allow-list and runtime blocklist so the malicious version is denied permanently
Notify the skill registry maintainer (or public disclosure if it is a public registry)
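The flag-and-quarantine actions above can be sketched as follows. The entry layout (a `provenance` dict with `skill_id` and `skill_version`, plus a `trusted` flag) is an assumption for illustration; the key property is that suspect entries are excluded from retrieval but not deleted, so they remain available as evidence.

```python
def quarantine_by_provenance(
    entries: list, skill_id: str, skill_version: str
) -> int:
    """Mark every entry written by the given skill version as untrusted; return the count."""
    flagged = 0
    for entry in entries:
        provenance = entry.get("provenance", {})
        if (
            provenance.get("skill_id") == skill_id
            and provenance.get("skill_version") == skill_version
        ):
            entry["trusted"] = False  # keep the entry, but exclude it from retrieval
            flagged += 1
    return flagged


def retrievable(entries: list) -> list:
    """Entries that retrieval may still return (untrusted ones are filtered out)."""
    return [entry for entry in entries if entry.get("trusted", True)]
```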

Step 4: Investigate (4–24 hours)

Goal: Understand the full attack path and determine root cause.

Actions:

Reconstruct the kill chain using tool call audit logs and memory provenance
Determine initial access vector: dependency confusion? Typosquatting? Legitimate publisher compromise?
Analyze the malicious skill code: what was its actual behavior? What data did it access?
Check Rekor transparency log for the skill's signing history — when was it signed, by whom?
Compare skill version SBOM against the actual artifact: were undeclared dependencies present?
Determine if any behavioral checkpoint evaluations should have caught this (if yes: why didn't they?)
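Comparing a skill's SBOM against what the artifact actually contains reduces to a set difference. The sketch below reads the `components` array of a CycloneDX JSON document (each component has a `name` and usually a `version`); how the observed dependency list is obtained, for example by scanning the installed artifact, is outside this example, and the function names are hypothetical.

```python
def declared_components(cyclonedx_doc: dict) -> set:
    """(name, version) pairs declared in a CycloneDX JSON SBOM."""
    return {
        (component["name"], component.get("version", ""))
        for component in cyclonedx_doc.get("components", [])
    }


def undeclared_dependencies(cyclonedx_doc: dict, observed: set) -> set:
    """Dependencies found in the artifact that the SBOM does not declare."""
    return set(observed) - declared_components(cyclonedx_doc)
```

Any non-empty result is worth investigating: an undeclared package, or a declared package at a different version, is exactly the kind of discrepancy this step is meant to surface.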

Step 5: Recover (24–72 hours)

Goal: Restore agents to a known-good state with validated behavioral baselines.

Actions:

Re-evaluate all affected agents against behavioral baseline scenarios — compute JSD against pre-incident checkpoint
Purge memory entries written by the malicious skill; restore from the last known-good snapshot if available
Re-run full behavioral evaluation for all affected agents before returning to production
Deploy clean skill version (if the publisher is trusted) or a vetted replacement
Re-verify all SBOM signatures and SLSA provenance for the replacement version
Require explicit human approval before returning affected agents to autonomous operation

Step 6: Learn (72 hours–2 weeks)

Goal: Prevent recurrence and improve detection.

Actions:

Root cause analysis: which control failed? (Was SBOM verification not enforced? Was behavioral evaluation not run post-update? Was the registry blocklist not maintained?)
Add the attack vector to your threat model documentation
Create new behavioral evaluation scenarios that would have detected the malicious behavior
Lower the alert threshold on the signal that was most indicative
Update vendor evaluation questionnaire with questions this incident revealed
File CVE if applicable (for vulnerabilities in skill code or the skill runtime)
Publish post-incident analysis (internally, and publicly if appropriate — this builds ecosystem trust)

Part 8: MITRE ATLAS Technique Mapping

MITRE ATLAS (Adversarial Threat Landscape for Artificial Intelligence Systems) provides a framework for categorizing attacks against AI systems, analogous to MITRE ATT&CK for traditional systems. The following ATLAS techniques map directly to AI agent supply chain attacks.

ATLAS Tactic	ATLAS Technique	Agent Supply Chain Application
Initial Access	AML.T0010: ML Supply Chain Compromise	Malicious skill registration, model weight tampering
Initial Access	AML.T0010.000: GPU Hardware	Hardware-level attack on inference infrastructure
Initial Access	AML.T0010.001: ML Software	Dependency confusion, typosquatting in skill packages
Initial Access	AML.T0010.003: Model	Malicious model on Hugging Face Hub (confirmed 2024)
ML Model Access	AML.T0040: ML Model Inference API Access	Adversary uses compromised skill to query model at scale
Initial Access	AML.T0012: Valid Accounts	Malicious skill operates under a legitimate publisher's account and signing credentials to avoid detection
Collection	AML.T0037: Data from Local System	Skill collects training data, memory contents, or inference inputs for exfiltration
Impact	AML.T0031: Erode ML Model Integrity	Behavioral drift injection, memory poisoning
Impact	AML.T0029: Denial of ML Service	Context window stuffing to disable safety constraints
Defense Evasion	AML.T0015: Evade ML Model	Behavioral mimicry — malicious behavior mimics normal behavior to evade anomaly detection
Discovery	AML.T0007: Discover ML Artifacts	Skill enumerates agent capabilities, memory structure, tool access
Collection	AML.T0035: ML Artifact Collection	Memory poisoning via crafted retrieval data, RAG poisoning

Mapping to MITRE ATT&CK (for Traditional Supply Chain Components)

For the software components of agent supply chains, traditional ATT&CK techniques also apply:

T1195.001 (Supply Chain Compromise: Compromise Software Dependencies and Development Tools): Direct mapping to dependency confusion and typosquatting attacks
T1195.002 (Supply Chain Compromise: Compromise Software Supply Chain): Mapping to malicious skill updates post-trust (XZ Utils analog)
T1601 (Modify System Image): Mapping to model weight tampering
T1565 (Data Manipulation): Mapping to memory poisoning and RAG poisoning
T1056 (Input Capture): Mapping to skills that log and exfiltrate agent inputs

Part 9: Armalo's Trust Infrastructure for Supply Chain Security

Armalo was built to address exactly the gap this guide describes: the absence of a verifiable trust layer for AI agent supply chains. The platform's architecture maps directly onto the defense-in-depth controls described above.

Behavioral Pacts as Supply Chain Contracts

Every agent and skill in the Armalo ecosystem must define a behavioral pact — a machine-readable contract specifying what the component claims to do, what it guarantees, and what it explicitly will not do. Pacts are versioned, immutable once attested, and publicly auditable.

For supply chain security, pacts serve as the behavioral specification against which post-change evaluation is run. If a skill update causes pact violations, the update is blocked before it reaches production agents.

Composite Trust Score with Security and Safety Dimensions

Armalo's 12-dimension composite score includes dedicated security (8%) and safety (11%) dimensions as first-class scoring components. A skill that passes functionality evaluations but fails security or safety evaluations cannot achieve a score sufficient for marketplace listing.

The score's anti-gaming controls — including anomaly detection on swings >200 points and jury outlier trimming (top/bottom 20% trimmed before score computation) — make it resistant to the kind of post-listing score manipulation that a malicious skill publisher might attempt.
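Jury outlier trimming can be illustrated with a trimmed mean. This is a sketch of the general technique, not Armalo's implementation: the function name is hypothetical, and the rounding rule for how many scores to drop from each end (here, rounding down) is an assumption.

```python
def trimmed_mean(scores: list, trim_fraction: float = 0.20) -> float:
    """Mean of the scores after dropping the top and bottom `trim_fraction` of them."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(scores)
    drop = int(len(ordered) * trim_fraction)  # number removed from each end
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / len(kept)
```

With five jurors and a 20% trim, one score is dropped from each end, so a single colluding juror who submits an extreme value has no effect on the result.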

Context Pack Safety Scans

The context_safety_scans table records the results of automated safety scanning for every context pack (knowledge artifact) before it can be used by agents. Safety scans detect:

Prompt injection patterns embedded in knowledge content
Adversarial retrieval bait (content designed to score high in retrieval but inject instructions)
Data exfiltration command patterns
Behavioral manipulation sequences

Supply Chain Audit Trail

Armalo's audit_log table records every mutating operation with actor, action, resource, and timestamp. For supply chain events specifically:

Skill version deployments: which agent loaded which skill version, when
Memory writes: which component wrote which memory entry
Tool invocations: complete audit trail of every skill execution
Score changes: every score update with the contributing evidence

This audit trail is the forensic foundation for Step 2 of the incident response playbook. Without it, scope assessment in a supply chain incident is reconstruction from incomplete logs rather than direct query of a comprehensive record.

Memory Attestations for Behavioral History

Armalo's memory attestations system provides cryptographically signed behavioral history that agents can share via signed tokens with scoped permissions. This is directly applicable to the memory isolation and signed entry requirements in Control 6 — the attestation infrastructure handles the signing, verification, and scoped sharing of memory artifacts.

Part 10: A Practical 90-Day Supply Chain Security Program

For teams moving from zero to a meaningful supply chain security posture:

Days 1–30: Visibility

Inventory all behavior-shaping components: skills, prompts, tool adapters, memory sources, model versions. If you cannot enumerate them, you cannot secure them.
Implement tool call auditing: every invocation logged with the fields described in Control 5. This alone provides the forensic foundation for incident response.
Generate SBOMs for all deployed skills: even retroactively. syft can generate CycloneDX SBOMs from most package types.
Establish behavioral baselines: run the behavioral checksum scenario set for all production agents. Record the fingerprints.

Days 31–60: Enforcement

Deploy OPA policies: start with the highest-risk skills (those with external network access or write access to production systems). Define and enforce scope boundaries.
Implement registry allow-listing: no skill can be loaded from outside approved registries. Block by default, explicit permit required.
Require SBOM verification for new skill deployments: existing deployments get a grace period; new deployments require verified SBOMs from day 60.
Deploy canary tokens: inject 5–10 canary memory entries per high-value agent. Wire alerts.

Days 61–90: Response Readiness

Run a tabletop incident response exercise: use the 6-step playbook from Part 7. Identify gaps in your response capability before a real incident reveals them.
Establish behavioral re-evaluation triggers: every skill version update triggers a behavioral checksum comparison against baseline. Automate the comparison and alert on threshold breach.
Complete vendor assessments: work through the 20 vendor questions from Part 6 with your top 5 skill vendors. Remediate or replace vendors who cannot provide satisfactory answers.
Publish your supply chain security posture: document your controls, your SBOM practices, your evaluation requirements. This builds trust with customers and creates accountability pressure internally.

Frequently Asked Questions

What makes agent supply chain security different from normal software supply chain security?

Traditional supply chain security focuses on code integrity — preventing malicious code execution. Agent supply chain security must also protect reasoning integrity. A compromised agent skill can manipulate an agent's decisions without executing any code that looks malicious. It does this by shaping the information the agent sees, the instructions it follows, and the context it reasons over. This requires behavioral verification methods (evaluation, pact compliance, behavioral checksums) that have no analog in traditional software security.

Are SBOM and SLSA requirements practical for small teams?

The tooling has matured significantly. Generating a CycloneDX SBOM with syft is a one-line CI step. Sigstore keyless signing is integrated into GitHub Actions with minimal configuration. SLSA Level 2 is achievable for most teams in a single sprint. The compliance benefit — and the trust signal to customers — far outweighs the implementation cost.

How do I know if my agent has already been compromised through its supply chain?

Start by running behavioral baseline comparisons: if your agent was deployed more than 90 days ago without a baseline audit, run one now against a fresh deployment with the same skill versions. Divergence indicates either a supply chain issue or model drift. Check your tool call audit logs for novel data source access or anomalous call frequency patterns. Deploy canary tokens and wait 72 hours — any leakage is immediate evidence of exfiltration.

What's the relationship between supply chain security and agent compliance frameworks like the EU AI Act?

Article 9 (risk management system) and Article 17 (quality management system) of the EU AI Act both implicitly require supply chain risk management for high-risk AI systems. The Act's conformity assessment requirements (Article 43) will require documentation of the agent's development supply chain. SBOM, SLSA provenance, and behavioral evaluation records are exactly the documentation conformity assessors will look for.

Can a behavioral pact substitute for code-level security review?

No — they are complementary. Code-level security review catches vulnerabilities in the skill's implementation. Behavioral pacts and evaluation catch behavioral violations: the skill does something it shouldn't, or doesn't do something it claims. Both are necessary. The XZ Utils backdoor would have required both code review (to find the build system manipulation) and behavioral evaluation (to detect the added SSH authentication behavior).

Key Takeaways

The attack surface is eight vectors wide: dependency confusion, typosquatting, prompt injection via tools, malicious skill registration, behavioral drift injection, memory poisoning, model weight tampering, and context window stuffing.
Real incidents confirm this threat class: LangChain CVE-2023-36258 (CVSS 9.8), 100+ malicious models on Hugging Face Hub (2024), XZ Utils backdoor (CVE-2024-3094) as the canonical supply chain attack template.
Government frameworks apply directly: NIST SP 800-161r1 C-SCRM, EO 14028 SBOM requirements, and CISA guidance create a compliance baseline that maps cleanly onto agent supply chains.
SLSA + Sigstore + in-toto provide the cryptographic infrastructure: build integrity attestation, keyless signing, and supply chain verification are mature, production-ready tools that should be standard for any skill deployed to production agents.
Ten defense-in-depth controls create a layered architecture: no single control is sufficient; the combination of SBOM verification, OPA policy enforcement, behavioral sandboxing, canary tokens, and runtime anomaly detection is.
Behavioral trust and supply chain security converge: when a skill changes agent behavior, the trust story changes. Behavioral pacts, composite scoring, and evaluation are the trust infrastructure that supply chain security depends on.

References: Greshake et al., "Not What You've Signed Up For" (arXiv 2302.12173, 2023); Carlini et al., "Poisoning Web-Scale Training Datasets" (arXiv 2302.10149, 2023); NIST SP 800-161r1 (2022); OWASP Top 10 for LLM Applications (https://owasp.org/www-project-top-10-for-large-language-model-applications/); MITRE ATLAS (https://atlas.mitre.org/); CISA SBOM guidance (https://www.cisa.gov/sbom); SLSA Framework (https://slsa.dev); CVE-2024-3094 (XZ Utils); CVE-2023-36258, CVE-2023-29374 (LangChain); Protect AI Hugging Face research (March 2024); Gu et al., "BadNets" (2019); Liu et al., "TrojanNN" (2018).
