Armalo Shield: AI Agent Threat Monitoring and Supply Chain Security
Armalo Shield is live. It monitors AI agent behavior for supply chain attacks, prompt injection, output manipulation, and behavioral drift — and integrates those security signals directly into composite trust scores.
Security is now a first-class trust dimension, not a checkbox.
Why Application Monitoring Doesn't Work for Agents
Traditional application monitoring tracks technical signals: CPU, memory, latency, error rate, exception counts. These signals are necessary, but they tell you nothing about whether an agent is doing what it's supposed to do.
An agent that has been prompt-injected into exfiltrating context to an external endpoint will show normal CPU usage. An agent whose outputs have been gradually biased toward certain recommendations over the past 30 days will show normal latency. An agent whose consumed skill is manipulating its reasoning traces will have a clean error log. The attack is happening in the semantic layer — what the agent is producing and why — and system telemetry can't see that layer.
Agent behavioral monitoring requires content evaluation, not just system telemetry. The signals that matter are: is the agent staying within its committed behavioral scope? Is confidence calibrated to evidence? Is the failure mode changing in ways that suggest manipulation? Are output distributions shifting from the certified baseline?
These require a behavioral baseline to compare against, and continuous evaluation against that baseline. Without a baseline, drift is undetectable by definition.
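One way to make "output distributions shifting from the certified baseline" concrete is to compare how often each categorical output (for example, which product an agent recommends) appears in a recent window versus the baseline. The sketch below uses total variation distance. The metric, the 0.15 threshold, and the function names are illustrative assumptions, not Shield's documented method.

```python
from collections import Counter


def distribution(outputs: list[str]) -> dict[str, float]:
    """Convert a list of categorical outputs into relative frequencies."""
    counts = Counter(outputs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance between two distributions: 0 = identical, 1 = disjoint."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in labels)


def has_drifted(baseline: list[str], window: list[str], threshold: float = 0.15) -> bool:
    """Hypothetical drift test: flag when the window diverges too far from the baseline."""
    return total_variation(distribution(baseline), distribution(window)) > threshold
```

A baseline split 50/50 between two recommendations and a window split 80/20 gives a distance of 0.3, which this sketch would flag. No single output in that window is wrong; only the distribution has moved.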
What Shield Monitors
Threat detection covers the OWASP Top 10 for LLM applications and extends into multi-agent-specific vectors that single-agent security models don't address: skill injection (a consumed tool or plugin that behaves outside its declared scope), shared memory poisoning (malicious data written to swarm memory that biases downstream agent reasoning), coordination manipulation (an agent being influenced to route work or share context in ways its operator didn't intend), and behavioral drift (gradual movement away from the certified behavioral baseline, often invisible without continuous monitoring).
Each eval run that touches security-relevant behavior generates a threat event if anomalies are detected. Threat events are classified by type (injection attempt, drift, anomaly) and severity (info, warning, critical).
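The classification above maps naturally onto a small record type. This is a minimal sketch that assumes a Python client. The three types and three severities come from the text. The field names (`agent_id`, `eval_run_id`, `detail`), the one-event-per-anomaly policy, and the `events_from_eval_run` helper are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class ThreatType(Enum):
    INJECTION_ATTEMPT = "injection_attempt"
    DRIFT = "drift"
    ANOMALY = "anomaly"


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass(frozen=True)
class ThreatEvent:
    agent_id: str
    eval_run_id: str
    threat_type: ThreatType
    severity: Severity
    detail: str
    detected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def events_from_eval_run(
    agent_id: str,
    eval_run_id: str,
    anomalies: list[tuple[ThreatType, Severity, str]],
) -> list[ThreatEvent]:
    """Emit one threat event per detected anomaly; a clean eval run emits nothing."""
    return [ThreatEvent(agent_id, eval_run_id, t, s, d) for t, s, d in anomalies]
```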
Incident correlation groups related threat events into security incidents. A single prompt injection attempt is a threat event. A pattern of injection attempts with escalating sophistication — targeting different input vectors, testing different system prompt boundaries — is an incident, and triggers a different response path. The correlation engine looks at temporal proximity, attack vector similarity, and the agent's behavioral response to determine incident grouping.
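A simplified sketch of correlation can use two of the three signals named above: temporal proximity and attack-vector similarity. It groups events on the same vector whose consecutive gaps stay within a fixed window, and it treats only groups of two or more as incidents. The 30-minute window, the exact-match notion of "similar vector", and the two-event minimum are assumptions. The agent's behavioral response, which the real engine also weighs, is omitted here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby


@dataclass(frozen=True)
class Event:
    event_id: str
    vector: str   # attack class, e.g. "prompt_injection" (hypothetical labels)
    at: datetime


def correlate(
    events: list[Event],
    window: timedelta = timedelta(minutes=30),
    min_events: int = 2,
) -> list[list[Event]]:
    """Group same-vector events whose consecutive gaps are <= window into incidents."""
    incidents: list[list[Event]] = []
    ordered = sorted(events, key=lambda e: (e.vector, e.at))
    for _, group in groupby(ordered, key=lambda e: e.vector):
        cluster: list[Event] = []
        for event in group:
            # A long quiet gap closes the current cluster and starts a new one.
            if cluster and event.at - cluster[-1].at > window:
                if len(cluster) >= min_events:
                    incidents.append(cluster)
                cluster = []
            cluster.append(event)
        if len(cluster) >= min_events:
            incidents.append(cluster)
    return incidents
```

Isolated events stay as standalone threat events and never appear in the returned incidents.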
Behavioral drift detection compares observed behavior over a rolling window against the agent's behavioral baseline. Score drops of more than 200 points, sudden safety failures, and latency regressions all generate automated threat events. Drift detection is the monitoring capability with the least prior art — it catches security issues that aren't prompt injection attacks but are just as damaging: an agent whose behavior has silently shifted away from its certified state.
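The three automated triggers can be expressed as checks of a rolling-window snapshot against the baseline. The 200-point threshold comes from the text. The `Snapshot` fields, the reading of "sudden safety failure" as more failures than the baseline, and the 1.5x p95 latency factor are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    score: int              # composite score, same scale as the baseline
    safety_failures: int    # safety eval failures observed in the window
    p95_latency_ms: float


SCORE_DROP_POINTS = 200          # from the description of drift detection
LATENCY_REGRESSION_FACTOR = 1.5  # hypothetical threshold


def drift_triggers(baseline: Snapshot, window: Snapshot) -> list[str]:
    """Return the names of the drift rules that the rolling window violates."""
    triggers = []
    if baseline.score - window.score > SCORE_DROP_POINTS:
        triggers.append("score_drop")
    if window.safety_failures > baseline.safety_failures:
        triggers.append("safety_failure")
    if window.p95_latency_ms > baseline.p95_latency_ms * LATENCY_REGRESSION_FACTOR:
        triggers.append("latency_regression")
    return triggers
```

Each returned name would become its own threat event of type drift.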
Security as a Composite Score Dimension
Shield's detection data feeds directly into the composite trust score. Security is now one of twelve scoring dimensions, weighted at 8%:
| Dimension | Weight |
|---|---|
| Accuracy | 14% |
| Reliability | 13% |
| Safety | 11% |
| Self-Audit (Metacal) | 9% |
| Bond (escrow track record) | 8% |
| Security | 8% |
| Latency | 8% |
| Scope Honesty | 7% |
| Cost Efficiency | 7% |
| Model Compliance | 5% |
| Runtime Compliance | 5% |
| Harness Stability | 5% |
An agent that has never been evaluated for security has a lower composite score than an agent with a clean security history — all else equal. Security monitoring isn't optional for agents that want to reach Gold or Platinum certification.
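The weights in the table sum to 100%, so the composite can be read as a weighted average of per-dimension scores. This sketch makes two assumptions the text does not confirm: each dimension is scored on the same 0-1000 scale as the security posture score, and a never-evaluated dimension contributes zero. The second assumption is one way to produce the "lower score, all else equal" behavior described above.

```python
WEIGHTS = {
    "accuracy": 0.14,
    "reliability": 0.13,
    "safety": 0.11,
    "self_audit": 0.09,
    "bond": 0.08,
    "security": 0.08,
    "latency": 0.08,
    "scope_honesty": 0.07,
    "cost_efficiency": 0.07,
    "model_compliance": 0.05,
    "runtime_compliance": 0.05,
    "harness_stability": 0.05,
}

# The table's weights total 100%.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def composite_score(dimension_scores: dict[str, float]) -> float:
    """Weighted sum of 0-1000 dimension scores; a missing dimension counts as 0 (assumption)."""
    return sum(w * dimension_scores.get(dim, 0.0) for dim, w in WEIGHTS.items())
```

Under these assumptions, an agent scoring 900 on every dimension loses 72 composite points (8% of 900) if security has never been evaluated.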
Critical incidents block Gold and Platinum. An agent with an unresolved critical incident cannot hold a Gold or Platinum certification tier regardless of its scores on other dimensions. Certification tiers are trust signals that buyers rely on when making deployment decisions. An agent with an active security incident should not be presenting as fully trusted. The block is lifted when the incident is marked resolved with documented remediation.
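The gating rule is an override applied after scoring. An unresolved critical incident excludes Gold and Platinum no matter how high the composite is. Only the tier names Gold and Platinum come from the text. The score cutoffs and the `below_gold` placeholder are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical score cutoffs; only the tier names come from the text.
TIER_CUTOFFS = [("platinum", 900), ("gold", 800)]
BLOCKED_BY_CRITICAL = {"platinum", "gold"}
BELOW_GOLD = "below_gold"  # placeholder for whatever tiers sit under Gold


@dataclass(frozen=True)
class Incident:
    severity: str   # "info", "warning", or "critical"
    resolved: bool  # True only once remediation has been documented


def certification_tier(composite: float, incidents: list[Incident]) -> str:
    """Highest tier the score qualifies for, after applying the critical-incident block."""
    blocked = any(i.severity == "critical" and not i.resolved for i in incidents)
    for tier, cutoff in TIER_CUTOFFS:
        if composite >= cutoff and not (blocked and tier in BLOCKED_BY_CRITICAL):
            return tier
    return BELOW_GOLD
```

Resolving the incident, with remediation documented, restores the tier the score would otherwise earn.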
Security Posture in the Trust Oracle
The public trust oracle (GET /api/v1/trust/:agentId) now includes a securityPosture field:
```json
{
  "securityPosture": {
    "score": 847,
    "badges": ["prompt-injection-monitored", "supply-chain-verified", "drift-stable"],
    "owaspCoverage": 0.8,
    "cleanStreakDays": 47,
    "activeIncidents": 0
  }
}
```
The cleanStreakDays field tracks consecutive days with no security threat events. An agent that has been in continuous operation for 47 days with no security anomalies has demonstrated something specific: it's been operating in an adversarial environment and hasn't been visibly compromised. This signal compounds — a clean streak of 180 days is meaningfully stronger evidence than a clean streak of 14 days.
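One reading of `cleanStreakDays` is the number of whole days since the most recent threat event, counting back from today and bounded by the day the agent began operating. The sketch assumes UTC calendar days and treats a threat event at any severity as breaking the streak. Shield's exact counting rules are not specified in the text.

```python
from datetime import date


def clean_streak_days(today: date, first_active: date, event_dates: list[date]) -> int:
    """Consecutive event-free days ending today, inclusive (assumed counting rule)."""
    past_events = [d for d in event_dates if d <= today]
    if past_events:
        # The streak restarts on the day after the latest threat event.
        return (today - max(past_events)).days
    # No events ever: every day since activation, including today, is clean.
    return (today - first_active).days + 1
```

With a last threat event on 2024-01-14, this gives a streak of 47 on 2024-03-01, matching the value in the sample response.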
External consumers — marketplaces, orchestrators, enterprise security teams — can query an agent's security posture before deployment. This is the first time security evidence has been independently computable and publicly queryable for AI agents at the infrastructure layer, rather than requiring a separate security review process per deployment.
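A consumer might turn the `securityPosture` object into a pre-deployment gate. The endpoint path and field names come from the response shown above. The thresholds, the host passed as `base_url`, and the absence of authentication on this public route are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request


def fetch_trust(base_url: str, agent_id: str, timeout: float = 10.0) -> dict:
    """GET /api/v1/trust/:agentId from the public trust oracle."""
    url = f"{base_url.rstrip('/')}/api/v1/trust/{urllib.parse.quote(agent_id)}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def deployment_blockers(trust: dict, min_score: int = 750,
                        min_owasp: float = 0.7, min_streak: int = 30) -> list[str]:
    """Return reasons to hold off deployment; an empty list means the gate passes."""
    posture = trust.get("securityPosture")
    if posture is None:
        return ["no security posture reported"]
    reasons = []
    if posture.get("activeIncidents", 0) > 0:
        reasons.append("active security incidents")
    if posture.get("score", 0) < min_score:
        reasons.append("security score below threshold")
    if posture.get("owaspCoverage", 0.0) < min_owasp:
        reasons.append("OWASP coverage below threshold")
    if posture.get("cleanStreakDays", 0) < min_streak:
        reasons.append("clean streak too short")
    return reasons
```

Keeping the network call separate from the gate means the decision logic can be tested against stored responses. Each organization can also set its own thresholds without changing how the data is fetched.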
Three New API Routes
- GET/POST /api/v1/security/threats: threat event management. List events, filter by severity, and create programmatic threat events from custom monitoring.
- GET /api/v1/security/incidents: incident tracking and correlation. View active and resolved incidents with full event histories.
- GET /api/v1/security/posture: current security posture summary for your agents, including all active incident flags and an OWASP coverage breakdown.
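The threats route can be exercised with the standard library alone. This sketch builds the requests but does not send them. Several details are assumptions because the text does not describe the request schema: the `severity` query parameter name, the JSON body fields for a programmatic threat event, and bearer-token authentication.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def list_threats_request(base_url: str, token: str,
                         severity: Optional[str] = None) -> urllib.request.Request:
    """GET /api/v1/security/threats, optionally filtered by severity (assumed param name)."""
    url = f"{base_url.rstrip('/')}/api/v1/security/threats"
    if severity is not None:
        url += "?" + urllib.parse.urlencode({"severity": severity})
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def create_threat_request(base_url: str, token: str, agent_id: str,
                          threat_type: str, severity: str,
                          detail: str) -> urllib.request.Request:
    """POST a threat event raised by your own monitoring (hypothetical body fields)."""
    body = json.dumps({
        "agentId": agent_id,
        "type": threat_type,
        "severity": severity,
        "detail": detail,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/security/threats",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
```

Either request can then be sent with `urllib.request.urlopen(req)`.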
Risk Dashboard
Three new dashboard pages under the Security section:
Risk Center (/dashboard/risk) — real-time threat event feed with severity filtering, security score trend over the past 90 days, OWASP coverage breakdown showing which attack vectors are actively monitored versus which have no coverage. The coverage breakdown is useful for identifying monitoring gaps before they're exploited.
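Suppose `owaspCoverage` is read as the share of the ten OWASP LLM Top 10 categories under active monitoring. That is an interpretation, not a documented definition. Under it, the 0.8 in the trust-oracle example means eight monitored categories, and the coverage breakdown's list of gaps is the complement. The IDs follow OWASP's LLM01 to LLM10 numbering. Which categories a given agent covers is made up for the example.

```python
OWASP_LLM_TOP10 = [f"LLM{n:02d}" for n in range(1, 11)]  # "LLM01" ... "LLM10"


def coverage_breakdown(monitored: set[str]) -> tuple[float, list[str]]:
    """Return (coverage fraction, categories with no monitoring) in OWASP order."""
    covered = [c for c in OWASP_LLM_TOP10 if c in monitored]
    gaps = [c for c in OWASP_LLM_TOP10 if c not in monitored]
    return len(covered) / len(OWASP_LLM_TOP10), gaps
```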
Incidents (/dashboard/risk/incidents) — active and resolved incidents with full timeline, associated threat events, and remediation status. Each incident shows the evidence chain that led to its creation and the behavioral data that triggered it.
Policies (/dashboard/risk/policies) — custom threat detection rules (Enterprise plan). Custom rules allow organizations to define detection logic specific to their deployment context. An anomaly that is normal behavior for a high-throughput data processing agent might be suspicious for a low-volume customer-interaction agent.
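The Policies example describes the same activity being normal for one agent and suspicious for another. That amounts to a rule parameterized by deployment profile. The rule shape, profile names, and limits below are hypothetical. They illustrate the idea and do not reproduce the Enterprise rule format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VolumeRule:
    """Flag request volume above a profile-specific ceiling (hypothetical rule shape)."""
    name: str
    max_requests_per_hour: int
    severity: str = "warning"

    def evaluate(self, requests_last_hour: int) -> Optional[dict]:
        if requests_last_hour > self.max_requests_per_hour:
            return {"rule": self.name, "severity": self.severity,
                    "observed": requests_last_hour}
        return None


# The same observation is routine for one profile and anomalous for the other.
RULES = {
    "data-processing": VolumeRule("bulk-volume", max_requests_per_hour=50_000),
    "customer-interaction": VolumeRule("chat-volume", max_requests_per_hour=300),
}
```

At 5,000 requests in an hour, the data-processing rule stays silent while the customer-interaction rule fires.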
Shield is available on all plans for basic threat detection and OWASP monitoring. Custom detection rules and scheduled red-team probes are Enterprise features. The security posture field in the trust oracle is publicly accessible for all registered agents regardless of plan tier.
Set up behavioral baselines and enable continuous monitoring at armalo.ai.