Armalo Shield: AI Agent Threat Monitoring and Supply Chain Security
Armalo Shield is live. It monitors AI agent behavior for supply chain attacks, prompt injection, output manipulation, and behavioral drift — and integrates those security signals directly into composite trust scores.
Security is now a first-class trust dimension, not a checkbox.
What Shield Monitors
Most security tooling for AI agents is reactive: it catches known vulnerability patterns in code, logs anomalies, and alerts engineers after something has gone wrong. Shield is designed around a different model — continuous behavioral comparison against a defined baseline, with proactive detection of drift before it becomes a problem.
Threat detection covers the OWASP Top 10 for LLM applications and extends into multi-agent-specific vectors that single-agent security models don't address: skill injection (a consumed tool or plugin that behaves outside its declared scope), shared memory poisoning (malicious data written to swarm memory that biases downstream agent reasoning), coordination manipulation (an agent being influenced to route work or share context in ways its operator didn't intend), and behavioral drift (gradual movement away from the certified behavioral baseline, often invisible without continuous monitoring).
Each eval run that touches security-relevant behavior generates a threat event if anomalies are detected. Threat events are classified by type (injection attempt, drift, anomaly) and severity (info, warning, critical).
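A minimal TypeScript sketch of the classification described above. The `ThreatEvent` shape, the normalized `magnitude` field, the severity thresholds, and the one-event-per-anomaly rule are all assumptions for illustration. They are not Shield's published schema.

```typescript
// Illustrative types only: field names and thresholds are assumptions, not Shield's schema.
type ThreatType = "injection_attempt" | "drift" | "anomaly";
type Severity = "info" | "warning" | "critical";

interface ThreatEvent {
  agentId: string;
  type: ThreatType;
  severity: Severity;
  detectedAt: Date;
  detail: string;
}

// Hypothetical anomaly record emitted by a security-relevant eval run.
interface EvalAnomaly {
  kind: ThreatType;
  magnitude: number; // assumed to be normalized to [0, 1]
  detail: string;
}

// Bucket a normalized magnitude into a severity level (thresholds are illustrative).
function classifySeverity(magnitude: number): Severity {
  if (magnitude >= 0.8) return "critical";
  if (magnitude >= 0.4) return "warning";
  return "info";
}

// A clean run yields no events; otherwise one event per detected anomaly (an assumption).
function threatEventsFromEvalRun(agentId: string, anomalies: EvalAnomaly[], at: Date): ThreatEvent[] {
  return anomalies.map((a) => ({
    agentId,
    type: a.kind,
    severity: classifySeverity(a.magnitude),
    detectedAt: at,
    detail: a.detail,
  }));
}
```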
Incident correlation groups related threat events into security incidents. A single prompt injection attempt is a threat event. A pattern of injection attempts with escalating sophistication — targeting different input vectors, testing different system prompt boundaries — is an incident, and triggers a different response path. The correlation engine looks at temporal proximity, attack vector similarity, and the agent's behavioral response to determine incident grouping.
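The grouping step can be sketched as a simple time-gap clustering. This is a simplified stand-in, not Shield's correlation engine: it correlates only on temporal proximity (a hypothetical `windowMs`) and threat type, treats a lone event as a plain threat event, and does not model the agent's behavioral response. The `inputVector` labels are hypothetical.

```typescript
// Simplified correlation: time gap + threat type only. The behavioral-response
// signal mentioned in the text is deliberately left out of this sketch.
interface CorrelatableEvent {
  id: string;
  agentId: string;
  type: string;
  inputVector: string; // hypothetical labels, e.g. "user_message", "tool_output"
  detectedAt: number;  // epoch milliseconds
}

interface Incident {
  agentId: string;
  type: string;
  eventIds: string[];
  vectors: string[]; // distinct input vectors probed by the grouped events
  firstSeen: number;
  lastSeen: number;
}

function correlate(events: CorrelatableEvent[], windowMs: number, minEvents = 2): Incident[] {
  const sorted = [...events].sort((a, b) => a.detectedAt - b.detectedAt);
  const open = new Map<string, Incident>(); // latest cluster per agent + threat type
  const clusters: Incident[] = [];
  for (const e of sorted) {
    const key = `${e.agentId}|${e.type}`;
    const current = open.get(key);
    if (current && e.detectedAt - current.lastSeen <= windowMs) {
      // Close enough in time to the previous related event: extend the cluster.
      current.eventIds.push(e.id);
      if (!current.vectors.includes(e.inputVector)) current.vectors.push(e.inputVector);
      current.lastSeen = e.detectedAt;
    } else {
      const fresh: Incident = {
        agentId: e.agentId, type: e.type, eventIds: [e.id],
        vectors: [e.inputVector], firstSeen: e.detectedAt, lastSeen: e.detectedAt,
      };
      open.set(key, fresh);
      clusters.push(fresh);
    }
  }
  // A single event stays a threat event; only clusters of minEvents or more become incidents.
  return clusters.filter((c) => c.eventIds.length >= minEvents);
}
```

In this sketch, three injection attempts a few minutes apart that probe different input vectors collapse into one incident. An isolated attempt hours later stays a standalone threat event.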
Behavioral drift detection compares observed behavior over a rolling window against the agent's behavioral baseline (its pact conditions and historical eval profile). Score drops of more than 200 points, sudden safety failures, and latency regressions all generate automated threat events. Drift detection is the most novel of these monitoring capabilities. It catches security issues that aren't prompt injection attacks but are just as damaging: an agent whose behavior has silently shifted away from its certified state.
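A minimal sketch of the three drift checks, assuming the baseline is reduced to a single score and a median latency. The 200-point threshold comes from the text above; comparing the window's mean score, treating any failed safety check as a failure, and the 1.5x latency factor are assumptions, since the post does not specify them.

```typescript
interface Baseline { score: number; medianLatencyMs: number; }
interface Observation { score: number; safetyPassed: boolean; latencyMs: number; }
type DriftSignal = "score_drop" | "safety_failure" | "latency_regression";

const SCORE_DROP_THRESHOLD = 200;      // from the post: drops of more than 200 points
const LATENCY_REGRESSION_FACTOR = 1.5; // assumption: the post gives no latency threshold

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

// Keep only the most recent `size` observations (the rolling window).
function rollingWindow<T>(history: T[], size: number): T[] {
  return history.slice(-size);
}

// Compare the recent window against the baseline and report any drift signals.
function detectDrift(baseline: Baseline, recent: Observation[]): DriftSignal[] {
  if (recent.length === 0) return [];
  const signals: DriftSignal[] = [];
  const meanScore = recent.reduce((sum, o) => sum + o.score, 0) / recent.length;
  if (baseline.score - meanScore > SCORE_DROP_THRESHOLD) signals.push("score_drop");
  if (recent.some((o) => !o.safetyPassed)) signals.push("safety_failure");
  const recentLatency = median(recent.map((o) => o.latencyMs));
  if (recentLatency > baseline.medianLatencyMs * LATENCY_REGRESSION_FACTOR) signals.push("latency_regression");
  return signals;
}
```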
Security as a Composite Score Dimension
Shield's detection data feeds directly into the composite trust score. Security is now one of eleven scoring dimensions, weighted at 9%:
| Dimension | Weight |
|---|---|
| Accuracy | 15% |
| Reliability | 14% |
| Safety | 12% |
| Bond (escrow track record) | 9% |
| Security | 9% |
| Latency | 9% |
| Cost Efficiency | 8% |
| Scope Honesty | 7% |
| Model Compliance | 6% |
| Runtime Compliance | 6% |
| Harness Stability | 5% |
This means an agent that has never been evaluated for security has a lower composite score than an agent with a clean security history — all else equal. Security monitoring isn't optional for agents that want to reach Gold or Platinum certification.
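The weighting can be sketched as a weighted sum over the table's dimensions. The weights are copied from the table; the 0 to 1000 per-dimension scale and the rule that an unevaluated dimension contributes 0 are assumptions made to illustrate the "never evaluated" case, not a documented formula.

```typescript
// Weights copied from the table above; they sum to 100%.
const WEIGHTS = {
  accuracy: 0.15,
  reliability: 0.14,
  safety: 0.12,
  bond: 0.09,
  security: 0.09,
  latency: 0.09,
  costEfficiency: 0.08,
  scopeHonesty: 0.07,
  modelCompliance: 0.06,
  runtimeCompliance: 0.06,
  harnessStability: 0.05,
} as const;

type Dimension = keyof typeof WEIGHTS;
type DimensionScores = Partial<Record<Dimension, number>>; // assumed 0-1000 scale

function compositeScore(scores: DimensionScores): number {
  let total = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    // Assumption: a dimension with no evaluation data contributes 0.
    total += WEIGHTS[dim] * (scores[dim] ?? 0);
  }
  return total;
}
```

With every dimension at 800, this returns 800 (up to floating-point rounding). Leave `security` unset and it returns 728, because the 9% security weight contributes nothing.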
Critical incidents block Gold and Platinum. An agent with an unresolved critical incident cannot hold a Gold or Platinum certification tier regardless of its scores on other dimensions. This is a deliberate design choice: certification tiers are trust signals that buyers rely on when making deployment decisions. An agent with an active security incident should not be presenting as fully trusted. The block is lifted when the incident is marked resolved with documented remediation.
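The blocking rule can be expressed as a small gate. Only Gold and Platinum are named in the post, so the lower tier names and the choice of "Silver" as the capped tier are hypothetical. The incident fields are likewise assumptions.

```typescript
// Tier names below Gold are assumptions; the post only names Gold and Platinum.
type Tier = "Bronze" | "Silver" | "Gold" | "Platinum";

interface IncidentRecord {
  severity: "info" | "warning" | "critical";
  status: "active" | "resolved";
  remediation?: string; // documented remediation notes, if any
}

// A critical incident blocks the top tiers until it is resolved with documented remediation.
function blocksTopTiers(incident: IncidentRecord): boolean {
  const documentedFix = incident.status === "resolved" && (incident.remediation ?? "").trim().length > 0;
  return incident.severity === "critical" && !documentedFix;
}

// Cap an earned top tier while any blocking incident exists ("Silver" as the cap is hypothetical).
function effectiveTier(earned: Tier, incidents: IncidentRecord[]): Tier {
  const isTopTier = earned === "Gold" || earned === "Platinum";
  return isTopTier && incidents.some(blocksTopTiers) ? "Silver" : earned;
}
```

Other dimensions never enter this check, which mirrors the "regardless of its scores on other dimensions" rule.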
Security Posture in the Trust Oracle
The public trust oracle (GET /api/v1/trust/:agentId) now includes a securityPosture field:
```json
{
  "securityPosture": {
    "score": 847,
    "badges": ["prompt-injection-monitored", "supply-chain-verified", "drift-stable"],
    "owaspCoverage": 0.8,
    "cleanStreakDays": 47,
    "activeIncidents": 0
  }
}
```
The cleanStreakDays field deserves explanation: it tracks consecutive days with no security threat events. An agent that has been in continuous operation for 47 days with no security anomalies has demonstrated something meaningful — it's been operating in an adversarial environment and has not been visibly compromised. This signal compounds over time: a clean streak of 180 days is meaningfully stronger evidence than a clean streak of 14 days.
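One way to compute such a streak is as whole days since the most recent threat event, falling back to the start of monitoring when there are none. Counting UTC calendar days and using the monitoring start as the fallback are assumptions; the post does not say how days are bucketed.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Index of the UTC calendar day containing the instant (UTC day boundaries are an assumption).
function utcDayIndex(ms: number): number {
  return Math.floor(ms / MS_PER_DAY);
}

// Whole days since the last threat event, or since monitoring began if there were none.
function cleanStreakDays(eventTimes: Date[], monitoringStart: Date, now: Date): number {
  const lastEventMs = eventTimes.reduce((latest, t) => Math.max(latest, t.getTime()), -Infinity);
  const streakStartMs = Number.isFinite(lastEventMs) ? lastEventMs : monitoringStart.getTime();
  return Math.max(0, utcDayIndex(now.getTime()) - utcDayIndex(streakStartMs));
}
```

For example, a last threat event on 2024-01-14 evaluated on 2024-03-01 gives a streak of 47 days under this counting.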
External consumers — marketplaces, orchestrators, enterprise security teams — can query an agent's security posture before deployment. This is the first time security evidence has been independently computable and publicly queryable for AI agents at the infrastructure layer, rather than requiring a separate security review process.
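A sketch of how a consumer might gate deployment on this field. The `SecurityPosture` interface mirrors the JSON above. The policy thresholds are hypothetical consumer choices, not Armalo defaults, and the HTTP client is injected so the sketch assumes nothing about authentication or runtime.

```typescript
// Mirrors the securityPosture field shown above.
interface SecurityPosture {
  score: number;
  badges: string[];
  owaspCoverage: number;
  cleanStreakDays: number;
  activeIncidents: number;
}

// Consumer-side policy; every threshold here is a hypothetical choice.
interface DeploymentPolicy {
  minScore: number;
  minOwaspCoverage: number;
  minCleanStreakDays: number;
  requiredBadges: string[];
}

// Returns human-readable reasons to block deployment; an empty list means "pass".
function deploymentBlockers(p: SecurityPosture, policy: DeploymentPolicy): string[] {
  const reasons: string[] = [];
  if (p.activeIncidents > 0) reasons.push(`${p.activeIncidents} active incident(s)`);
  if (p.score < policy.minScore) reasons.push(`score ${p.score} < ${policy.minScore}`);
  if (p.owaspCoverage < policy.minOwaspCoverage) reasons.push(`OWASP coverage ${p.owaspCoverage} < ${policy.minOwaspCoverage}`);
  if (p.cleanStreakDays < policy.minCleanStreakDays) reasons.push(`clean streak ${p.cleanStreakDays}d < ${policy.minCleanStreakDays}d`);
  for (const badge of policy.requiredBadges) {
    if (!p.badges.includes(badge)) reasons.push(`missing badge ${badge}`);
  }
  return reasons;
}

// Reads the posture from the public trust oracle route; `getJson` is an injected HTTP helper.
async function fetchSecurityPosture(
  baseUrl: string,
  agentId: string,
  getJson: (url: string) => Promise<unknown>,
): Promise<SecurityPosture> {
  const body = (await getJson(`${baseUrl}/api/v1/trust/${encodeURIComponent(agentId)}`)) as { securityPosture: SecurityPosture };
  return body.securityPosture;
}
```

In a runtime with a global `fetch`, `getJson` could be `(url) => fetch(url).then((r) => r.json())`.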
Three New API Routes
GET/POST /api/v1/security/threats: threat event management. List events, filter by severity, and create threat events programmatically from custom monitoring.
GET /api/v1/security/incidents: incident tracking and correlation. View active and resolved incidents with full event histories.
GET /api/v1/security/posture: current security posture summary for your agents, including all active incident flags and an OWASP coverage breakdown.
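A sketch of request construction for the threats route. The post names the routes but not their query parameters, body fields, or auth scheme, so the `severity` query parameter, the JSON body fields, and the `Bearer` header are all hypothetical. The builders only assemble requests; sending them is left to whatever HTTP client you use.

```typescript
// Hypothetical request shapes: parameter names, body fields, and auth scheme are assumptions.
type ThreatSeverity = "info" | "warning" | "critical";

interface RequestSpec {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// List threat events, optionally filtered by severity.
function listThreatsRequest(baseUrl: string, apiKey: string, severity?: ThreatSeverity): RequestSpec {
  const query = severity ? `?severity=${encodeURIComponent(severity)}` : "";
  return {
    method: "GET",
    url: `${baseUrl}/api/v1/security/threats${query}`,
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Report a threat event detected by your own monitoring.
function createThreatRequest(
  baseUrl: string,
  apiKey: string,
  event: { agentId: string; type: string; severity: ThreatSeverity; detail: string },
): RequestSpec {
  return {
    method: "POST",
    url: `${baseUrl}/api/v1/security/threats`,
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(event),
  };
}
```

A `RequestSpec` maps directly onto `fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body })`.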
Risk Dashboard
Three new dashboard pages under the Security section:
Risk Center (/dashboard/risk) — real-time threat event feed with severity filtering, security score trend over the past 90 days, OWASP coverage breakdown showing which attack vectors are actively monitored versus which have no coverage. The coverage breakdown is particularly useful for identifying monitoring gaps before they're exploited.
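A coverage breakdown of this kind can be sketched as the fraction of OWASP Top 10 for LLM Applications categories (IDs LLM01 to LLM10) with active monitoring, plus the list of gaps. Reading `owaspCoverage: 0.8` as "8 of 10 categories monitored" is an assumption; the post does not define the metric.

```typescript
// Category IDs follow the OWASP Top 10 for LLM Applications numbering.
const OWASP_LLM_TOP10 = [
  "LLM01", "LLM02", "LLM03", "LLM04", "LLM05",
  "LLM06", "LLM07", "LLM08", "LLM09", "LLM10",
];

interface CoverageBreakdown {
  coverage: number;    // fraction of categories with active monitoring
  monitored: string[];
  gaps: string[];      // categories with no coverage
}

function owaspCoverage(monitored: Set<string>): CoverageBreakdown {
  const covered = OWASP_LLM_TOP10.filter((id) => monitored.has(id));
  const gaps = OWASP_LLM_TOP10.filter((id) => !monitored.has(id));
  return { coverage: covered.length / OWASP_LLM_TOP10.length, monitored: covered, gaps };
}
```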
Incidents (/dashboard/risk/incidents) — active and resolved incidents with full timeline, associated threat events, and remediation status. Each incident shows the evidence chain that led to its creation and the behavioral data that triggered it.
Policies (/dashboard/risk/policies) — custom threat detection rules (Enterprise plan). Custom rules allow organizations to define detection logic specific to their deployment context. An anomaly that is normal behavior for a high-throughput data processing agent might be suspicious for a low-volume customer-interaction agent. Custom policies bring domain-specific context to the detection layer.
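The throughput example can be sketched as the same rule evaluated with different per-deployment thresholds. The rule format, field names, and threshold values are hypothetical; the post does not document the Enterprise policy syntax.

```typescript
// Hypothetical observation and rule shapes for illustration only.
interface AgentObservation {
  agentId: string;
  requestsPerMinute: number;
}

interface CustomRule {
  id: string;
  description: string;
  severity: "info" | "warning" | "critical";
  matches: (obs: AgentObservation) => boolean; // true means "raise a threat event"
}

// One rule template, parameterized by what counts as normal for this deployment.
function volumeSpikeRule(id: string, maxRequestsPerMinute: number): CustomRule {
  return {
    id,
    description: `more than ${maxRequestsPerMinute} requests/min`,
    severity: "warning",
    matches: (obs) => obs.requestsPerMinute > maxRequestsPerMinute,
  };
}

function evaluateRules(rules: CustomRule[], obs: AgentObservation): { ruleId: string; severity: CustomRule["severity"] }[] {
  return rules.filter((r) => r.matches(obs)).map((r) => ({ ruleId: r.id, severity: r.severity }));
}

// Same observation, different context: normal for a data pipeline, suspicious for a support agent.
const dataPipelinePolicy = [volumeSpikeRule("pipeline-volume", 5000)];
const supportAgentPolicy = [volumeSpikeRule("support-volume", 60)];
```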
Shield is available on all plans for basic threat detection and OWASP monitoring. Custom detection rules and scheduled red-team probes are Enterprise features. The security posture field in the trust oracle is publicly accessible for all registered agents regardless of plan tier.