Security

Agent Harness Control Matrix for Security Review

2026-05-105 minArmalo Team

A security-review matrix for agent harnesses covering identity, tool scopes, prompt injection, memory provenance, audit logs, rollback, and recertification.

Continue the reading path

Topic hub

Agent Reputation

This page is routed through Armalo's metadata-defined agent reputation hub rather than a loose category bucket.

Strategic Guide

AI Agent Reputation Systems

Curated Collection

Buyer Guides

The direct answer

A security review for an agent harness should not start with the model. It should start with authority. What can the agent read, write, spend, disclose, remember, delegate, and execute? Which controls narrow that authority when the agent fails? Which artifacts let a reviewer replay the decision later?

That framing turns agent security from a prompt-safety debate into a control-matrix exercise. OWASP's LLM Top 10 names prompt injection and insecure output handling as first-order application risks (https://owasp.org/www-project-top-10-for-large-language-model-applications/), and OWASP's MCP guidance extends the concern into tool and context boundaries (https://owasp.org/www-project-mcp-top-10/). A harness is where those risks become enforceable controls.

Security control matrix

Control area	Review question	Required evidence	Fail-closed behavior
Agent identity	Which agent acted and for which tenant?	signed agent ID, org ID, owner	block unauthenticated action
Tool scope	Which tools and methods are allowed?	tool manifest, scopes, policy version	deny unlisted calls
Instruction channels	Are system, user, tool, and memory content separated?	trace with channel labels	treat tool/memory text as data
Prompt injection	How are direct, indirect, and relay attacks tested?	red-team cases and results	route to review or stop
Memory provenance	Who wrote memory and when does it expire?	memory source, timestamp, proof class	ignore stale/untrusted memory
Audit trail	Can a reviewer replay the action?	inputs, outputs, tool calls, checks	no promotion without trace
Rollback	Can mutation be reversed?	rollback plan or compensating control	block irreversible action
Recertification	What changes expire prior proof?	model/tool/policy change triggers	narrow authority until retested

What security teams should demand

Security teams should ask for a harness-level packet before approving autonomy: agent identity, tool list, data classes, policy version, eval cases, prompt-injection tests, memory rules, audit log sample, and incident response path. This packet should be short enough to review but precise enough to decide.

The mistake is accepting a demo transcript as proof. A demo shows one happy path. A control matrix shows what happens when the path is adversarial, stale, ambiguous, or unauthorized.

Where Armalo fits

Armalo's architecture gives agent work a place to carry identity, pacts, eval evidence, disputes, and trust state. That makes security review less dependent on private context. The reviewer can ask whether the agent has earned the requested authority, whether the proof is fresh, and what consequence follows when evidence weakens.

The claim is not that a score replaces security review. The claim is that security review becomes stronger when the agent's behavioral record is portable and inspectable.

Bottom line

If the harness cannot show identity, scope, evidence, and rollback, the agent should not receive sensitive tools. If it can show those artifacts repeatedly, autonomy can expand with less guesswork and more accountability.

Agent Harness Control Matrix for Security Review should give the team a decision rule it can use, not just stronger language. If the workflow is meaningful enough that another stakeholder could challenge it, then the system needs proof, ownership, and recourse that survive that challenge.

The next step is to pick one consequential workflow, apply the standard there first, and force the trust story to survive a skeptical replay. That is the fastest way to turn the category from content into operating leverage.

Review sequence

Security teams should review an agent harness in the same order an attacker would pressure it. Start with identity, then tool access, then context boundaries, then memory, then output handling, then recovery. A model card or vendor security page is useful background, but it is not a substitute for a harness-specific control packet.

The review should include at least one adversarial test per high-risk channel. Put hostile instructions in a web page, a retrieved document, a tool response, a memory entry, a delegated subtask, and a structured field. Then check whether the harness preserved channel labels and blocked authority expansion. If the only defense is "the system prompt said not to obey it," the control is too weak.

Evidence packet for approval

Artifact	Minimum acceptable form
Tool manifest	tool names, methods, scopes, tenant boundary, mutation class
Context map	system, user, tool, memory, retrieval, and policy channels
Red-team results	cases, expected behavior, actual behavior, fixes
Log sample	one full trace from request to final action
Rollback proof	command, owner, or compensating control
Recertification rule	what model, prompt, policy, or tool change expires approval

Approving an agent without this packet is equivalent to approving a service account without knowing which systems it can touch.

Operating consequences

The control matrix should drive routing. Low-risk read-only tasks may run with sampled review. Mutations should require trace capture and verification. Irreversible actions should require stronger approval or escrow-like recourse. Repeated failures should narrow tool scopes automatically until repair evidence exists.

The matrix should also drive procurement. A buyer should prefer a vendor that can show how autonomy narrows over a vendor that only shows how autonomy expands.

Agent Harness Control Matrix for Security Review becomes more useful when the section explains which decision changes, which failure matters, and what another stakeholder would need to inspect before relying on the workflow.

| Artifact | Minimum acceptable form | | --- | --- | | Tool manifest | tool names, methods, scopes, tenant boundary, mutation class | | Context map | system, user, tool, memory, retrieval, and policy channels | | Red-team results | cases, expected behavior, actual behavior, fixes | | Log sample | one full trace from request to final action | | Rollback proof | command, owner, or compensating control | | Recertification rule | what model, prompt, policy, or tool change expires approval | Approving an agent without this packet is equivalent to approving a service account without knowing which systems it can touch. The obvious objection is that this slows adoption.

Hard objection

The obvious objection is that this slows adoption. It can, if the team tries to build the whole matrix before a narrow pilot. The better approach is to choose one consequential workflow and fill the matrix for that workflow first. The review becomes faster on the second workflow because the evidence shape is already known.

The control matrix should drive routing. Armalo should make this review portable.

Armalo angle

Armalo should make this review portable. If an agent has passed a scoped security review, failed a prompt-injection test, repaired a boundary, or lost authority after an incident, that history should not disappear when the agent moves across teams or marketplaces. It should become part of the agent's trust record.

The obvious objection is that this slows adoption.

agent-harnessessecurity-reviewprompt-injectiontool-scopesauditabilityai-governance

← Back to Blog

Put the trust layer to work

Explore the docs, register an agent, or start shaping a pact that turns these trust ideas into production evidence.

Read the docs Start building

Comments

No comments yet. Be the first to share your thoughts.

Loading comments…