Agent Harness Control Matrix for Security Review
A security-review matrix for agent harnesses covering identity, tool scopes, prompt injection, memory provenance, audit logs, rollback, and recertification.
Continue the reading path
Topic hub
Agent ReputationThis page is routed through Armalo's metadata-defined agent reputation hub rather than a loose category bucket.
The direct answer
A security review for an agent harness should not start with the model. It should start with authority. What can the agent read, write, spend, disclose, remember, delegate, and execute? Which controls narrow that authority when the agent fails? Which artifacts let a reviewer replay the decision later?
That framing turns agent security from a prompt-safety debate into a control-matrix exercise. OWASP's LLM Top 10 names prompt injection and insecure output handling as first-order application risks (https://owasp.org/www-project-top-10-for-large-language-model-applications/), and OWASP's MCP guidance extends the concern into tool and context boundaries (https://owasp.org/www-project-mcp-top-10/). A harness is where those risks become enforceable controls.
Security control matrix
| Control area | Review question | Required evidence | Fail-closed behavior |
|---|---|---|---|
| Agent identity | Which agent acted and for which tenant? | signed agent ID, org ID, owner | block unauthenticated action |
| Tool scope | Which tools and methods are allowed? | tool manifest, scopes, policy version | deny unlisted calls |
| Instruction channels | Are system, user, tool, and memory content separated? | trace with channel labels | treat tool/memory text as data |
| Prompt injection | How are direct, indirect, and relay attacks tested? | red-team cases and results | route to review or stop |
| Memory provenance | Who wrote memory and when does it expire? | memory source, timestamp, proof class | ignore stale/untrusted memory |
| Audit trail | Can a reviewer replay the action? | inputs, outputs, tool calls, checks | no promotion without trace |
| Rollback | Can mutation be reversed? | rollback plan or compensating control | block irreversible action |
| Recertification | What changes expire prior proof? | model/tool/policy change triggers | narrow authority until retested |
What security teams should demand
Security teams should ask for a harness-level packet before approving autonomy: agent identity, tool list, data classes, policy version, eval cases, prompt-injection tests, memory rules, audit log sample, and incident response path. This packet should be short enough to review but precise enough to decide.
The mistake is accepting a demo transcript as proof. A demo shows one happy path. A control matrix shows what happens when the path is adversarial, stale, ambiguous, or unauthorized.
Where Armalo fits
Armalo's architecture gives agent work a place to carry identity, pacts, eval evidence, disputes, and trust state. That makes security review less dependent on private context. The reviewer can ask whether the agent has earned the requested authority, whether the proof is fresh, and what consequence follows when evidence weakens.
The claim is not that a score replaces security review. The claim is that security review becomes stronger when the agent's behavioral record is portable and inspectable.
Bottom line
If the harness cannot show identity, scope, evidence, and rollback, the agent should not receive sensitive tools. If it can show those artifacts repeatedly, autonomy can expand with less guesswork and more accountability.
Agent Harness Control Matrix for Security Review should give the team a decision rule it can use, not just stronger language. If the workflow is meaningful enough that another stakeholder could challenge it, then the system needs proof, ownership, and recourse that survive that challenge.
The next step is to pick one consequential workflow, apply the standard there first, and force the trust story to survive a skeptical replay. That is the fastest way to turn the category from content into operating leverage.
Review sequence
Security teams should review an agent harness in the same order an attacker would pressure it. Start with identity, then tool access, then context boundaries, then memory, then output handling, then recovery. A model card or vendor security page is useful background, but it is not a substitute for a harness-specific control packet.
The review should include at least one adversarial test per high-risk channel. Put hostile instructions in a web page, a retrieved document, a tool response, a memory entry, a delegated subtask, and a structured field. Then check whether the harness preserved channel labels and blocked authority expansion. If the only defense is "the system prompt said not to obey it," the control is too weak.
Evidence packet for approval
| Artifact | Minimum acceptable form |
|---|---|
| Tool manifest | tool names, methods, scopes, tenant boundary, mutation class |
| Context map | system, user, tool, memory, retrieval, and policy channels |
| Red-team results | cases, expected behavior, actual behavior, fixes |
| Log sample | one full trace from request to final action |
| Rollback proof | command, owner, or compensating control |
| Recertification rule | what model, prompt, policy, or tool change expires approval |
Approving an agent without this packet is equivalent to approving a service account without knowing which systems it can touch.
Operating consequences
The control matrix should drive routing. Low-risk read-only tasks may run with sampled review. Mutations should require trace capture and verification. Irreversible actions should require stronger approval or escrow-like recourse. Repeated failures should narrow tool scopes automatically until repair evidence exists.
The matrix should also drive procurement. A buyer should prefer a vendor that can show how autonomy narrows over a vendor that only shows how autonomy expands.
Agent Harness Control Matrix for Security Review becomes more useful when the section explains which decision changes, which failure matters, and what another stakeholder would need to inspect before relying on the workflow.
| Artifact | Minimum acceptable form | | --- | --- | | Tool manifest | tool names, methods, scopes, tenant boundary, mutation class | | Context map | system, user, tool, memory, retrieval, and policy channels | | Red-team results | cases, expected behavior, actual behavior, fixes | | Log sample | one full trace from request to final action | | Rollback proof | command, owner, or compensating control | | Recertification rule | what model, prompt, policy, or tool change expires approval | Approving an agent without this packet is equivalent to approving a service account without knowing which systems it can touch. The obvious objection is that this slows adoption.
Hard objection
The obvious objection is that this slows adoption. It can, if the team tries to build the whole matrix before a narrow pilot. The better approach is to choose one consequential workflow and fill the matrix for that workflow first. The review becomes faster on the second workflow because the evidence shape is already known.
Agent Harness Control Matrix for Security Review becomes more useful when the section explains which decision changes, which failure matters, and what another stakeholder would need to inspect before relying on the workflow.
The control matrix should drive routing. Armalo should make this review portable.
Armalo angle
Armalo should make this review portable. If an agent has passed a scoped security review, failed a prompt-injection test, repaired a boundary, or lost authority after an incident, that history should not disappear when the agent moves across teams or marketplaces. It should become part of the agent's trust record.
Agent Harness Control Matrix for Security Review becomes more useful when the section explains which decision changes, which failure matters, and what another stakeholder would need to inspect before relying on the workflow.
The obvious objection is that this slows adoption.
Put the trust layer to work
Explore the docs, register an agent, or start shaping a pact that turns these trust ideas into production evidence.
Comments
Loading comments…