MCP Trust Problem: Verifying Tools That Are Unknown Agents | Armalo


MCP (Model Context Protocol) solves a real problem. Standardized tool schemas, consistent discovery, cross-provider compatibility — these are genuine infrastructure wins. The protocol is well-designed for what it does.

Here is the part that MCP does not advertise: when your agent calls an MCP tool hosted by an external provider, you are calling an endpoint you did not write, operated by a party you may not know, with behavioral characteristics that are not part of the protocol spec.

MCP answers: how does my agent discover tools and call them with a consistent interface? It does not answer: what is the behavioral track record of the service at that endpoint, who certified it, and what is the verified risk posture of the entity operating it?

These are different questions. At small scale, the gap is manageable. As MCP-compatible tool ecosystems grow — and they will grow, because the protocol is good — the gap becomes the primary attack and failure surface.

The protocol defines the handshake. The trust layer defines whether you should complete it.

TL;DR

MCP standardizes the interface, not the operator. The schema tells you what a tool accepts and returns — not whether the operator is reliable, safe, or honest.
Tool discovery is not tool certification. Finding an MCP tool via a registry or manifest does not mean that tool has a verified behavioral record.
Tool endpoints can be agents themselves. As agentic MCP servers proliferate, calling a "tool" increasingly means delegating to an autonomous agent — which brings the full set of agent trust problems into scope.
Supply chain risk is real. A malicious or compromised MCP tool can inject behavior into your agent's context. The 824 malicious skills problem applies directly to MCP ecosystems.
Trust verification is a pre-call check. Before your agent calls an MCP tool, a trust score query takes milliseconds and surfaces the behavioral record of the endpoint.

What the MCP Protocol Actually Provides

MCP is a protocol for exposing external capabilities to AI models through a consistent interface. Servers expose tools (callable functions), resources (data and context), and prompts (reusable templates); clients discover and invoke them the same way regardless of provider. It solves the integration fragmentation problem: instead of every agent implementing a custom connector for every service, a single protocol standard lets any MCP-compatible client call any MCP-compatible server.

This is real and useful. Here is what the protocol does not specify:

The behavioral history of the server at the endpoint
Whether the operator of the server has been certified by any third party
The security posture of the server (injection resistance, output sanitization, scope adherence)
What happens when the server is compromised, upgraded with behavior changes, or operated maliciously

The protocol is intentionally agnostic on these questions. That is the right design choice: trust infrastructure is not the protocol's job. The resulting gap marks the correct boundary of the protocol's scope; it is not a bug.
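
One partial mitigation for the "upgraded with behavior changes" case is to pin what the protocol does expose. The following is a minimal sketch, assuming a descriptor with the `name`, `description`, and `inputSchema` fields that MCP tool listings carry; `stableStringify`, `fingerprint`, and `hasDescriptorChanged` are hypothetical helpers, not part of any SDK.

```typescript
// Shape of a tool entry as a client sees it. Nothing here describes the operator.
interface ToolDescriptor {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

// Serialize with sorted object keys so key order does not change the result.
function stableStringify(value: unknown): string {
  if (value === undefined) {
    return 'null';
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// A canonical string for the descriptor, stored when the tool is first approved.
function fingerprint(tool: ToolDescriptor): string {
  return stableStringify(tool);
}

// True when the descriptor no longer matches the pinned fingerprint.
function hasDescriptorChanged(pinned: string, current: ToolDescriptor): boolean {
  return fingerprint(current) !== pinned;
}
```

This only catches a changed schema or description. A server that changes its behavior behind an unchanged descriptor passes this check, which is the gap the list above describes.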

Three Trust Scenarios MCP Does Not Cover

Scenario 1: The Uncertified Tool Provider

Your agent discovers an MCP tool for financial data retrieval. The schema is valid. The tool returns data in the correct format. The server has no behavioral record — no third-party evals, no composite score, no certification tier.

Two weeks after deployment, the data quality degrades. Outputs that look syntactically correct contain stale or fabricated numbers, and your agent has already incorporated those numbers into financial reports. You have no audit trail pointing to the MCP server as the source of the failure, and no pre-call verification that would have flagged the server's lack of behavioral history.

This scenario is not hypothetical. It is the default state of every uncertified MCP tool in any ecosystem.
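
The missing audit trail in this scenario can be addressed on the caller's side by tagging every value with its source before it reaches a report. This is a minimal sketch; `Sourced`, `tagWithSource`, `providersFor`, and `valuesFrom` are hypothetical names, and the provider ID is whatever identifier you use for the server's operator.

```typescript
interface Provenance {
  providerId: string;  // hypothetical identifier for the MCP server operator
  toolName: string;
  retrievedAt: string; // ISO 8601 timestamp of the tool call
}

interface Sourced<T> {
  value: T;
  provenance: Provenance;
}

// Attach the source of a tool result at the moment it is received.
function tagWithSource<T>(
  value: T,
  providerId: string,
  toolName: string,
  now: Date = new Date()
): Sourced<T> {
  return { value, provenance: { providerId, toolName, retrievedAt: now.toISOString() } };
}

// List the distinct providers that contributed values to a report.
function providersFor(items: Sourced<unknown>[]): string[] {
  return Array.from(new Set(items.map((item) => item.provenance.providerId)));
}

// Find every value that came from a provider later found to be unreliable.
function valuesFrom<T>(items: Sourced<T>[], providerId: string): Sourced<T>[] {
  return items.filter((item) => item.provenance.providerId === providerId);
}
```

Provenance does not prevent bad data from arriving. It makes the failure traceable once someone notices it, so the affected figures can be found and corrected.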

Scenario 2: The Agentic MCP Server

As MCP adoption grows, a growing share of MCP "tools" are agentic: the endpoint is not a simple function but an autonomous agent that makes decisions, calls additional tools, and produces outputs based on its own reasoning process.

When your agent delegates to an agentic MCP server, the trust problem is no longer "is this API reliable?" — it is "will this agent honor its behavioral commitments, what is its security posture against injection, and does it have a score that reflects real production behavior?" This is the full agent trust problem, introduced at the tool-call boundary.

The distinction between a "tool" and an "agent" is collapsing. The trust infrastructure has to work for both.
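
One way to make a single policy work for both is to key the required score on the kind of endpoint and the stakes of the task. The sketch below is an assumption-heavy illustration: the `EndpointKind` labels and every threshold are hypothetical, chosen on the same 0 to 1000 scale used elsewhere in this post.

```typescript
type EndpointKind = 'function' | 'agentic';
type Stakes = 'low' | 'high';

// Illustrative thresholds only; tune these to your own risk tolerance.
const MIN_SCORE: Record<EndpointKind, Record<Stakes, number>> = {
  function: { low: 500, high: 700 },
  agentic: { low: 700, high: 850 },
};

interface DelegationDecision {
  allowed: boolean;
  required: number;
  reason: string;
}

// Decide whether to delegate, treating a missing score as a refusal.
function decideDelegation(
  kind: EndpointKind,
  stakes: Stakes,
  compositeScore: number | undefined
): DelegationDecision {
  const required = MIN_SCORE[kind][stakes];
  if (compositeScore === undefined) {
    return { allowed: false, required, reason: 'no behavioral record' };
  }
  if (compositeScore < required) {
    return { allowed: false, required, reason: `score ${compositeScore} below ${required}` };
  }
  return { allowed: true, required, reason: 'meets threshold' };
}
```

With these numbers, a provider scoring 800 is acceptable as a function-style tool for high-stakes work but not as an agentic endpoint for the same task.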

Scenario 3: The Supply Chain Injection

MCP tools operate inside your agent's context window. A malicious or compromised MCP tool can return data structured to manipulate your agent's subsequent reasoning — effectively injecting behavior through the tool call response. This is the agentic equivalent of a supply chain attack, and the MCP protocol provides no defense against it.

The defense is behavioral: a tool with a clean injection resistance history (verified by adversarial evals, scored, and certified) is categorically lower risk than an uncertified tool whose behavioral characteristics are unknown. Pre-call trust verification surfaces this distinction before the call happens.
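
Pre-call verification can be paired with handling tool output as data rather than instructions. The following sketch shows one such approach. The pattern list is hypothetical and deliberately small, and `screenToolOutput` and `wrapAsData` are illustrative names; pattern matching of this kind is a heuristic signal, not a complete defense.

```typescript
// Real attacks vary widely, so a match is a reason to quarantine,
// and the absence of a match is not proof of safety.
const SUSPICIOUS_PATTERNS: RegExp[] = [
  /ignore (all |any )?(previous|prior|above) instructions/i,
  /you are now/i,
  /system prompt/i,
];

interface ScreenedOutput {
  text: string;
  flagged: boolean;
  matches: string[]; // source of each pattern that matched
}

// Check raw tool output against the pattern list.
function screenToolOutput(raw: string): ScreenedOutput {
  const matches = SUSPICIOUS_PATTERNS
    .filter((pattern) => pattern.test(raw))
    .map((pattern) => pattern.source);
  return { text: raw, flagged: matches.length > 0, matches };
}

// Wrap output in explicit delimiters so the calling prompt can state that
// everything inside is data. Flagged output is withheld entirely.
function wrapAsData(toolName: string, screened: ScreenedOutput): string {
  const body = screened.flagged ? '[withheld: output flagged for review]' : screened.text;
  return `<tool_output name="${toolName}">\n${body}\n</tool_output>`;
}
```

A screen like this lowers exposure to the crudest injections. It does not replace a behavioral record, because a capable attacker can phrase instructions in ways no fixed list anticipates.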

The Verification Layer for MCP Tool Calls

The pattern that closes the gap:

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { ArmaloClient } from '@armalo/core';

const armalo = new ArmaloClient({ apiKey: process.env.ARMALO_API_KEY! });

// Minimum composite score (out of 1000) required before a call is allowed
const MIN_COMPOSITE_SCORE = 700;

// MCP tool call wrapper with pre-call trust verification.
// Assumes `mcpClient` is an already-connected Client from the official
// MCP TypeScript SDK; adapt the call in step 3 if you use a different client.
async function callMcpToolWithVerification(
  mcpClient: Client,
  toolName: string,
  toolProviderId: string,
  params: Record<string, unknown>
) {
  // 1. Query the trust score of the MCP tool provider before calling
  const trust = await armalo.getTrustAttestation(toolProviderId);

  if (trust.compositeScore < MIN_COMPOSITE_SCORE) {
    throw new Error(
      `MCP tool provider ${toolProviderId} does not meet minimum trust threshold. ` +
      `Score: ${trust.compositeScore}/1000, Tier: ${trust.certificationTier}`
    );
  }

  // 2. Check security posture specifically (injection resistance).
  //    A missing badge only logs a warning here; throw instead for high-stakes calls.
  const hasInjectionClean = trust.securityPosture?.badges?.includes('injection-free');
  if (!hasInjectionClean) {
    console.warn(`Warning: ${toolProviderId} has no injection resistance certification`);
  }

  // 3. Call the MCP tool as normal
  const result = await mcpClient.callTool({ name: toolName, arguments: params });

  // 4. Report the call outcome for your own agent's behavioral record
  await armalo.submitObservation({
    agentId: process.env.ARMALO_AGENT_ID!,
    observationType: 'mcp_tool_call',
    toolProviderId,
    toolName,
    outcome: result.isError ? 'failure' : 'success',
  });

  return result;
}

The overhead is a single API call before each MCP tool invocation. In a high-stakes workflow — financial data, medical information, legal research — this overhead is noise compared to the risk of calling an uncertified or compromised tool.
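
If the per-call lookup does become a concern, attestations can be cached for a short period. This is a minimal sketch under stated assumptions: `TrustCache` is a hypothetical helper, the cached shape includes only the two fields the wrapper above reads, and the right time-to-live depends on how quickly you need a revoked score to take effect.

```typescript
interface CachedAttestation {
  compositeScore: number;
  certificationTier: string;
}

interface CacheEntry {
  value: CachedAttestation;
  expiresAt: number; // epoch milliseconds
}

class TrustCache {
  private readonly ttlMs: number;
  private readonly entries = new Map<string, CacheEntry>();

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Return the cached attestation, or undefined if absent or expired.
  get(providerId: string, now: number = Date.now()): CachedAttestation | undefined {
    const entry = this.entries.get(providerId);
    if (entry === undefined) {
      return undefined;
    }
    if (now >= entry.expiresAt) {
      this.entries.delete(providerId);
      return undefined;
    }
    return entry.value;
  }

  // Store an attestation that stays valid for ttlMs from `now`.
  set(providerId: string, value: CachedAttestation, now: number = Date.now()): void {
    this.entries.set(providerId, { value, expiresAt: now + this.ttlMs });
  }
}
```

The trade-off is freshness: a provider whose score drops is still treated as trusted until its cached entry expires, so high-stakes workflows may prefer a very short TTL or no cache at all.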

The MCP Trust Stack

Layer	Protocol Provides	You Need to Add
Tool discovery	Schema, capabilities manifest	Provider behavioral record
Authentication	API key / OAuth integration	Third-party provider certification
Execution	Consistent call interface	Pre-call trust score gate
Output format	Typed return values	Post-call behavioral logging
Security	Transport-level encryption	Injection resistance verification
Accountability	None	Behavioral record for tool providers

The Honest Framing

MCP is good infrastructure. Standardized tool interfaces reduce integration friction and increase the reach of agentic systems. The trust gap that comes with it is not a design flaw — it is a natural consequence of what a protocol can and cannot specify.

What a protocol cannot specify: the trustworthiness of the parties operating on it. That requires a separate layer that certifies providers, maintains behavioral records, and surfaces trust signals at the point of connection — before the tool is called.

As MCP ecosystems grow, the tools with behavioral records will be the ones enterprise operators can use. The tools without records will stay in sandbox environments. The trust layer is not optional at scale — it is the condition under which scale becomes safe.

Armalo's trust infrastructure works for MCP tool providers and the agents that call them. See armalo.ai.

Frequently Asked Questions

Does MCP include any security verification for tool providers?

MCP's HTTP-based transports can run over TLS, and the specification defines an OAuth-based authorization flow for them; individual servers may also accept API keys. It does not include behavioral certification, trust scoring, or injection resistance verification for tool providers. Authentication confirms identity. Trust verification confirms behavioral history.

What is the difference between MCP tool authentication and MCP tool trust?

Authentication answers: is this the tool I think it is? Trust answers: is this tool's behavioral history consistent with safe, reliable operation? A tool can pass authentication and have a behavioral record that disqualifies it from high-stakes delegation. These are complementary checks, not substitutes.
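
The two checks can be kept separate in code so that a pass on one never masks a failure on the other. A minimal sketch, assuming a hypothetical `ProviderStatus` shape that your own auth and trust lookups would populate:

```typescript
interface ProviderStatus {
  authenticated: boolean;  // did the credential or OAuth check succeed?
  compositeScore?: number; // undefined when the provider has no behavioral record
}

interface GateResult {
  authentication: 'pass' | 'fail';
  trust: 'pass' | 'fail' | 'unknown';
  proceed: boolean;
}

// Evaluate both checks independently and report each result.
function evaluateProvider(status: ProviderStatus, minScore: number): GateResult {
  const authentication = status.authenticated ? 'pass' : 'fail';
  let trust: GateResult['trust'] = 'unknown';
  if (status.compositeScore !== undefined) {
    trust = status.compositeScore >= minScore ? 'pass' : 'fail';
  }
  // Proceed only when both checks pass; an unknown record is not a pass.
  return { authentication, trust, proceed: authentication === 'pass' && trust === 'pass' };
}
```

Reporting both results, rather than stopping at the first failure, makes it clear in logs whether a refused call was an identity problem or a behavioral one.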

Are agentic MCP servers more dangerous than function-based MCP servers?

They carry a higher trust burden. An agentic MCP server makes autonomous decisions, calls additional tools, and produces outputs based on reasoning that the calling agent cannot directly observe. All of the agent trust problems (behavioral drift, injection vulnerability, scope creep) apply. A function-based server's behavior is fixed by its code, so the same inputs and underlying data produce the same output; an agentic server's output also depends on its own reasoning, which can vary from call to call.
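
Behavioral drift can be watched from the caller's side by comparing recent call outcomes against an earlier baseline. The sketch below is one simple way to do that; `hasDrifted`, the window size, and the tolerance are all assumptions to adjust for your own traffic.

```typescript
type Outcome = 'success' | 'failure';

// Fraction of calls that failed; 0 for an empty list.
function failureRate(outcomes: Outcome[]): number {
  if (outcomes.length === 0) {
    return 0;
  }
  return outcomes.filter((outcome) => outcome === 'failure').length / outcomes.length;
}

// Compare the most recent `windowSize` calls with everything before them.
// Returns true when the recent failure rate exceeds the baseline by more
// than `tolerance` (for example 0.2 for twenty percentage points).
function hasDrifted(history: Outcome[], windowSize: number, tolerance: number): boolean {
  if (windowSize <= 0 || history.length < windowSize * 2) {
    return false; // not enough data to compare
  }
  const recent = history.slice(-windowSize);
  const baseline = history.slice(0, -windowSize);
  return failureRate(recent) - failureRate(baseline) > tolerance;
}
```

This only sees failures the caller can observe, such as error results. Output that is well-formed but wrong will not register, which is why an external behavioral record still matters.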

How does behavioral history help with supply chain attacks via MCP?

A tool provider with a long, verified history of clean adversarial evals has demonstrated resistance to injection attacks at scale. This history is not a guarantee, but it is a meaningful signal — a tool that has passed 10,000 adversarial evals without a single injection incident is categorically lower risk than a newly registered tool with no behavioral record.
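
The "signal, not guarantee" point can be made quantitative. If the evals are independent and representative of real attacks (a strong assumption), then zero failures in n trials bounds the true failure rate from above. The function below is a sketch of that standard calculation; `zeroFailureUpperBound` is a hypothetical name.

```typescript
// Upper bound on the true per-eval failure rate after n evals with zero
// failures, at the given confidence level.
// Solves (1 - p)^n = 1 - confidence for p.
function zeroFailureUpperBound(n: number, confidence: number = 0.95): number {
  if (!Number.isInteger(n) || n <= 0) {
    throw new RangeError('n must be a positive integer');
  }
  if (confidence <= 0 || confidence >= 1) {
    throw new RangeError('confidence must be strictly between 0 and 1');
  }
  return 1 - Math.pow(1 - confidence, 1 / n);
}
```

For 10,000 clean evals at 95% confidence, the bound is about 0.0003, or roughly 0.03% per eval. A tool with a single clean eval has a bound of 0.95, which says almost nothing, and a tool with no evals has no bound at all.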

Armalo AI is the trust layer for the AI agent economy. Behavioral pacts, composite trust scores, multi-LLM jury scoring, and economic accountability for agents and the tools they call — at armalo.ai.

