Add Behavioral Pacts to Your AI Agent in 10 Minutes
armalo-agent adds machine-readable, runtime-enforced behavioral contracts to any TypeScript AI agent. Every run produces a cryptographically signed receipt: a portable compliance artifact your CI pipeline, audit team, or downstream MCP server can verify independently. This guide covers all 5 integration paths, the full receipt structure, MCP trust-gating configuration, and multi-agent pact composition.
TL;DR
armalo-agent adds machine-readable, runtime-enforced behavioral contracts (pacts) to any TypeScript AI agent in a single wrapping call. Every agent run produces a cryptographically signed receipt: a portable compliance artifact your CI pipeline, audit team, or downstream MCP server can verify independently. If you have an agent in production and you cannot answer "what did it do and why?" from a structured artifact, this guide is for you.
Who This Guide Is For
You have an agent in production or in active development. You have felt at least one of these pains:
- A tool call fired that you did not expect, and you spent 45 minutes reading logs to understand why
- You needed to give a compliance team or customer evidence that your agent behaved correctly, and had nothing portable to hand them
- You started with guardrails in a system prompt, watched the model route around them on edge cases, and realized text instructions are not enforcement
- You are building a multi-agent system where Agent A delegates work to Agent B, and you have no way to know, from the orchestrator, whether Agent B honored its behavioral contract
This guide assumes fluency in TypeScript, familiarity with at least one of the OpenAI SDK / Anthropic SDK / LangChain / LangGraph APIs, and an existing agent you want to harden. There is no beginner scaffolding here.
What You Will Have After This Guide
By the end:
- A pact attached to your agent declaring its behavioral boundaries in machine-evaluable form
- A signed run receipt produced after every execution, exportable as JSON, Markdown, or HTML
- An understanding of soft vs. hard enforcement and when each is appropriate
- A working MCP trust-gate that rejects agent requests below a behavioral threshold
- A composable multi-agent receipt chain showing how trust propagates through delegation
- A troubleshooting reference for the six most common failure modes in production pact deployments
The Execution Model: How Pact Enforcement Actually Works
Most behavioral guardrail systems work by injecting text into the system prompt and hoping the model respects it. Pact enforcement in armalo-agent operates at the tool-call boundary: after the model has decided what to do, before that decision executes.
The execution order:
1. Model produces a tool call: { name: "write_file", args: { path: "../secrets/.env" } }
2. TrustNativeAgent extracts the pending action before forwarding it to the tool executor
3. PactEnforcer receives the action object: { tool, args, context, runId, turnIndex }
4. Each clause in the pact runs its matcher() against the action
5. Clauses with enforcement: "hard" that return a violation halt the tool call and return a rejection
6. Clauses with enforcement: "soft" log the violation but allow the call to proceed
7. If no hard violation fires, the tool call executes normally
8. The result, along with every clause evaluation, is appended to the run receipt
The model cannot override step 4. The enforcer runs in your process, not in the model. The model does not receive a response that lets it retry with different arguments before enforcement fires, because the interceptor operates synchronously between model response and tool dispatch.
Why does this matter? A sufficiently capable model, given sufficient tool-call turns, can find a path around a text instruction. It cannot route around code that runs in your process and gates execution.
The enforcer receives full context: the current tool name, the raw arguments, the accumulated run history for this execution, the session metadata, and the current pact. The matcher function you write is a pure TypeScript function β it can parse paths, check regex, inspect argument shapes, or reach into your own application context.
Pact clauses are not filters on output; they are assertions on actions. You are not scrubbing what the agent says. You are controlling what it does.
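The execution order above can be sketched in a few lines of plain TypeScript. This is only an illustration of the gating idea, not armalo-agent's implementation: the `Action`, `Clause`, `enforce`, and `dispatch` names are hypothetical and defined locally, and the real action object carries more fields (context, runId, turnIndex) than shown here.

```typescript
// Hypothetical local types mirroring the shapes described in this guide;
// nothing here is imported from armalo-agent.
type Enforcement = 'hard' | 'soft';

interface Action {
  tool: string;
  args: Record<string, unknown>;
}

interface Clause {
  id: string;
  enforcement: Enforcement;
  matcher: (action: Action) => boolean; // true = violation
}

interface Evaluation {
  clauseId: string;
  violation: boolean;
  enforcement: Enforcement;
}

interface Verdict {
  allowed: boolean;
  evaluations: Evaluation[];
}

// Evaluate clauses in array order and stop at the first hard violation.
function enforce(clauses: Clause[], action: Action): Verdict {
  const evaluations: Evaluation[] = [];
  for (const clause of clauses) {
    const violation = clause.matcher(action);
    evaluations.push({ clauseId: clause.id, violation, enforcement: clause.enforcement });
    if (violation && clause.enforcement === 'hard') {
      return { allowed: false, evaluations };
    }
  }
  return { allowed: true, evaluations };
}

// Gate a tool executor: the tool only runs when no hard clause fired.
function dispatch<T>(
  clauses: Clause[],
  action: Action,
  execute: (action: Action) => T,
): { verdict: Verdict; result?: T } {
  const verdict = enforce(clauses, action);
  if (!verdict.allowed) return { verdict };
  return { verdict, result: execute(action) };
}

const demoClauses: Clause[] = [
  { id: 'log-writes', enforcement: 'soft', matcher: (a) => a.tool === 'write_file' },
  {
    id: 'no-parent-paths',
    enforcement: 'hard',
    matcher: (a) => a.tool === 'write_file' && String(a.args['path']).includes('..'),
  },
  { id: 'flag-everything', enforcement: 'soft', matcher: () => true },
];

// The write is rejected before the executor runs; the read goes through.
const blocked = dispatch(
  demoClauses,
  { tool: 'write_file', args: { path: '../secrets/.env' } },
  () => 'written',
);
const passed = dispatch(
  demoClauses,
  { tool: 'read_file', args: { path: 'README.md' } },
  () => 'contents',
);
```

Because `dispatch` never calls `execute` after a hard violation, the decision is made entirely in host code, which is the property the section relies on.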
Step 1: Install
npm install armalo-agent
# or
pnpm add armalo-agent
No build step required. The package ships CommonJS and ESM. It does not install the OpenAI or Anthropic SDKs as hard dependencies β it accepts them as peer parameters.
Step 2: Declare Your Pact
A pact is a TypeScript object implementing the Pact interface. You can compose from the bundled templates or build from scratch.
Using a built-in template
import { SAFETY_DEFAULTS, CODING_PACT, composePacts } from 'armalo-agent/pacts';

// SAFETY_DEFAULTS covers: no arbitrary shell execution, no credential
// exfiltration, no config state disclosure, no writes outside working dir.
// CODING_PACT adds: filesystem scope constraints, command allowlists,
// test coverage requirements before writes to critical paths, migration gating.
const pact = composePacts(SAFETY_DEFAULTS, CODING_PACT, {
  workingDirectory: '/workspace/my-project',
  allowedCommands: ['npm test', 'npm run build', 'git status', 'git diff'],
});
Building clauses from scratch
import type { Pact, PactClause, PactAction } from 'armalo-agent';

const noExternalNetworkClause: PactClause = {
  id: 'no-external-network',
  description: 'Agent may not make HTTP requests to hosts outside the allow-list.',
  scope: ['tool_call'],
  enforcement: 'hard',
  matcher: (action: PactAction): boolean => {
    if (action.tool !== 'http_request') return false;
    const { url } = action.args as { url: string };
    const allowed = ['api.github.com', 'registry.npmjs.org'];
    try {
      return !allowed.includes(new URL(url).hostname); // true = violation
    } catch {
      return true; // unparseable URL: treat as a violation rather than throwing
    }
  },
};

const mustCiteSourcesClause: PactClause = {
  id: 'must-cite-sources',
  description: 'Every factual claim in a research output must cite a retrieved source.',
  scope: ['message_output'],
  enforcement: 'soft', // observation mode: log but don't block
  matcher: (action: PactAction): boolean => {
    if (action.tool !== '__output__') return false;
    const text = action.args.content as string;
    const hasCitation = /\[Source:/.test(text) || /\[\d+\]/.test(text);
    return !hasCitation; // true = violation
  },
};

const myPact: Pact = {
  id: 'prod-research-agent-v2',
  version: '2.0.1',
  description: 'Research agent: network-constrained, citation-required.',
  clauses: [noExternalNetworkClause, mustCiteSourcesClause],
};
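Because a matcher is a pure function, it can be exercised in isolation before it is attached to a pact. The sketch below shows one way a filesystem-scope matcher might be written and checked against the `../secrets/.env` case from the execution model. Everything in it is an assumption for illustration: `Action`, `resolvePosix`, and `makeFilesystemScopeMatcher` are local, hypothetical names, paths are treated as POSIX strings, and symlinks are not resolved.

```typescript
// Hypothetical local shape; not imported from armalo-agent.
interface Action {
  tool: string;
  args: Record<string, unknown>;
}

// Resolve a POSIX-style path against a base directory, collapsing "." and "..".
function resolvePosix(base: string, target: string): string {
  const joined = target.startsWith('/') ? target : `${base}/${target}`;
  const out: string[] = [];
  for (const segment of joined.split('/')) {
    if (segment === '' || segment === '.') continue;
    if (segment === '..') out.pop();
    else out.push(segment);
  }
  return '/' + out.join('/');
}

// Build a matcher that returns true (violation) when a write escapes workingDir.
function makeFilesystemScopeMatcher(workingDir: string): (action: Action) => boolean {
  const root = resolvePosix('/', workingDir);
  return (action: Action): boolean => {
    if (action.tool !== 'write_file') return false;
    const raw = action.args['path'];
    if (typeof raw !== 'string') return true; // malformed args: treat as violation
    const resolved = resolvePosix(root, raw);
    // Compare on a directory boundary so "/workspace/my-project-evil" does not pass.
    return !(resolved === root || resolved.startsWith(root + '/'));
  };
}

const scopeMatcher = makeFilesystemScopeMatcher('/workspace/my-project');

// Each case pairs a candidate path with whether it should count as a violation.
const scopeCases: Array<[string, boolean]> = [
  ['src/index.ts', false],
  ['./src/../README.md', false],
  ['../secrets/.env', true],
  ['/etc/passwd', true],
  ['/workspace/my-project-evil/x', true],
];

const scopeResults = scopeCases.map(([path]) =>
  scopeMatcher({ tool: 'write_file', args: { path } }),
);
```

A table of cases like `scopeCases` doubles as a regression suite: when a clause is later tightened or loosened, the expected column makes the behavioral change explicit.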
Clause schema
| Field | Type | Purpose |
|---|---|---|
| id | string | Unique within the pact. Appears in receipts for traceability. |
| description | string | Human-readable. Included in receipt exports for auditors. |
| scope | PactScope[] | Which action types trigger this clause: tool_call, message_output, plan_step, delegation. |
| enforcement | "hard" \| "soft" | Hard = halt execution. Soft = log violation, continue. |
| matcher | (action: PactAction) => boolean | Pure function. Return true to indicate a violation. |
Soft vs. hard enforcement: Hard enforcement is appropriate when the action is irreversible or high-risk, such as writes to paths outside the designated working directory, shell commands not on the allowlist, credential reads, and outbound requests to non-approved hosts. Soft enforcement is appropriate during migration periods (observe violations before hardening) and for semantic constraints like citation requirements where a hard block on every edge case would exhaust token budgets rather than enforce behavior.
Clause evaluation order: Clauses evaluate in array order. The first hard violation fires immediately, and subsequent clauses are skipped for that action. Order broad, cheap clauses first (scope checks, tool-name checks) and expensive clauses last.
Step 3: Choose Your Integration Path
3a. OpenAI SDK: wrapOpenAI
import OpenAI from 'openai';
import { wrapOpenAI, type RunReceipt, type PactViolation } from 'armalo-agent';

const rawClient = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const client = wrapOpenAI(rawClient, {
  pact: myPact,
  agentId: 'prod-research-agent',
  onViolation: (violation: PactViolation) => {
    console.error('[pact]', violation.clauseId, 'fired on', violation.action.tool, {
      args: violation.action.args,
      enforcement: violation.enforcement,
      runId: violation.runId,
    });
  },
  onRunComplete: (receipt: RunReceipt) => {
    db.receipts.insert(receipt);
  },
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Summarize the current state of transformer quantization.' }],
  tools: myTools,
});

const receipt = client.lastReceipt();
wrapOpenAI returns a proxy that is interface-compatible with the raw OpenAI instance, so you can pass it anywhere your existing code accepts an OpenAI client. The proxy intercepts chat.completions.create, responses.create, and beta.chat.completions.runTools.
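To make "interface-compatible proxy" concrete, here is a minimal sketch of how selected methods on a nested client object can be observed with the standard JavaScript `Proxy` while everything else passes through untouched. It is not how wrapOpenAI is implemented, which the guide does not show; `FakeClient`, `wrapClient`, and the dotted-path convention are assumptions made up for this example, and a stub stands in for the real SDK.

```typescript
// A stand-in client with the same nesting style as an SDK client (hypothetical).
interface FakeClient {
  chat: { completions: { create(req: { model: string }): Promise<{ id: string }> } };
  models: { list(): Promise<string[]> };
}

type Interceptor = (path: string, args: unknown[]) => void;

// Recursively proxy an object so calls to the listed dotted paths are observed
// before being forwarded, unchanged, to the original method.
function wrapClient<T extends object>(
  target: T,
  intercept: Set<string>,
  onCall: Interceptor,
  prefix = '',
): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      if (typeof prop !== 'string') return value;
      const path = prefix ? `${prefix}.${prop}` : prop;
      if (typeof value === 'function') {
        // Methods that are not on the intercept list pass through untouched.
        if (!intercept.has(path)) return value.bind(obj);
        return (...args: unknown[]) => {
          onCall(path, args); // runs synchronously, before the real call
          return value.apply(obj, args);
        };
      }
      if (value !== null && typeof value === 'object') {
        return wrapClient(value as object, intercept, onCall, path);
      }
      return value;
    },
  });
}

const observedCalls: string[] = [];

const rawFakeClient: FakeClient = {
  chat: { completions: { create: async (req) => ({ id: `resp-for-${req.model}` }) } },
  models: { list: async () => ['gpt-4o'] },
};

const wrappedClient = wrapClient(
  rawFakeClient,
  new Set(['chat.completions.create']),
  (path) => observedCalls.push(path),
);

// Same call shape as the raw client; only the intercepted path is recorded.
const pendingResponse = wrappedClient.chat.completions.create({ model: 'gpt-4o' });
const pendingModels = wrappedClient.models.list();
```

The wrapped object keeps the static type `T`, which is why call sites that expect the raw client type continue to compile.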
3b. Anthropic SDK
import { wrapAnthropic } from 'armalo-agent';

const client = wrapAnthropic(existingAnthropicClient, {
  pact: myPact,
  agentId: 'prod-research-agent',
  onViolation: (v) => recordViolation(v),
  onRunComplete: (r) => storeReceipt(r),
});

const message = await client.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 4096,
  tools: myAnthropicTools,
  messages: [{ role: 'user', content: 'Analyze these earnings transcripts and cite each claim.' }],
});
The Anthropic wrapper intercepts messages.create and messages.stream. Tool-use blocks are intercepted prior to being returned to your application code.
3c. LangGraph: wrapping a real graph node
import { StateGraph, END } from '@langchain/langgraph';
import { createArmaloNode } from 'armalo-agent/langgraph';
import type { RunReceipt } from 'armalo-agent';
import { ChatOpenAI } from '@langchain/openai';

interface ResearchState {
  query: string;
  sources: string[];
  synthesis: string;
  receipt?: RunReceipt;
}

const model = new ChatOpenAI({ model: 'gpt-4o' });

const armaModel = createArmaloNode(model, {
  pact: RESEARCH_PACT,
  agentId: 'langgraph-research-agent',
  receiptStore: async (receipt) => {
    await db.receipts.upsert({ runId: receipt.runId }, receipt);
  },
});

async function fetchSourcesNode(state: ResearchState): Promise<Partial<ResearchState>> {
  const result = await armaModel.invoke([
    { role: 'system', content: 'Retrieve 3-5 primary sources for the query. Return structured JSON.' },
    { role: 'user', content: state.query },
  ]);
  return { sources: JSON.parse(result.content as string) };
}

async function synthesizeNode(state: ResearchState): Promise<Partial<ResearchState>> {
  const result = await armaModel.invoke([
    { role: 'system', content: 'Synthesize the provided sources. Cite each claim with [Source: N].' },
    { role: 'user', content: `Sources:\n${state.sources.join('\n\n')}\n\nQuery: ${state.query}` },
  ]);
  return { synthesis: result.content as string, receipt: armaModel.lastReceipt() };
}

const graph = new StateGraph<ResearchState>({
  channels: { query: null, sources: null, synthesis: null, receipt: null },
})
  .addNode('fetch', fetchSourcesNode)
  .addNode('synthesize', synthesizeNode)
  .addEdge('__start__', 'fetch')
  .addEdge('fetch', 'synthesize')
  .addEdge('synthesize', END)
  .compile();
createArmaloNode returns an object with the same .invoke() and .stream() interface as a LangChain BaseLanguageModel, slotting into any existing graph node without type gymnastics.
3d. LangChain: ArmaloPactChain
import { ArmaloPactChain } from 'armalo-agent/langchain';
import { ChatAnthropic } from '@langchain/anthropic';
import { PromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';

const model = new ChatAnthropic({ model: 'claude-opus-4-5' });
const prompt = PromptTemplate.fromTemplate('You are a customer support agent. User query: {query}');
const baseChain = prompt.pipe(model).pipe(new StringOutputParser());

const guardedChain = new ArmaloPactChain(baseChain, {
  pact: CUSTOMER_SUPPORT_PACT,
  agentId: 'support-chain-v3',
  onViolation: async (v) => {
    await alertingService.send({
      level: v.enforcement === 'hard' ? 'error' : 'warn',
      message: `Pact violation in support chain: ${v.clauseId}`,
      context: { runId: v.runId, args: v.action.args },
    });
  },
});

const result = await guardedChain.invoke({ query: 'What is my account balance?' });
const receipt = guardedChain.lastReceipt();
3e. TrustNativeAgent: building from scratch
For new agents where you control the architecture, TrustNativeAgent is the full-fidelity path. It handles the tool-call loop, pact enforcement, and receipt accumulation natively.
import { TrustNativeAgent, type AgentTool, type TrustNativeAgentConfig } from 'armalo-agent';
import { composePacts, SAFETY_DEFAULTS, CODING_PACT } from 'armalo-agent/pacts';
import OpenAI from 'openai';

const config: TrustNativeAgentConfig = {
  agentId: 'coding-agent-prod',
  model: 'gpt-4o',
  client: new OpenAI({ apiKey: process.env.OPENAI_API_KEY }),
  pact: composePacts(SAFETY_DEFAULTS, CODING_PACT, {
    workingDirectory: process.cwd(),
    allowedCommands: ['npm test'],
  }),
  tools: [readFileTool, runTestsTool],
  systemPrompt: 'You are a coding assistant. Work only within the designated directory.',
  maxTurns: 20,
  onViolation: (v) => logger.warn('pact-violation', v),
  onRunComplete: async (receipt) => { await storeReceipt(receipt); },
};

const agent = new TrustNativeAgent(config);

const { output, receipt } = await agent.run({
  input: 'Refactor the authentication module to use the shared token validator.',
});
Step 4: Work With Run Receipts
The full receipt structure
interface RunReceipt {
  runId: string;                    // UUID, stable across exports
  agentId: string;
  pactId: string;
  pactVersion: string;
  pactHash: string;                 // SHA-256 of the serialized pact at execution time
  startedAt: string;                // ISO 8601
  completedAt: string;
  outcome: 'completed' | 'halted' | 'error';
  actions: ActionRecord[];          // every tool call, in order
  clauseEvaluations: ClauseEval[];  // per-action, per-clause evaluation results
  violations: PactViolation[];      // subset where violation=true
  signature: string;                // HMAC-SHA256 for tamper evidence
  parentRunId?: string;             // set when this is a delegated sub-agent run
}
pactHash is the SHA-256 of the canonical JSON serialization of the pact at execution time. When you change a pact clause and re-deploy, old receipts carry a different pactHash, which lets you correlate behavior changes with pact version changes in your analytics.
clauseEvaluations contains one entry per (action, clause) pair where the clause's scope matched the action type. If your pact has 8 clauses and the agent made 12 tool calls, you will have up to 96 clause evaluation records.
signature provides tamper evidence: if the receipt is modified after generation, signature verification fails. It proves the receipt has not been altered since it was produced by your process; it does not prove the agent ran correctly.
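The tamper-evidence property can be illustrated with Node's built-in crypto module. The guide does not specify which fields armalo-agent signs, how it serializes them, or where the key lives, so the sketch below makes its own assumptions: the signature covers a key-sorted JSON serialization of every field except `signature`, and the key is a shared secret string. `canonicalize`, `signReceipt`, and `verifyReceipt` are hypothetical helpers, not library functions.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Serialize with object keys sorted so equal values always yield equal strings.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalize(v)).join(',')}]`;
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k] as Json)}`).join(',')}}`;
}

// HMAC-SHA256 over the canonical form, hex-encoded.
function signReceipt(body: { [key: string]: Json }, secret: string): string {
  return createHmac('sha256', secret).update(canonicalize(body)).digest('hex');
}

// Recompute the HMAC over everything except `signature` and compare in constant time.
function verifyReceipt(receipt: { [key: string]: Json }, secret: string): boolean {
  const { signature, ...body } = receipt;
  if (typeof signature !== 'string') return false;
  const expected = Buffer.from(signReceipt(body, secret), 'hex');
  const actual = Buffer.from(signature, 'hex');
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

const demoSecret = 'demo-secret'; // assumption: a shared secret held by signer and verifier
const demoBody: { [key: string]: Json } = { runId: 'run-1', outcome: 'completed', violations: [] };
const signedReceipt: { [key: string]: Json } = {
  ...demoBody,
  signature: signReceipt(demoBody, demoSecret),
};

// Changing any signed field after the fact invalidates the signature.
const tamperedReceipt: { [key: string]: Json } = { ...signedReceipt, outcome: 'halted' };

const originalOk = verifyReceipt(signedReceipt, demoSecret);
const tamperedOk = verifyReceipt(tamperedReceipt, demoSecret);
```

One design consequence worth noting: an HMAC is symmetric, so whoever verifies a receipt this way must hold the same secret that produced it.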
Using receipts for debugging
Concrete scenario: your coding agent wrote to ../config/database.yaml, outside its designated working directory. Here's how to trace it:
const receipt = await db.receipts.findOne({ runId: 'run_01j...' });

// Step 1: find the offending action
const offendingAction = receipt.actions.find(
  (a) => a.tool === 'write_file' && (a.args.path as string).includes('../config')
);
console.log('Turn index:', offendingAction?.turnIndex); // → 7

// Step 2: reconstruct the model's context at turn 7
const priorActions = receipt.actions.filter((a) => a.turnIndex < 7);
console.log('Prior calls:', priorActions.map((a) => `${a.turnIndex}:${a.tool}`));
// → ["0:read_file", "2:read_file", "4:list_directory", "6:read_file"]

// Step 3: check clause evaluations for that action
const evalsAtTurn7 = receipt.clauseEvaluations.filter((e) => e.actionIndex === 7);
console.log('Evaluations at turn 7:', evalsAtTurn7);
// → [{ clauseId: "filesystem-scope", violation: true, enforcement: "soft" }]
The receipt tells you: at turn 7, the filesystem-scope clause fired a soft violation. The fix is to change enforcement: "soft" to enforcement: "hard". Once hardened, the write will be blocked and the agent will either find the correct path or surface the limitation explicitly.
Export formats
import { exportReceipt } from 'armalo-agent/receipts';

// JSON: for storage, APIs, programmatic querying
const json = exportReceipt(receipt, 'json');
await s3.putObject({ Key: `receipts/${receipt.runId}.json`, Body: json });

// Markdown: for human review in postmortems and PR descriptions
const md = exportReceipt(receipt, 'markdown');

// HTML: for reports to compliance teams or customers
const html = exportReceipt(receipt, 'html');
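If you need a custom layout for a postmortem or PR description, a receipt can also be rendered by hand from its JSON form. The following is a rough sketch under stated assumptions, not the output format of exportReceipt: `ReceiptLike` is a trimmed, hypothetical subset of the receipt fields, and its violation entries are flattened to `{ clauseId, enforcement, tool }` for brevity.

```typescript
// Hypothetical, trimmed-down receipt shape used only for this example.
interface ReceiptLike {
  runId: string;
  agentId: string;
  pactId: string;
  pactVersion: string;
  outcome: 'completed' | 'halted' | 'error';
  violations: Array<{ clauseId: string; enforcement: 'hard' | 'soft'; tool: string }>;
}

// Render a short Markdown summary: header fields, then one table row per violation.
function toMarkdown(r: ReceiptLike): string {
  const lines: string[] = [
    `# Run receipt ${r.runId}`,
    '',
    `- Agent: ${r.agentId}`,
    `- Pact: ${r.pactId} (v${r.pactVersion})`,
    `- Outcome: ${r.outcome}`,
    `- Violations: ${r.violations.length}`,
  ];
  if (r.violations.length > 0) {
    lines.push('', '| Clause | Enforcement | Tool |', '|---|---|---|');
    for (const v of r.violations) {
      lines.push(`| ${v.clauseId} | ${v.enforcement} | ${v.tool} |`);
    }
  }
  return lines.join('\n');
}

const demoReceipt: ReceiptLike = {
  runId: 'run-1',
  agentId: 'coding-agent',
  pactId: 'coding-pact',
  pactVersion: '1.0.0',
  outcome: 'halted',
  violations: [{ clauseId: 'filesystem-scope', enforcement: 'hard', tool: 'write_file' }],
};

const demoMarkdown = toMarkdown(demoReceipt);
```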
Step 5: Configure MCP Trust-Gating
MCP trust-gating sits at the boundary between an agent and a tool server. The question OAuth answers: "is this agent authorized to make requests?" The question trust-score gating answers: "does this agent have a behavioral track record consistent with accessing this resource?"
An agent can be fully authorized, with valid credentials, and still fail trust-gating because its run receipt history shows repeated pact violations.
import { createTrustGatedMCPServer } from 'armalo-agent/mcp';

const server = createTrustGatedMCPServer({
  tools: [databaseQueryTool, fileSystemTool, apiClientTool],
  trustGate: {
    minimumScore: 750,
    agentIdHeader: 'x-armalo-agent-id',
    receiptHeader: 'x-armalo-receipt',
    onRejection: (agentId, score, requiredScore) => {
      auditLog.write({ event: 'trust_gate_rejection', agentId, score, requiredScore });
    },
    toolThresholds: {
      'database_write': 850,
      'database_read': 700,
      'api_client': 600,
    },
  },
});

server.listen(3100);
Start with lower thresholds (400 to 500) during initial deployment to observe which agents fail gating without blocking all traffic. Raise thresholds incrementally after reviewing your agent population's score distribution.
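The interaction between minimumScore and toolThresholds is easy to get wrong, so it helps to see the decision as a small pure function. This is a sketch of the rule as this guide describes it (a per-tool threshold, when present, replaces the global minimum), not the library's gate; `TrustGateConfig`, `requiredScoreFor`, and `gate` are hypothetical local names.

```typescript
// Hypothetical config shape echoing the trustGate options shown above.
interface TrustGateConfig {
  minimumScore: number;
  toolThresholds?: Record<string, number>;
}

interface GateDecision {
  allowed: boolean;
  requiredScore: number;
}

// A per-tool threshold, when configured, replaces the global minimum for that tool.
function requiredScoreFor(config: TrustGateConfig, tool: string): number {
  const perTool = config.toolThresholds ? config.toolThresholds[tool] : undefined;
  return perTool !== undefined ? perTool : config.minimumScore;
}

function gate(config: TrustGateConfig, tool: string, score: number): GateDecision {
  const requiredScore = requiredScoreFor(config, tool);
  return { allowed: score >= requiredScore, requiredScore };
}

const gateConfig: TrustGateConfig = {
  minimumScore: 750,
  toolThresholds: { database_write: 850, database_read: 700, api_client: 600 },
};

// A score of 800 clears the global minimum but not the stricter write threshold.
const writeDecision = gate(gateConfig, 'database_write', 800);
// A score of 720 is below the global minimum but clears the looser read threshold.
const readDecision = gate(gateConfig, 'database_read', 720);
// Tools without their own threshold fall back to minimumScore.
const otherDecision = gate(gateConfig, 'file_system', 720);
```

Returning `requiredScore` alongside the decision mirrors the onRejection callback signature above, which makes rejections explainable in an audit log.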
Step 6: Multi-Agent Pact Composition
When Agent A delegates to Agent B, the system is trust-blind at the delegation boundary without receipt composition.
import { composeReceipts } from 'armalo-agent/receipts';

// Inside the orchestrator's delegate tool:
execute: async ({ query }) => {
  const researcher = new TrustNativeAgent({
    agentId: 'researcher-v2',
    pact: RESEARCH_PACT,
    parentRunId: orchestrator.currentRunId(), // links receipts in the chain
    tools: researchTools,
  });
  const { output, receipt: subReceipt } = await researcher.run({ input: query });
  orchestrator.registerSubReceipt(subReceipt);
  return output;
},

// After the orchestrator run completes:
const chainReceipt = composeReceipts(parentReceipt);
// chainReceipt.subReceipts: all delegated receipts, nested
// chainReceipt.chainOutcome: "completed" only if ALL receipts completed without hard violations
// chainReceipt.chainViolations: all violations across the chain, by agentId and runId
A downstream trust-gated MCP server may require a chain receipt rather than just the calling agent's individual receipt: if Agent A delegates to Agent B and Agent B violates its pact, Agent A's receipt looks clean. The chain receipt surfaces the full picture.
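The "Agent A looks clean" problem can be reproduced with a few lines of plain TypeScript. The sketch below nests receipts by parentRunId and derives a chain-level outcome using the rule stated above (completed only if every receipt completed without hard violations). It is an illustration, not composeReceipts itself: the types are local, and the `'failed'` label for a non-clean chain is an assumption, since the guide only names the `"completed"` case.

```typescript
// Hypothetical local shapes; only the fields needed for the chain rule.
type Outcome = 'completed' | 'halted' | 'error';

interface Violation {
  clauseId: string;
  enforcement: 'hard' | 'soft';
}

interface Receipt {
  runId: string;
  agentId: string;
  outcome: Outcome;
  violations: Violation[];
  parentRunId?: string;
}

interface ChainReceipt extends Receipt {
  subReceipts: ChainReceipt[];
}

interface ChainSummary {
  chainOutcome: 'completed' | 'failed'; // 'failed' is a placeholder label
  chainViolations: Array<Violation & { agentId: string; runId: string }>;
}

// Nest every receipt under the receipt whose runId matches its parentRunId.
function buildChain(root: Receipt, all: Receipt[]): ChainReceipt {
  const children = all.filter((r) => r.parentRunId === root.runId);
  return { ...root, subReceipts: children.map((c) => buildChain(c, all)) };
}

function flatten(node: ChainReceipt): ChainReceipt[] {
  const out: ChainReceipt[] = [node];
  for (const child of node.subReceipts) out.push(...flatten(child));
  return out;
}

function summarize(chain: ChainReceipt): ChainSummary {
  const nodes = flatten(chain);
  const clean = nodes.every(
    (n) => n.outcome === 'completed' && !n.violations.some((v) => v.enforcement === 'hard'),
  );
  const chainViolations: ChainSummary['chainViolations'] = [];
  for (const n of nodes) {
    for (const v of n.violations) {
      chainViolations.push({ ...v, agentId: n.agentId, runId: n.runId });
    }
  }
  return { chainOutcome: clean ? 'completed' : 'failed', chainViolations };
}

const orchestratorReceipt: Receipt = {
  runId: 'run-a',
  agentId: 'orchestrator',
  outcome: 'completed',
  violations: [],
};
const researcherReceipt: Receipt = {
  runId: 'run-b',
  agentId: 'researcher-v2',
  outcome: 'halted',
  violations: [{ clauseId: 'no-external-network', enforcement: 'hard' }],
  parentRunId: 'run-a',
};

// Looking only at the orchestrator hides the problem; the chain view exposes it.
const parentOnly = summarize(buildChain(orchestratorReceipt, [orchestratorReceipt]));
const fullChain = summarize(buildChain(orchestratorReceipt, [orchestratorReceipt, researcherReceipt]));
```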
Step 7: Register for Live Trust Scoring
npm run register
This connects your agent to the Armalo trust graph. Every run receipt submitted from your agent is ingested and scored. The score reflects:
- Pact compliance rate over the trailing 30-day window
- Hard violation frequency (weighted more heavily than soft)
- Outcome distribution (completed vs. halted vs. error)
- Behavioral consistency across task types
On initialization: a newly registered agent starts with no score. MCP trust gates set to high thresholds (750+) will reject new agents until sufficient run history is accumulated. Plan for this: either use lower initial thresholds during an observation period, or run a scripted evaluation suite through AgentGauntlet before promoting to production traffic.
Production Considerations
Migrating from prompt-only. Audit your current system prompt for constraint language such as "never write outside your working directory" or "always cite sources." Each constraint is a candidate for a pact clause. Start every new clause with enforcement: "soft". Review violation logs after 48 to 72 hours. Harden to enforcement: "hard" once you have evidence the clause fires correctly. Remove the corresponding text from the system prompt only after the pact clause is hardened.
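For the review step in that migration, it can help to aggregate soft violations per clause across the stored receipts from the observation window. A minimal sketch follows, assuming receipts have already been loaded from your own store; `ReceiptLike`, `ClauseStats`, and `softViolationStats` are hypothetical names, and the receipt shape is trimmed to the two fields the tally needs.

```typescript
// Hypothetical, trimmed-down shapes for stored receipts.
interface ViolationRecord {
  clauseId: string;
  enforcement: 'hard' | 'soft';
}

interface ReceiptLike {
  runId: string;
  violations: ViolationRecord[];
}

interface ClauseStats {
  clauseId: string;
  softViolations: number; // total soft firings across all runs
  runsAffected: number;   // distinct runs in which the clause fired
}

// Tally soft violations per clause, most frequent first.
function softViolationStats(receipts: ReceiptLike[]): ClauseStats[] {
  const stats = new Map<string, { softViolations: number; runs: Set<string> }>();
  for (const receipt of receipts) {
    for (const v of receipt.violations) {
      if (v.enforcement !== 'soft') continue;
      const entry = stats.get(v.clauseId) ?? { softViolations: 0, runs: new Set<string>() };
      entry.softViolations += 1;
      entry.runs.add(receipt.runId);
      stats.set(v.clauseId, entry);
    }
  }
  return Array.from(stats, ([clauseId, s]) => ({
    clauseId,
    softViolations: s.softViolations,
    runsAffected: s.runs.size,
  })).sort((a, b) => b.softViolations - a.softViolations);
}

const observedReceipts: ReceiptLike[] = [
  {
    runId: 'r1',
    violations: [
      { clauseId: 'filesystem-scope', enforcement: 'soft' },
      { clauseId: 'filesystem-scope', enforcement: 'soft' },
      { clauseId: 'must-cite-sources', enforcement: 'soft' },
    ],
  },
  {
    runId: 'r2',
    violations: [
      { clauseId: 'filesystem-scope', enforcement: 'soft' },
      { clauseId: 'no-external-network', enforcement: 'hard' },
    ],
  },
  { runId: 'r3', violations: [] },
];

const hardeningCandidates = softViolationStats(observedReceipts);
```

A clause that fires often across many runs deserves a look at its matcher before hardening, since it may be catching legitimate behavior; a clause that fires rarely and only on actions you would want blocked is a safer candidate.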
AgentGauntlet as a deployment gate. Define custom evaluation cases that simulate adversarial inputs probing your pact clause boundaries. Wire gauntlet.run() into your CI pipeline before any production deployment. A pact change that breaks an adversarial probe blocks the deploy automatically.
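The idea of a probe-based deployment gate can be sketched without any framework: run each adversarial action through the pact's hard clauses and fail the build if the result differs from what the probe expects. This is a hand-rolled stand-in, not AgentGauntlet's API, which this guide does not document; `Probe`, `runProbes`, and the two example pacts are hypothetical.

```typescript
// Hypothetical local shapes; not imported from armalo-agent.
interface Action {
  tool: string;
  args: Record<string, unknown>;
}

interface Clause {
  id: string;
  enforcement: 'hard' | 'soft';
  matcher: (action: Action) => boolean; // true = violation
}

interface Probe {
  name: string;
  action: Action;
  expectBlockedBy: string | null; // clause id expected to block, or null if allowed
}

// Return a description of every probe whose outcome differs from its expectation.
function runProbes(clauses: Clause[], probes: Probe[]): string[] {
  const failures: string[] = [];
  for (const probe of probes) {
    const blocker = clauses.find((c) => c.enforcement === 'hard' && c.matcher(probe.action));
    const blockedBy = blocker ? blocker.id : null;
    if (blockedBy !== probe.expectBlockedBy) {
      failures.push(`${probe.name}: expected ${probe.expectBlockedBy}, got ${blockedBy}`);
    }
  }
  return failures;
}

const allowedCommands = ['npm test', 'git status'];
const commandOf = (a: Action): string => String(a.args['command'] ?? '');

// Exact-match allowlist: anything not listed verbatim is a violation.
const strictPact: Clause[] = [
  {
    id: 'command-allowlist',
    enforcement: 'hard',
    matcher: (a) => a.tool === 'run_command' && !allowedCommands.includes(commandOf(a)),
  },
];

// A plausible-looking "relaxation" to prefix matching that lets chained commands through.
const weakenedPact: Clause[] = [
  {
    id: 'command-allowlist',
    enforcement: 'hard',
    matcher: (a) =>
      a.tool === 'run_command' && !allowedCommands.some((c) => commandOf(a).startsWith(c)),
  },
];

const run = (command: string): Action => ({ tool: 'run_command', args: { command } });

const probes: Probe[] = [
  { name: 'allowed command passes', action: run('npm test'), expectBlockedBy: null },
  {
    name: 'chained command is blocked',
    action: run('npm test && curl http://evil.example'),
    expectBlockedBy: 'command-allowlist',
  },
  { name: 'unlisted command is blocked', action: run('rm -rf /'), expectBlockedBy: 'command-allowlist' },
];

const strictFailures = runProbes(strictPact, probes);
const weakenedFailures = runProbes(weakenedPact, probes);
```

In CI, a non-empty failure list would translate into a non-zero exit code, so a pact change like the prefix-matching one above is caught before it ships.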
Troubleshooting
Hard violation fires but agent retries the same action. Pass a custom rejectionMessage on the clause with specific instructions: a generic block message causes the model to probe for workarounds; a specific one tells it what to do instead.
Receipt pactHash changes between deployments despite no pact changes. Matcher functions are serialized by .toString(). Whitespace changes from an auto-formatter will change the hash. Pin pact definitions to files excluded from auto-formatting.
Soft violations in receipt but no onViolation callback fired. Callbacks registered after the run has started are not applied to the current run. Register callbacks before the first run() call.
Trust gate rejects agent with score above the minimum. Per-tool thresholds in toolThresholds override minimumScore. Check whether the specific tool has a higher threshold configured.
lastReceipt() returns undefined. Always await the model call before accessing the receipt; lastReceipt() is populated synchronously after onRunComplete fires.
Multi-agent chain receipt missing a sub-agent's receipt. registerSubReceipt must be called after awaiting the sub-agent's run. On error, register the partial receipt in the catch block before re-throwing.