How to Write a Behavioral Pact
A behavioral pact is not a terms-of-service document or a capability description. It is a machine-readable specification of what an agent will and will not do: the operational contract that makes deployment accountable. Here is how to write one that actually works.
Start With What Goes Wrong
The best behavioral pacts are written backward: starting from the failure modes you are trying to prevent and working back to the constraints that would catch them.
Before writing a single line of pact specification, answer three questions:
- What is the worst thing this agent could do in this deployment context, if it behaved unexpectedly?
- What are the ambiguous situations where the agent might reasonably make different choices, and what choice do you want it to make?
- What conditions require a human in the loop, even if the agent is technically capable of acting autonomously?
The answers to these questions are your pact's load-bearing structure. Everything else is supporting material.
If you start with "what should the agent do?" you will write a capability description. That is not a behavioral pact. A behavioral pact is defined by its constraints, not its capabilities. The distinction is not semantic; it is structural. Constraints are enforceable and auditable. Capability descriptions are aspirational.
Component 1: Authorized Action Scope
The authorized scope specification tells the agent, and any external auditor, exactly what actions are authorized, in what contexts, and with what constraints.
The most common mistake in scope specification is insufficient specificity. "May handle customer inquiries" is not a scope specification. It is a category label. A scope specification must be specific enough that a third party can determine, for any given action, whether it falls within or outside the scope.
A weak scope statement:
The agent may assist customers with their accounts.
A strong scope statement:
The agent may: (1) retrieve and display account balance, transaction history, and contact information for the authenticated customer's account; (2) answer questions about product features using only information from the approved product knowledge base; (3) submit refund requests up to ${{REFUND_THRESHOLD}} on behalf of the authenticated customer for orders placed within the last 90 days. The agent may not: modify account credentials, process payments, access records of accounts other than the authenticated customer's, or make representations about future product features or pricing not in the approved knowledge base.
Notice several properties of the strong statement:
- Actions are specific, not categorical
- Constraints are explicit, not implied
- Data access boundaries are defined
- Parameterization ({{REFUND_THRESHOLD}}) allows context-specific values without a new pact for each deployment
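A scope that a third party can evaluate is a scope that can be written down as data. The sketch below shows one hypothetical way to encode two clauses of the strong statement in Python. It is not a prescribed pact format: `ScopeRule`, `SCOPE`, `is_in_scope`, the action names, and the parameter keys are all invented for illustration, and the value of `REFUND_THRESHOLD` is an assumed stand-in for {{REFUND_THRESHOLD}}.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Mapping

# Assumed deployment value standing in for {{REFUND_THRESHOLD}}.
REFUND_THRESHOLD = 100.0


@dataclass(frozen=True)
class ScopeRule:
    """One authorized action plus the constraint its parameters must satisfy."""
    action: str
    constraint: Callable[[Mapping[str, Any]], bool]


# Hypothetical encoding of two clauses from the strong scope statement.
SCOPE: List[ScopeRule] = [
    ScopeRule(
        "read_account",
        lambda p: p["target_account"] == p["authenticated_account"],
    ),
    ScopeRule(
        "submit_refund",
        lambda p: p["amount"] <= REFUND_THRESHOLD and p["order_age_days"] <= 90,
    ),
]


def is_in_scope(action: str, params: Mapping[str, Any]) -> bool:
    """An action is in scope only if a rule names it and its constraint holds.

    Anything not listed is out of scope by default.
    """
    return any(r.action == action and r.constraint(params) for r in SCOPE)
```

The default-deny shape is the point of the sketch: an action with no matching rule is out of scope without anyone having to list it.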
Component 2: Hard Prohibitions
Hard prohibitions are the bright lines: actions the agent will never take regardless of instruction, context, or apparent authorization. The key property of a hard prohibition is that it must be unconditional. "Will not access payment data unless instructed by a supervisor" is not a hard prohibition. It is a conditional. Hard prohibitions have no conditions.
For a customer service agent, hard prohibitions typically include:
- Accessing records outside the authenticated session
- Executing financial transactions above the authorized threshold without explicit human confirmation
- Sharing customer data with any external party not specified in the data access policy
- Modifying data records without explicit customer authorization
For a code generation agent, hard prohibitions typically include:
- Generating code that accesses filesystem paths outside the project scope
- Executing shell commands that were not explicitly authorized in the pact
- Accessing external APIs not in the approved list
- Generating code that stores or transmits authentication credentials
The test for a well-formed hard prohibition: if someone explicitly told the agent to do this thing, would the prohibition still hold? If yes, it is a hard prohibition. If the agent might comply under the right framing, it is a policy preference, not a prohibition.
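In code, the unconditional property shows up as a check whose answer cannot depend on who is asking. The following is a minimal sketch under that assumption. The action names and the `is_prohibited` function are hypothetical, and the `requested_by` argument exists only to show that it is ignored.

```python
# Hypothetical action names drawn from the "may not" clauses of the scope example.
HARD_PROHIBITIONS = frozenset({
    "modify_account_credentials",
    "access_other_customer_records",
})


def is_prohibited(action: str, requested_by: str = "user") -> bool:
    """Return True if the action is a hard prohibition.

    requested_by is accepted but deliberately never consulted: no requester,
    however privileged, changes the answer.
    """
    return action in HARD_PROHIBITIONS
```

If a check like this ever needs to read `requested_by` to reach its answer, the rule it implements is a conditional and belongs with the escalation triggers instead.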
Component 3: Escalation Triggers
Escalation triggers define when the agent must pause and wait for human authorization before proceeding. They are the mechanism for managing the boundary between autonomous and supervised operation.
Escalation triggers have two components: the triggering condition and the escalation action.
Triggering conditions should be specific and objective where possible:
- Confidence below a defined threshold (e.g., confidence score < 0.7)
- Request falls outside the categorized scope (no authorized action category matches)
- Transaction value exceeds the autonomous authorization limit
- Customer sentiment indicates distress or complaint intent (for customer service contexts)
- Request involves data outside the authorized access scope
Escalation actions define what happens when a trigger fires:
- Notify the specified supervisor channel
- Suspend task processing until authorization is received
- Record the pending escalation in the audit trail
- Provide the customer with an expected response time
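The two halves of a trigger, condition and action, can be sketched as a pair of small functions. This is an illustration under stated assumptions, not a reference implementation: the `Request` fields, the threshold values, and the names `fired_triggers` and `handle` are all hypothetical, and only three of the conditions above are modeled. Supervisor notification and the customer-facing message are left as a comment because they depend on the deployment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed calibration values; real ones come from the deployment context.
CONFIDENCE_FLOOR = 0.7
AUTONOMOUS_LIMIT = 100.0


@dataclass
class Request:
    confidence: float                # agent's self-reported confidence, 0.0 to 1.0
    action_category: Optional[str]   # None when no authorized category matches
    transaction_value: float


def fired_triggers(req: Request) -> List[str]:
    """Return the name of every triggering condition the request meets."""
    fired = []
    if req.confidence < CONFIDENCE_FLOOR:
        fired.append("low_confidence")
    if req.action_category is None:
        fired.append("out_of_scope")
    if req.transaction_value > AUTONOMOUS_LIMIT:
        fired.append("over_limit")
    return fired


def handle(req: Request, audit_log: List[Dict[str, object]]) -> str:
    """Proceed when nothing fires; otherwise record the escalation and suspend."""
    fired = fired_triggers(req)
    if not fired:
        return "proceed"
    audit_log.append({"event": "escalation_pending", "triggers": fired})
    # A real deployment would also notify the supervisor channel and give the
    # customer an expected response time at this point.
    return "suspended"
```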
The calibration of escalation triggers is an engineering problem, not a policy problem. Triggers set too broadly produce unsustainable escalation volume. Triggers set too narrowly miss the cases that actually need human review. The right calibration depends on your deployment context: the cost of over-escalation versus the cost of under-escalation in your specific domain.
For a first deployment, err toward broader triggers. The operational data from the first 30 days will tell you where to narrow.
Component 4: Evidence Obligations
Evidence obligations specify what the agent must record for every consequential action, in a format that supports auditability. The evidence obligation is what transforms a behavioral pact from a statement of intent into a verifiable commitment.
A minimal evidence record for each agent action should include:
- The input that prompted the action (user message, system event, or tool output)
- The action taken
- The pact clause that authorized the action
- The timestamp
- A hash of the record, signed by the agent's key
The signature requirement is important. An unsigned log can be modified after the fact. A signed, append-only log with cryptographic integrity is much harder to falsify and creates genuine accountability.
For high-stakes deployments, evidence obligations should include the agent's confidence level for each action and the alternatives considered. This makes it possible to audit not just what the agent did, but whether it was appropriate for it to have acted autonomously rather than escalating.
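A minimal evidence record can be sketched with the Python standard library alone. Two assumptions are worth stating loudly. First, the example uses an HMAC with a shared secret as a stand-in for a signature. A real deployment would more likely use an asymmetric signature so that auditors can verify records without holding the agent's key. Second, the `prev_hash` field that chains each record to the one before it is one common way to make a log tamper-evident and is an addition of this sketch, not something the record list above requires. The function names and the key are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List

# Assumption: a demo shared secret. Never hard-code a real key.
AGENT_KEY = b"demo-key-not-for-production"
GENESIS_HASH = "0" * 64


def _digest(body: Dict[str, str]) -> str:
    """SHA-256 over a canonical JSON encoding of the record body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def _sign(record_hash: str) -> str:
    """HMAC-SHA256 of the record hash, standing in for a real signature."""
    return hmac.new(AGENT_KEY, record_hash.encode("utf-8"), hashlib.sha256).hexdigest()


def append_record(log: List[Dict[str, str]], input_text: str, action: str,
                  clause: str, timestamp: str) -> Dict[str, str]:
    """Append one evidence record, chained to the previous record's hash."""
    body = {
        "input": input_text,
        "action": action,
        "pact_clause": clause,
        "timestamp": timestamp,
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    record = dict(body)
    record["hash"] = _digest(body)
    record["signature"] = _sign(record["hash"])
    log.append(record)
    return record


def verify_log(log: List[Dict[str, str]]) -> bool:
    """Recompute every hash and signature and check the chain is unbroken."""
    prev_hash = GENESIS_HASH
    for record in log:
        body = {k: v for k, v in record.items() if k not in ("hash", "signature")}
        if body["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(body):
            return False
        if not hmac.compare_digest(record["signature"], _sign(record["hash"])):
            return False
        prev_hash = record["hash"]
    return True
```

Editing any field of an earlier record changes its hash, which breaks both that record's signature check and the `prev_hash` link of the record after it.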
Component 5: Consequence Framework
A pact without consequences is a suggestion. The consequence framework defines what happens when a pact violation is detected.
Consequences should be tiered by severity:
- Monitoring only: For borderline cases, meaning actions that are technically within scope but pattern-match concern indicators. Record for review, do not interrupt operation.
- Escalation: For actions that trigger an escalation condition. Suspend autonomous operation on this task, notify the designated reviewer, and resume only with authorization.
- Suspension: For confirmed pact violations. Suspend the agent from autonomous operation, trigger a root cause review, and require explicit reauthorization before resuming.
- Bond forfeiture: For systematic pact violations that demonstrate the agent's behavioral commitments were not genuine. Where agents have posted economic bonds against their behavioral claims, systematic violation triggers bond forfeiture as an economic consequence.
The economic consequence tier is the mechanism that makes trust credible rather than aspirational. An agent whose vendor is willing to stake economic value on its behavioral reliability is making a different kind of commitment than one that offers only contractual terms.
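The tiers can be sketched as an ordered enumeration plus a function that picks the most severe tier that applies. Everything here is illustrative: the names are hypothetical, and the cutoff for "systematic" (three confirmed violations) is an assumption chosen only to make the example concrete. A real pact would have to define it.

```python
from enum import IntEnum
from typing import Optional

# Assumption: how many confirmed violations count as "systematic".
SYSTEMATIC_VIOLATION_COUNT = 3


class Consequence(IntEnum):
    """Tiers ordered by severity, lowest first."""
    MONITOR = 1
    ESCALATE = 2
    SUSPEND = 3
    FORFEIT_BOND = 4


def consequence_for(flagged_in_scope: bool, trigger_fired: bool,
                    confirmed_violations: int) -> Optional[Consequence]:
    """Return the most severe applicable tier, or None when nothing applies."""
    if confirmed_violations >= SYSTEMATIC_VIOLATION_COUNT:
        return Consequence.FORFEIT_BOND
    if confirmed_violations >= 1:
        return Consequence.SUSPEND
    if trigger_fired:
        return Consequence.ESCALATE
    if flagged_in_scope:
        return Consequence.MONITOR
    return None
```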
Parameterization: Making Pacts Reusable
One of the most important design decisions in pact authorship is where to use concrete values versus parameters. A pact that specifies a concrete refund threshold of $500 is not reusable across deployment contexts with different operational requirements. A pact that parameterizes the threshold as {{REFUND_THRESHOLD}} can be instantiated with different values for different deployments.
The principle: pact templates capture the structure of behavioral commitments; parameters capture the deployment-specific values. The template is the intellectual property β the careful thinking about what constraints matter and how they should interact. The parameters are the operational configuration.
This also applies to escalation conditions. A pact template might specify "escalate when confidence < {{UNCERTAINTY_ESCALATION_THRESHOLD}}", with the threshold set to 0.7 for high-stakes deployments and 0.4 for lower-stakes contexts. The structure is the same; the calibration reflects the operational context.
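Instantiating a template is a small substitution step, and it is a useful place to refuse a pact that still has unbound parameters. The sketch below assumes placeholders are written as `{{NAME}}` with uppercase letters and underscores, as in the examples above. The `instantiate` function and `TEMPLATE` string are hypothetical.

```python
import re
from typing import Mapping

# Matches placeholders such as {{REFUND_THRESHOLD}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")

# Hypothetical clause adapted from the strong scope statement.
TEMPLATE = (
    "submit refund requests up to ${{REFUND_THRESHOLD}} "
    "for orders placed within the last 90 days"
)


def instantiate(template: str, params: Mapping[str, object]) -> str:
    """Fill every placeholder, refusing to proceed if any is left unbound."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(params))
    if missing:
        raise KeyError(f"unbound pact parameters: {missing}")
    return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), template)
```

Calling `instantiate(TEMPLATE, {"REFUND_THRESHOLD": 500})` yields the clause with `$500` in place of the placeholder, and the same template with a different mapping yields a different deployment's clause.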
Versioning and Amendment
Behavioral pacts should be versioned. The first version reflects your best pre-deployment understanding of what constraints are needed. The second version reflects what you learned from the first 90 days. The evolution of a pact over time is itself informative: it tells you which constraints were too narrow, which were too broad, and what failure modes emerged that you had not anticipated.
When a pact is amended, the amendment should be explicit about what changed and why. A pact that silently removes a hard prohibition is not a minor update; it is a significant change to the agent's behavioral contract. Version control with explicit change logs creates accountability for pact evolution.
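One way to make silent removal impossible is to check each amendment mechanically. The sketch below compares the hard prohibitions of two pact versions and rejects the new version if a removed prohibition is not named in the change log. It assumes prohibitions are stored as plain string identifiers and the change log is free text. Both function names are hypothetical.

```python
from typing import Dict, FrozenSet, List


def diff_prohibitions(old: FrozenSet[str], new: FrozenSet[str]) -> Dict[str, List[str]]:
    """List the prohibitions removed from and added to the new version."""
    return {"removed": sorted(old - new), "added": sorted(new - old)}


def check_amendment(old: FrozenSet[str], new: FrozenSet[str], changelog: str) -> None:
    """Raise if any prohibition was removed without being named in the change log."""
    removed = diff_prohibitions(old, new)["removed"]
    unexplained = [p for p in removed if p not in changelog]
    if unexplained:
        raise ValueError(f"prohibitions removed without explanation: {unexplained}")
```

A substring match against the change log is a deliberately crude test. It only guarantees that the removal is mentioned, not that the stated reason is a good one, and that judgment stays with a human reviewer.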
The Meta-Principle
A well-written behavioral pact should answer the following question unambiguously, for any given agent action: "Was the agent authorized to take this action?"
If you cannot answer that question by reading the pact, the pact is not complete. The completeness test is not whether the pact covers every possible action (it cannot). The test is whether the pact establishes a clear framework for determining what is and is not authorized, such that any action can be evaluated against that framework.
Writing a behavioral pact is harder than writing a capability description. It requires thinking carefully about failure modes, making explicit commitments that can be audited, and accepting the accountability that comes with specificity. That difficulty is the point. The discipline of writing the pact is where the deployment gets safer.