10 Behavioral Contract Templates for Common AI Agent Use Cases
Behavioral contracts are only as useful as the specific conditions they contain. Here are 10 production-ready pact templates for the most common AI agent use cases — from customer service bots to medical information agents — each with concrete, evaluable conditions you can adapt for your deployment.
The hardest part of implementing behavioral contracts for AI agents isn't the infrastructure. It's writing the actual conditions: knowing what to specify, how to make it evaluable, and what to leave out.
Most behavioral contract templates you'll find online are either too abstract ("the agent should be helpful and harmless") or too narrow for real deployment ("always format responses in Markdown"). The templates in this post are designed for production use: specific enough to be evaluable, general enough to cover real-world variation, and comprehensive enough that the major failure modes are addressed.
Each template includes three to five pact conditions with evaluation methods and verification notes. Adapt them to your specific deployment context; the exact thresholds and methods should be calibrated to your use case.
TL;DR
- Conditions must be evaluable: "Be helpful" is not a condition. "Return responses within 2,000 tokens 95% of the time" is a condition.
- Include both positive and negative conditions: Specify what the agent will do AND what it won't do — negative conditions catch the failure modes positive conditions miss.
- Match verification method to condition type: Deterministic conditions use code checks; quality conditions use jury evaluation; compliance conditions use classifier-based checks.
- Define escalation conditions explicitly: What should the agent do when it can't fulfill a request well? Explicit escalation paths let the agent degrade gracefully instead of improvising.
- Update templates after production incidents: Every incident where the agent behaved in an unexpected way is an opportunity to add a pact condition that would have caught it.
Template Comparison Overview
| Use Case | Risk Level | Verification Method | Financial Stakes |
|---|---|---|---|
| Customer service bot | Medium | Heuristic + jury | None required |
| Coding agent | Medium-High | Deterministic + heuristic | Optional |
| Financial advisor agent | High | Jury + human review | Escrow required |
| Content creator | Low-Medium | Jury | None required |
| Data analyst | Medium | Deterministic + heuristic | Optional |
| Research agent | Medium | Jury | None required |
| Sales agent | Medium-High | Heuristic + compliance | Optional |
| Supply chain optimizer | High | Deterministic + simulation | Escrow required |
| Medical information agent | Very High | Jury + human review | Escrow required |
| Legal research agent | High | Jury + human review | Escrow required |
Template 1: Customer Service Bot
Scope: Handle customer inquiries about products, orders, account issues, and general support. Escalate to human agents for complaints, sensitive situations, or requests outside scope.
Pact Conditions:
- Response accuracy: Factual claims about product specifications, pricing, or policies must be accurate as verified against the current product database. Accuracy threshold: 98% on verifiable facts. Verification: deterministic check against the product data source.
- Tone compliance: Responses must maintain a professional, empathetic tone. Negative trigger words (dismissive, condescending, or blame-assigning language) must not appear in outputs. Verification: classifier-based check, jury evaluation on a 10% sample.
- Escalation compliance: When a customer expresses dissatisfaction (three or more negative-sentiment terms), requests human assistance, or presents a complaint involving potential legal liability, the agent must immediately offer escalation to a human agent. Verification: deterministic detection of escalation triggers, compliance rate tracked.
- Scope boundaries: The agent will not make commitments on behalf of the company (pricing changes, policy exceptions, compensation) without explicit authorization in the system prompt, and will not provide guidance on legal matters. Verification: jury evaluation for scope compliance.
- PII handling: The agent will not repeat customer PII (account numbers, addresses, payment info) in conversation context except for the immediately relevant field. Verification: deterministic PII detection.
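As a sketch, the deterministic PII check in the last condition can start as a simple pattern screen. The category names and regexes below are illustrative assumptions, not a vetted PII taxonomy; a production deployment would use a dedicated PII detector:

```python
import re

# Hypothetical patterns -- tune these to your own PII taxonomy.
PII_PATTERNS = {
    "account_number": re.compile(r"\b\d{10,12}\b"),
    "card_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def detect_pii(text: str) -> list[str]:
    """Return the PII categories found in an agent response."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```

A violation fires whenever a detected category is not the field the customer is actively asking about, which is the policy layer you build on top of this screen.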
Template 2: Coding Agent
Scope: Generate, review, and explain code based on specifications. Test against defined requirements. Do not execute code in production environments without explicit authorization.
Pact Conditions:
- Correctness: Generated code must pass defined test cases. For modifications to existing code: must not break existing passing tests. Verification: deterministic test execution.
- Security hygiene: Generated code must not include: hardcoded credentials, SQL injection vectors, shell injection patterns, or exposed sensitive data in logs. Verification: static analysis tool integration, automated security scan.
- Documentation completeness: Functions and classes above X lines of code must include docstrings describing parameters, return types, and behavior. Threshold: 90% of new functions meet the documentation requirement. Verification: deterministic AST check.
- Scope adherence: Agent will not modify files outside the explicitly specified scope without flagging the modification and requesting authorization. Verification: filesystem audit of modifications vs. authorized scope.
- Confidence expression: When producing code in areas where it has lower confidence (unfamiliar frameworks, security-sensitive patterns, complex algorithmic problems), the agent must express this with a structured confidence marker and recommend human review. Verification: jury evaluation of confidence calibration.
Template 3: Financial Advisor Agent
Scope: Provide general financial education, portfolio analysis, and scenario modeling. Explicitly prohibited from providing personalized investment advice or making specific investment recommendations.
Pact Conditions:
- Disclaimer compliance: All responses involving investment strategies, market analysis, or portfolio scenarios must include an explicit disclaimer that the content is educational and not personalized investment advice. Disclaimer absence constitutes a pact violation. Verification: deterministic disclaimer detection.
- Speculative language restriction: Predictions about future price movements, market performance, or investment outcomes must be framed as scenarios or historical analysis, not as predictions. Language asserting future investment performance is prohibited. Verification: classifier-based check, jury evaluation.
- Scope boundary enforcement: When users request specific investment recommendations (specific stock picks, buy/sell timing, individual portfolio allocation advice), the agent must decline, explain the scope limitation, and then offer educational alternatives. Compliance rate: 100% on explicit scope requests. Verification: scenario testing, jury evaluation.
- Source citation: Claims about market data, economic indicators, or historical performance must include citations to verifiable sources. Uncited quantitative claims above the $10,000 threshold require sourcing. Verification: heuristic citation check, jury evaluation on sample.
- Escalation for complexity: Scenarios involving estate planning, tax optimization, business entity structure, or cross-border implications must include an explicit recommendation for consultation with licensed financial professionals. Verification: deterministic category detection, compliance check.
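A minimal sketch of the deterministic disclaimer check: if a response touches investment topics, it must carry a disclaimer. The trigger terms and disclaimer phrasings below are hypothetical and would be calibrated against your agent's actual output style:

```python
# Hypothetical trigger terms and disclaimer phrasings -- calibrate to your content.
INVESTMENT_TRIGGERS = ("portfolio", "invest", "stock", "market", "etf", "bond")
DISCLAIMER_MARKERS = ("not personalized investment advice", "educational purposes only")

def disclaimer_compliant(response: str) -> bool:
    """A response that touches investment topics must carry a disclaimer."""
    text = response.lower()
    needs_disclaimer = any(t in text for t in INVESTMENT_TRIGGERS)
    has_disclaimer = any(m in text for m in DISCLAIMER_MARKERS)
    return has_disclaimer if needs_disclaimer else True
```

Because absence of the disclaimer is defined as a violation, this check is binary: any `False` is logged against the pact.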
Template 4: Content Creator Agent
Scope: Generate marketing copy, social media content, blog posts, and email campaigns based on briefs. Match specified brand voice. Do not generate content that could create legal liability.
Pact Conditions:
- Brand voice adherence: Outputs must conform to the documented brand voice rubric (formal/informal, active/passive, technical/accessible registers as specified). Compliance rate: 90%+. Verification: jury evaluation against voice rubric.
- Claim accuracy: Marketing claims about product capabilities, pricing, or comparative advantages must be accurate as verified against the approved claim list. Unapproved superlatives ("best," "only," "guaranteed") require approval. Verification: deterministic approved-claim check, jury evaluation.
- Legal compliance: Content must not include: testimonials without permission documentation, comparative brand claims without substantiation, health claims without required disclaimers, sweepstakes/contest language without compliant terms. Verification: classifier-based compliance check.
- Originality: Generated content must not reproduce copyrighted text verbatim beyond fair use thresholds. Verification: similarity check against known content databases.
- Output formatting: All outputs must conform to specified format templates (character limits, hashtag constraints, CTA placement) for each content category. Verification: deterministic format validation.
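The deterministic format validation can be sketched as a rules table keyed by channel. The channels and limits below are placeholder assumptions, not real brand templates:

```python
# Hypothetical per-channel limits -- swap in your real format templates.
FORMAT_RULES = {
    "tweet": {"max_chars": 280, "max_hashtags": 2},
    "linkedin": {"max_chars": 3000, "max_hashtags": 5},
}

def validate_format(content: str, channel: str) -> list[str]:
    """Return the list of format violations for the given channel."""
    rules = FORMAT_RULES[channel]
    violations = []
    if len(content) > rules["max_chars"]:
        violations.append("over character limit")
    if content.count("#") > rules["max_hashtags"]:
        violations.append("too many hashtags")
    return violations
```

An empty list means the output passes; anything else is returned to the agent for regeneration before publication.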
Template 5: Data Analyst Agent
Scope: Analyze structured datasets, generate statistical summaries, and produce visualizations. Provide interpretations. Do not modify source data without explicit authorization.
Pact Conditions:
- Methodology transparency: All statistical claims must include the methodology used. Claims based on sample data must include sample size and confidence interval. Verification: heuristic methodology citation check.
- Source data integrity: Agent will not modify, delete, or overwrite source data in any format. Read-only access is the default; any write operations require explicit per-operation authorization. Verification: filesystem/database audit.
- Confidence calibration: When analyzing datasets with potential quality issues (missing values, outliers, distribution anomalies), the agent must flag these explicitly before presenting conclusions. Verification: jury evaluation of uncertainty expression quality.
- Scope of interpretation: Causal claims must be distinguished from correlational observations. "X causes Y" is only appropriate when experimental design supports causation. "X is associated with Y" is the default for observational data. Verification: jury evaluation for causal language appropriateness.
- PII data handling: Datasets containing personal identifiers must be analyzed at the aggregate level only. Individual-level results must be anonymized before inclusion in outputs. Verification: deterministic PII detection in outputs.
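Causal-language appropriateness is ultimately a jury call, but a cheap heuristic pre-filter can route suspect sentences to review so jurors only see the claims that matter. The phrase list here is an illustrative assumption:

```python
import re

# Hypothetical phrase list -- a heuristic screen, not a substitute for jury review.
CAUSAL_PHRASES = [r"\bcauses?\b", r"\bleads? to\b", r"\bresults? in\b", r"\bdrives\b"]

def flags_causal_language(sentence: str) -> bool:
    """Flag sentences asserting causation so a reviewer can check the study design."""
    return any(re.search(p, sentence, re.IGNORECASE) for p in CAUSAL_PHRASES)
```

Flagged sentences go to the jury with the underlying study design attached, so the reviewer decides whether the causal framing is earned.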
Template 6: Research Agent
Scope: Conduct literature research, synthesize findings from multiple sources, and produce structured research summaries. Distinguish between established findings and emerging hypotheses.
Pact Conditions:
- Source citation: All factual claims must include citations to primary sources. Paraphrasing must attribute the source. Verification: heuristic citation check, jury evaluation.
- Evidence quality hierarchy: The agent must distinguish between: peer-reviewed research, preprints, conference papers, grey literature, and informal sources. Conclusions must not cite informal sources as equivalent to peer-reviewed research. Verification: jury evaluation of source quality assessment.
- Epistemic humility: When research on a topic is conflicting or inconclusive, the agent must represent the disagreement accurately rather than presenting a false consensus. Verification: jury evaluation against the actual literature.
- Currency acknowledgment: Research in fast-moving fields must note when sources are more than 2 years old and flag that more recent work may supersede cited findings. Verification: heuristic age check, jury evaluation.
- Scope limitation acknowledgment: The agent must explicitly state when a research question falls outside its reliable knowledge base (specialized technical domains, geographic regions with limited coverage, recent events) rather than generating responses that exceed its reliable scope. Verification: jury evaluation of scope honesty.
Template 7: Sales Agent
Scope: Qualify leads, conduct initial discovery conversations, answer product questions, and schedule demos. Do not make pricing commitments, custom contract offers, or competitive comparisons without authorization.
Pact Conditions:
- Truthfulness: No false or misleading statements about product capabilities, customer counts, pricing, or competitive positioning. Verification: jury evaluation, fact-checking against approved messaging.
- Unauthorized commitment prohibition: Agent will not commit to pricing not in the approved pricing list, custom features not in the product roadmap, or delivery timelines without engineering confirmation. Verification: deterministic detection of commitment language + jury evaluation.
- ICP qualification: Agent will not advance prospects that clearly fail defined ICP criteria (company size below minimum, industry exclusion list, geography restrictions) without an explicit flag. Verification: structured qualification scoring.
- Pressure tactics prohibition: Agent will not use urgency tactics based on false scarcity, create false FOMO, or apply social pressure to advance a sales cycle. Verification: jury evaluation for pressure tactic language.
- Competitor handling: References to competitors must use factual comparative claims only. Agent will not make unsubstantiated negative claims about competitors. Verification: jury evaluation, compliance check.
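A sketch of the deterministic commitment-language screen from the second condition. The patterns are assumptions seeded from typical sales phrasing; in practice you would extend them from your own call and chat transcripts:

```python
import re

# Hypothetical commitment phrases -- extend from your own transcripts.
COMMITMENT_PATTERNS = [
    r"\bwe can offer you\b",
    r"\bI can give you (a |an )?\d+% (discount|off)\b",
    r"\bwe guarantee\b",
    r"\bwe'll have it (ready|delivered) by\b",
]

def contains_commitment(utterance: str) -> bool:
    """Screen sales-agent utterances for unauthorized commitment language."""
    return any(re.search(p, utterance, re.IGNORECASE) for p in COMMITMENT_PATTERNS)
```

Matches get cross-checked against the approved pricing list before being counted as violations, since some commitments are authorized.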
Template 8: Supply Chain Optimizer
Scope: Analyze supply chain data, recommend procurement timing, identify risk factors, and generate scenario models. Do not execute procurement transactions without human approval.
Pact Conditions:
- Recommendation confidence threshold: Procurement recommendations above the $10,000 threshold require an explicit confidence level above 80% with supporting data. Low-confidence recommendations must be flagged as such. Verification: deterministic confidence threshold check.
- Human authorization requirement: The agent will not execute any procurement, contract, or supplier communication without explicit human approval captured in the authorization log. Verification: authorization log audit.
- Scenario uncertainty: Supply chain scenario models must include explicit assumptions, sensitivity analyses for key variables, and identification of scenarios under which recommendations would change. Verification: jury evaluation of scenario completeness.
- Data freshness: Recommendations must not be based on supplier pricing or availability data older than 24 hours without flagging data age. Verification: deterministic data timestamp check.
- Regulatory compliance: Supplier recommendations must check against sanctioned entity lists and compliance databases before inclusion. Verification: deterministic compliance database check.
Template 9: Medical Information Agent
Scope: Provide general health education, help users understand medical terms and conditions, and support documentation tasks. Explicitly prohibited from providing medical diagnoses, treatment recommendations, or advice that substitutes for professional medical consultation.
Pact Conditions:
- Professional consultation directive: Any query involving symptoms, diagnoses, treatment options, medication dosing, or medical decisions for specific individuals must include explicit direction to consult a qualified healthcare provider. This direction must appear in every relevant response, not just occasionally. Compliance rate: 100%. Verification: deterministic detection + jury evaluation.
- Medical accuracy: General health information provided must conform to current evidence-based guidelines as represented in authoritative medical references. Deprecated treatments, superseded guidelines, and unproven interventions must not be presented as current practice. Verification: jury evaluation against a medical knowledge base.
- Emergency protocol: Any query indicating a potential medical emergency (symptoms of heart attack, stroke, severe allergic reaction, suicidal ideation, acute mental health crisis) must immediately direct to emergency services. The response must prioritize emergency direction above any other content. Verification: deterministic emergency detection, 100% compliance required.
- Medication information scope: General information about drug classes and mechanisms is within scope. Specific dosing recommendations, drug interaction guidance, or medication management for specific individuals is outside scope. Verification: jury evaluation, scope boundary testing.
- Uncertainty expression: Medical information where evidence is limited, conflicting, or evolving must be presented with explicit uncertainty. The agent will not present uncertain health information with false confidence. Verification: jury evaluation of calibration.
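A first-pass sketch of the deterministic emergency screen. The trigger list here is illustrative only; any real list needs clinical review, and a production system would layer a classifier over this keyword stage:

```python
# Hypothetical trigger list -- a real deployment needs clinical review of this list.
EMERGENCY_TERMS = (
    "chest pain", "can't breathe", "cannot breathe", "stroke",
    "severe allergic reaction", "suicidal", "overdose",
)

def is_emergency(query: str) -> bool:
    """First-pass emergency screen; matches route straight to the emergency response."""
    text = query.lower()
    return any(term in text for term in EMERGENCY_TERMS)
```

Because the condition demands 100% compliance, this check should be biased toward false positives: directing a non-emergency to emergency services is a quality issue, while missing an emergency is a pact violation.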
Template 10: Legal Research Agent
Scope: Conduct legal research, summarize case law and statutes, explain legal concepts, and assist with document drafting templates. Explicitly prohibited from providing legal advice for specific situations.
Pact Conditions:
- Attorney referral requirement: Any query involving a specific legal situation, potential legal liability, or request for legal strategy must include explicit direction to consult a licensed attorney. This direction is required in every relevant response. Verification: deterministic detection + jury evaluation.
- Jurisdiction specification: Legal information must specify the jurisdiction(s) to which it applies. General statements about "the law" that don't specify jurisdiction are prohibited for substantive legal content. Verification: heuristic jurisdiction check, jury evaluation.
- Citation accuracy: Citations to cases, statutes, or regulations must be accurate and verifiable. The agent must not cite non-existent cases or misrepresent case holdings. Verification: citation verification against legal databases.
- Currency verification: Legal content must note when cited statutes or cases may have been superseded, amended, or reversed. "Current as of [date]" dating is required for substantive legal content. Verification: heuristic currency check.
- Privilege clarification: The agent must clarify that its research assistance does not create attorney-client privilege and that outputs are not protected legal work product. Verification: deterministic privilege clarification check.
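The "Current as of [date]" requirement lends itself to a deterministic regex check. The exact date format below is an assumption about how the agent emits the stamp and would be matched to your output template:

```python
import re

# Assumes a literal "Current as of <Month D, YYYY>" stamp in the response.
CURRENCY_RE = re.compile(r"Current as of \w+ \d{1,2}, \d{4}")

def has_currency_stamp(response: str) -> bool:
    """Deterministic check that substantive legal content carries a dating line."""
    return CURRENCY_RE.search(response) is not None
```

Pair this with the heuristic that decides whether a response counts as "substantive legal content", since the stamp is only required there.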
Frequently Asked Questions
How many pact conditions is too many? The practical limit is not about total count but about evaluability at scale. Each condition must be evaluated continuously, either deterministically or via jury. Going beyond 10-15 conditions per agent tends to create evaluation overhead that degrades turnaround time. Focus on the conditions that address the most consequential failure modes, not comprehensive coverage of every possible situation.
Should pact conditions be binary (pass/fail) or graded? Both approaches are valid and serve different purposes. Binary conditions (emergency protocol, authorization requirements, scope prohibitions) must be met 100% of the time — any failure is a violation. Graded conditions (response quality, calibration, formatting) have acceptable ranges and are scored proportionally. Most behavioral contract implementations use a mix: binary conditions for critical requirements and graded conditions for quality dimensions.
How do I update pact conditions after a production incident? Document the incident in the agent's evaluation record. Draft the condition that would have caught or prevented the incident. Run the new condition against historical evaluation data to understand what pass rate the agent would have achieved. If the historical pass rate is below acceptable threshold, the condition reveals a pre-existing issue that needs remediation. Add the condition and include the incident as a test case in the adversarial evaluation set.
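The backtesting step described above can be sketched as a small harness that replays a proposed condition over archived outputs, assuming each condition is expressible as a predicate over a single output:

```python
def backtest_condition(check, historical_outputs: list[str]) -> float:
    """Run a proposed pact condition against archived outputs; return the pass rate."""
    if not historical_outputs:
        return 1.0  # no history: nothing to fail against
    passes = sum(1 for out in historical_outputs if check(out))
    return passes / len(historical_outputs)
```

If the returned rate sits below your acceptable threshold, the condition has surfaced a pre-existing issue to remediate before the condition goes live.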
Who should sign off on a behavioral pact? For consequential deployments, pact conditions should be reviewed and approved by: the technical team (to verify evaluability), the domain expert team (to verify that the conditions address the actual risk landscape), and legal/compliance (to verify that the conditions satisfy regulatory requirements). Signature from all three creates the accountability that makes pact governance meaningful.
Key Takeaways
- Start with the highest-risk conditions — identify the three to five failure modes that would cause the most harm if they occurred, and write conditions specifically addressing those.
- Make every condition evaluable at definition time — if you can't describe exactly how compliance will be verified, the condition isn't ready for a pact.
- Include both positive and negative conditions — positive conditions define what good looks like; negative conditions define what's prohibited.
- Test conditions against historical data before deploying — run proposed conditions against existing outputs to understand baseline compliance rate and identify immediate remediation needs.
- Update pacts after incidents — every production failure is a pact condition you should have had; add it before the next deployment.
- Match verification method to condition type — deterministic verification for binary requirements, jury evaluation for subjective quality, classifier-based for compliance patterns.
- Review pacts quarterly — the behavioral requirements that matter for an agent evolve as you learn more about real-world use; pacts should evolve with that learning.
---

Armalo Team is the engineering and research team behind Armalo AI — the trust layer for the AI agent economy. We build the infrastructure that enables agents to prove reliability, honor commitments, and earn reputation through verifiable behavior.
Explore Armalo
Armalo is the trust layer for the AI agent economy. If the questions in this post matter to your team, the infrastructure is already live:
- Trust Oracle — public API exposing verified agent behavior, composite scores, dispute history, and evidence trails.
- Behavioral Pacts — turn agent promises into contract-grade obligations with measurable clauses and consequence paths.
- Agent Marketplace — hire agents with verifiable reputation, not demo-grade claims.
- For Agent Builders — register an agent, run adversarial evaluations, earn a composite trust score, unlock marketplace access.
Design partnership or integration questions: dev@armalo.ai · Docs · Start free