What Is a Behavioral Pact? The Commitment Primitive Replacing the System Prompt

2026-04-17 | 25 min | Armalo Team

A behavioral pact is a structured, verifiable commitment by an AI agent about what it will and won't do — machine-readable, cryptographically signed, and enforceable through automated evaluation. It is not a system prompt, not an SLA, and not a terms of service. It is the primitive that makes AI agent commerce possible.

Every AI agent deployment eventually runs into the same wall.

The demo went well. The pilot metrics looked good. The system prompt is carefully crafted. The engineering team believes in the model. And then someone in procurement, legal, or the C-suite asks the question that exposes the foundation:

"How do we know it will keep doing that?"

Not "did it do that once." Not "does it say it will do that." But: how do we verify, in real time, across thousands of executions, under adversarial conditions, with economic consequences attached — that this agent will behave the way you're claiming it will?

The honest answer, in most deployments today, is: you don't. You have a system prompt, maybe some eval results from development, and a lot of faith.

A behavioral pact is the answer to that question. It is the primitive that transforms faith into infrastructure.

This post is the complete reference. By the end, you'll understand exactly what a behavioral pact is, why it's structurally different from every existing mechanism for describing agent behavior, how it's built, how it's verified, and how it's becoming the standard interface for AI agent commerce.

Part 1: The Precise Definition

What a Behavioral Pact Is

A behavioral pact is a structured, verifiable commitment by an AI agent about what it will and won't do, under what conditions, verified by observable evaluation, with defined consequences for violation.

Every word in that definition is load-bearing.

Structured: A pact is not a paragraph of prose. It is a machine-readable document with a formal schema — typed fields, enumerated values, quantitative thresholds. Every clause can be parsed, stored, queried, and evaluated programmatically.

Verifiable: A pact makes claims that can be tested against observable evidence. "This agent will complete 95% of customer support tasks correctly" is verifiable. "This agent is helpful and harmless" is not. Pacts deal only in verifiable claims.

Commitment: A pact is signed by the agent's operator using a cryptographic key. It is not advisory. It is not a best-effort statement. It is a commitment with the operator's identity and economic stake attached.

What it will and won't do: A pact defines both positive commitments (what the agent will achieve) and negative constraints (what the agent will never do, regardless of instruction). Both sides matter equally.

Under what conditions: A pact specifies the task distribution, data categories, operating environment, and workload bounds under which its commitments hold. "95% accuracy on customer support tasks with up to 50 parallel sessions, on public and internal business data" is a conditioned commitment. The conditions are not fine print — they are core to the contract.

Verified by observable evaluation: The pact specifies how fulfillment is measured: which eval set, what methodology, what frequency, what jury configuration. The measurement mechanism is part of the commitment, not an afterthought.

With defined consequences for violation: The pact specifies what happens if commitments are not met — score penalties, escrow holds, slashing, suspension. Without consequence structure, a pact is just documentation.

What a Behavioral Pact Is Not

Understanding the definition requires understanding what pacts are explicitly designed to replace.

Not a System Prompt

A system prompt is an instruction to a language model. It shapes behavior from the inside. A behavioral pact is a promise to the world. It describes behavior from the outside.

The distinction is fundamental and irreducible. A system prompt is:

Private: Operators routinely keep system prompts confidential, since they often contain proprietary logic, personas, and business rules. You cannot verify a system prompt without seeing it, and seeing it potentially exposes trade secrets.
Unverifiable: Even if you could see the system prompt, you could not verify that the model follows it. Models don't mechanically execute instructions — they probabilistically follow them. The same system prompt can produce radically different behavior depending on context, model version, temperature, and input.
Changeable without notice: An operator can update the system prompt at any time. There is no versioning contract, no notification mechanism, and no way for buyers to know that the agent they relied on last week is operating under different instructions today.
Not portable: System prompt formats are vendor-specific and model-specific. A carefully tuned system prompt for GPT-4 doesn't transfer to Claude or Gemini. There is no cross-platform behavioral commitment standard.
Not enforceable: No consequence mechanism is attached to a system prompt. If the model violates the intended behavior, the system prompt offers no recourse.

A pact solves all five of these problems simultaneously. It defines observable behavioral properties — not private instructions. "The agent will achieve 95% task completion on this eval set" is verifiable without seeing the system prompt. It is versioned, signed, and carries consequence structure.

The key insight: You don't need to know how an agent achieves its behavior to verify whether it does. Pacts separate the "what" (commitments) from the "how" (implementation). This is what makes them compatible with IP protection while still enabling verification.

Not a Service Level Agreement

An SLA is an infrastructure commitment. It answers: "Will your service be available?" Response times, uptime percentages, data durability — these are infrastructure properties. They apply identically to a database, a CDN, and an AI agent runtime.

SLAs do not address the quality or scope of intelligent behavior. An AI agent could maintain 99.99% uptime while completing only 40% of tasks correctly, hallucinating dangerous information, and violating the data handling constraints its operators promised. The SLA would read green. The agent would be failing.

Behavioral pacts address the dimensions that SLAs cannot: task completion quality, constraint adherence, scope honesty, safety properties. They are not alternatives to SLAs — they are complementary. A production AI deployment needs both: SLA for the infrastructure, pact for the intelligence.

Not an API Specification

An API specification (OpenAPI, JSON Schema, Protocol Buffers) defines the interface: what inputs an endpoint accepts, what outputs it returns, what errors it can produce. An API spec describes the contract between callers and the system boundary.

What it does not describe is the behavioral envelope within that interface. For a given input, what will the agent actually do? What won't it do? What accuracy can you expect? What data will it access or refuse to access?

An API spec for a customer support agent might define: POST /agent/query accepts {message: string, context: object} and returns {response: string, confidence: number, escalate: boolean}. This tells you nothing about:

Whether the agent will ever make medical or financial recommendations it shouldn't make
Whether it will maintain consistent accuracy across 500 daily queries
Whether it will refuse to discuss topics outside its stated scope
What happens when it makes a mistake

Behavioral pacts live one layer above the API spec, in the semantic space of intelligent behavior. They are orthogonal to, not competitive with, API specifications.

Not a Terms of Service

Terms of service are legal agreements written in natural language, designed to be interpreted by humans and adjudicated by courts. They are between people. They resolve violations through negotiation, litigation, or relationship pressure — not through automated mechanisms.

Behavioral pacts are machine-readable. They are evaluated continuously by automated systems. Consequences execute programmatically — score adjustments, escrow holds, suspension — without requiring a lawyer or a lawsuit. This is not a small difference in implementation; it is a categorical difference in what enforcement actually means.

When an agent violates a ToS clause, resolution takes weeks or months and costs money. When an agent violates a behavioral pact constraint, the consequence executes in the same evaluation cycle that detected the violation.

Part 2: The Five Structural Elements

Every behavioral pact consists of five elements. They are not optional — all five must be present for a pact to be meaningful.

Element 1: Scope Declaration

The scope declaration defines what the agent commits to handle and, equally importantly, what it explicitly refuses.

Scope has four components:

Task types: The categories of work the agent commits to perform. These should be specific enough to enable consistent evaluation — not "helpfulness" but "customer_support, ticket_routing, escalation_detection."

Explicit exclusions: The task types the agent will not perform, regardless of instruction. This is not a technical capability statement — it's a behavioral commitment. An agent might be technically capable of providing medical diagnoses but explicitly exclude that task type from its scope because it cannot commit to doing it safely and accurately.

Data categories: The data the agent is authorized to access and process. This is where privacy and security commitments live at the behavioral level — not "complies with GDPR" but "accesses public and internal business data; will not process health_records or financial_pii."

Capacity bounds: The operating parameters under which the commitments hold. Performance commitments that hold for 10 parallel tasks may not hold for 500. The scope declaration makes these bounds explicit.
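The four scope components can be read as a gate that every incoming request passes through. The sketch below is one hedged way to express that in Python; the `Scope` class, the `check_request` function, the reason strings, and the snake_case field names are illustrative assumptions, not part of Armalo's published interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical in-memory form of a pact's scope declaration."""
    task_types: frozenset
    explicit_exclusions: frozenset
    data_categories: frozenset
    forbidden_data_categories: frozenset
    max_parallel_tasks: int


def check_request(scope, task_type, data_category, active_tasks):
    """Return (allowed, reason) for one incoming request."""
    # Exclusions are checked first: they hold regardless of instruction.
    if task_type in scope.explicit_exclusions:
        return False, "excluded_task_type"
    if task_type not in scope.task_types:
        return False, "undeclared_task_type"
    if data_category in scope.forbidden_data_categories:
        return False, "forbidden_data_category"
    if data_category not in scope.data_categories:
        return False, "undeclared_data_category"
    # Commitments only hold inside the declared capacity bound.
    if active_tasks >= scope.max_parallel_tasks:
        return False, "over_capacity"
    return True, "in_scope"


scope = Scope(
    task_types=frozenset({"customer_support", "ticket_routing", "escalation_detection"}),
    explicit_exclusions=frozenset({"financial_advice", "medical_diagnosis"}),
    data_categories=frozenset({"public", "internal_business"}),
    forbidden_data_categories=frozenset({"health_records", "financial_pii"}),
    max_parallel_tasks=50,
)

print(check_request(scope, "customer_support", "public", active_tasks=12))
print(check_request(scope, "medical_diagnosis", "public", active_tasks=12))
```

Checking exclusions before declared task types is a deliberate choice in this sketch: an excluded task should be reported as an exclusion hit, not merely as "undeclared".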

Element 2: Performance Commitments

Performance commitments are quantified accuracy and reliability claims over a defined task distribution, measured by a specified methodology.

Every performance commitment requires:

A threshold: The minimum acceptable level (e.g., 0.95 task completion rate)
A measurement period: How long a measurement window covers (e.g., rolling 30 days)
A task sample size: How many tasks the measurement is based on (e.g., 500 tasks)
An eval set or methodology: The specific benchmark or evaluation approach used to assess quality

Performance commitments without methodology specifications are not commitments — they're aspirations. Specifying the eval set is what makes the commitment verifiable and auditable.
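To make the four required parts concrete, here is a minimal sketch of checking a task completion commitment against one measurement window. The class and function names are hypothetical, and the rule that a window smaller than the declared sample size yields no verdict is an assumption of this sketch, not something the pact format is known to mandate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletionCommitment:
    threshold: float   # minimum acceptable completion rate, e.g. 0.95
    sample_size: int   # minimum number of tasks in the window, e.g. 500


def evaluate_commitment(commitment, outcomes):
    """outcomes: one boolean per task observed in the measurement window."""
    # Too few tasks: the measurement is not yet meaningful either way.
    if len(outcomes) < commitment.sample_size:
        return "insufficient_sample"
    rate = sum(outcomes) / len(outcomes)
    return "met" if rate >= commitment.threshold else "breached"


commitment = CompletionCommitment(threshold=0.95, sample_size=500)
good_window = [True] * 480 + [False] * 20   # 96% completed
bad_window = [True] * 470 + [False] * 30    # 94% completed

print(evaluate_commitment(commitment, good_window))  # met
print(evaluate_commitment(commitment, bad_window))   # breached
```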

Element 3: Constraint Set

The constraint set defines behavioral boundaries that must hold regardless of instruction. Where performance commitments say "the agent will achieve X," constraints say "the agent will never do Y."

Constraints are categorized by severity:

Critical: Violations trigger immediate suspension and full escrow slash. These are the absolute limits — no PII exfiltration, no impersonation, no acting as a vector for attacks on other systems.
High: Violations trigger escrow hold and jury review. Scope breaches — attempting tasks the agent explicitly excluded — typically fall here.
Medium: Violations trigger score deductions and buyer notifications. Repeated latency SLA breaches, escalation rate overruns, and documentation quality failures typically fall here.

The constraint set is where the agent's non-negotiables live. These are the commitments that cannot be traded off against performance improvements or buyer instructions.

Element 4: Evaluation Configuration

The evaluation configuration specifies how pact fulfillment is measured. This includes:

Evaluation frequency: How often the agent is evaluated against its pact (e.g., every 168 hours — weekly)
Evaluation provider: Who runs the evaluation (e.g., Armalo's automated jury system)
Jury configuration: For LLM-judged evaluations, the number of judges, the trimming methodology (discard top/bottom 20% to prevent outlier manipulation), and the model panel
Continuous monitoring: Whether individual task executions are monitored in real time or only during scheduled eval windows

The evaluation configuration is what makes pacts self-verifying. A pact without a specified evaluation methodology is a document. A pact with a specified evaluation methodology is an automated verification system.
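The trimming step in the jury configuration is a trimmed mean. The sketch below shows the arithmetic for a five-judge panel with a 20% trim; the function name and the rounding rule for how many scores to drop are assumptions, since the post does not spell them out.

```python
def trimmed_jury_score(scores, trim_fraction=0.20):
    """Average jury scores after dropping the top and bottom trim_fraction."""
    if not scores:
        raise ValueError("at least one jury score is required")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(scores)
    # Number of scores dropped from each end (rounded down).
    # The small epsilon guards against float error in the product.
    k = int(len(ordered) * trim_fraction + 1e-9)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# One judge returns an outlier score of 0.20.
scores = [0.91, 0.88, 0.93, 0.20, 0.99]
print(round(trimmed_jury_score(scores), 3))    # 0.907
print(round(sum(scores) / len(scores), 3))     # 0.782 (untrimmed mean)
```

With five judges, one score is dropped from each end, so a single extreme judge in either direction cannot move the result.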

Element 5: Consequence Structure

The consequence structure defines what happens when the agent falls below its commitments or violates its constraints. It operates at three severity levels:

Minor violations (below performance threshold but not catastrophically): Automated score deduction and buyer notification. The agent continues operating but its trust score reflects the shortfall.

Major violations (significant threshold breach or high-severity constraint violation): Escrow hold (typically 72 hours), mandatory jury review, and temporary suspension pending remediation.

Critical violations (critical constraint violation): Immediate suspension, full or partial escrow slash, human review requirement before reinstatement.

The consequence structure is what distinguishes pacts from aspirational documentation. Without automated consequences, there is no incentive for pact integrity.
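The three levels can be sketched as a small lookup plus a classification rule. Everything named here is illustrative: the `Level` enum, the `classify` function, and in particular the 0.05 cut-off for what counts as a "significant" threshold breach are assumptions, because the post does not give a number.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


@dataclass(frozen=True)
class Consequence:
    actions: tuple
    suspend: bool
    review: str  # "none", "jury", or "human"


CONSEQUENCES = {
    Level.MINOR: Consequence(("score_deduct", "notify_buyer"), False, "none"),
    Level.MAJOR: Consequence(("escrow_hold",), True, "jury"),
    Level.CRITICAL: Consequence(("immediate_suspend", "escrow_slash"), True, "human"),
}


def classify(constraint_severity=None, shortfall=0.0, major_shortfall=0.05):
    """Map a detected problem to a violation level, or None if nothing is wrong.

    shortfall is how far a measured metric fell below its committed threshold.
    major_shortfall is a hypothetical cut-off for a "significant" breach.
    """
    if constraint_severity == "critical":
        return Level.CRITICAL
    if constraint_severity == "high" or shortfall >= major_shortfall:
        return Level.MAJOR
    if constraint_severity == "medium" or shortfall > 0:
        return Level.MINOR
    return None


level = classify(shortfall=0.02)  # committed 0.95, measured 0.93
print(level, CONSEQUENCES[level].actions)
```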

Part 3: The Full Schema

This is Armalo's behavioral pact schema in its complete form. Read it not as a technical specification but as a conceptual map — every field exists because something breaks without it.

{
  "pactId": "9ef7193b-8105-4a9c-9b29-abf8b356fc5b",
  "agentId": "a2534f0a-d704-4bef-80b0-0f353a10d047",
  "version": "1.2.0",
  "scope": {
    "taskTypes": [
      "customer_support",
      "ticket_routing",
      "escalation_detection"
    ],
    "explicitExclusions": [
      "financial_advice",
      "medical_diagnosis",
      "legal_interpretation"
    ],
    "maxParallelTasks": 50,
    "dataCategories": [
      "public",
      "internal_business"
    ],
    "forbiddenDataCategories": [
      "pii_sensitive",
      "health_records",
      "financial_pii"
    ]
  },
  "performanceCommitments": {
    "taskCompletionRate": {
      "threshold": 0.95,
      "measurementPeriod": "30d",
      "taskSampleSize": 500
    },
    "accuracyScore": {
      "threshold": 0.90,
      "evalSet": "cs-eval-v2.1",
      "methodology": "jury_5model"
    },
    "latencyP99Ms": {
      "threshold": 3000
    },
    "escalationRate": {
      "max": 0.08
    }
  },
  "constraintSet": [
    {
      "constraint": "NO_PII_EXFILTRATION",
      "severity": "critical",
      "autoSlash": 1.0
    },
    {
      "constraint": "NO_SCOPE_BREACH",
      "severity": "high",
      "autoSlash": 0.5
    },
    {
      "constraint": "NO_IMPERSONATION",
      "severity": "critical",
      "autoSlash": 1.0
    },
    {
      "constraint": "NO_UNAUTHORIZED_DATA_ACCESS",
      "severity": "critical",
      "autoSlash": 1.0
    },
    {
      "constraint": "NO_HALLUCINATED_CITATIONS",
      "severity": "high",
      "autoSlash": 0.3
    }
  ],
  "evaluationConfig": {
    "continuousEvalEnabled": true,
    "evalFrequencyHours": 168,
    "evalProvider": "armalo_jury",
    "jurySize": 5,
    "juryTrimPercentage": 0.20
  },
  "consequenceStructure": {
    "minorViolation": {
      "action": "score_deduct",
      "amount": 15,
      "notifyBuyer": true
    },
    "majorViolation": {
      "action": "escrow_hold",
      "duration": "72h",
      "requireJuryReview": true
    },
    "criticalViolation": {
      "action": "immediate_suspend",
      "escrowSlashBps": 5000,
      "requireHumanReview": true
    }
  },
  "signedAt": "2026-01-15T10:00:00Z",
  "signatureAlgorithm": "Ed25519",
  "issuerDID": "did:web:armalo.ai"
}

Walk through what each section is doing:

The scope block answers: "What game are you playing, and what are you refusing to play?" The explicitExclusions field is as important as taskTypes — an agent that commits to what it will do AND what it won't do is making a fundamentally stronger claim than one that only describes its capabilities.

The performanceCommitments block answers: "How good, and measured how?" Note that accuracyScore includes an evalSet identifier (cs-eval-v2.1) and a methodology (jury_5model). This means a buyer with access to that eval set can independently re-run the evaluation and verify the claim. The methodology is part of the commitment.

The constraintSet block answers: "What will you never do, and what happens if you do it anyway?" The autoSlash field is a number between 0 and 1: the fraction of escrowed funds automatically slashed when that specific constraint is violated. autoSlash: 1.0 on NO_PII_EXFILTRATION means the agent's entire escrowed stake is at risk if it exfiltrates PII, while the high-severity constraints in this example carry smaller fractions (0.5 and 0.3).

The evaluationConfig block answers: "Who watches the watcher, and how often?" juryTrimPercentage: 0.20 means the highest and lowest 20% of jury scores are discarded before averaging — preventing any single outlier judge from manipulating the result.

The consequenceStructure block answers: "What happens when things go wrong?" Note the graduated response: minor violations produce score deductions (reversible through good performance), major violations trigger holds and review (resolvable), critical violations trigger immediate suspension and economic penalty (serious).

The signatureAlgorithm: "Ed25519" and issuerDID: "did:web:armalo.ai" fields answer: "Who made this commitment, and can we verify it?" The pact is cryptographically signed — not just stored in a database — meaning its authenticity and integrity can be verified independently of any platform.
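A signature only proves integrity if signer and verifier agree on the exact bytes being signed. The sketch below shows one way to derive stable bytes and a fingerprint from a pact document using only the standard library. The canonicalization rule (sorted keys, compact separators) and the `signature` field name are assumptions of this sketch; the sample schema above does not show where the signature value is stored or how Armalo canonicalizes the document. An Ed25519 library would sign and verify the bytes returned by `signing_payload`.

```python
import hashlib
import json


def signing_payload(pact):
    """Serialize a pact to deterministic bytes, excluding any signature value."""
    # "signature" is a hypothetical field name for the stored signature.
    body = {key: value for key, value in pact.items() if key != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def pact_fingerprint(pact):
    """SHA-256 hex digest of the canonical payload, useful for comparing copies."""
    return hashlib.sha256(signing_payload(pact)).hexdigest()


original = {"version": "1.2.0", "scope": {"maxParallelTasks": 50}}
reordered = {"scope": {"maxParallelTasks": 50}, "version": "1.2.0"}
tampered = {"version": "1.2.0", "scope": {"maxParallelTasks": 500}}

print(pact_fingerprint(original) == pact_fingerprint(reordered))  # True
print(pact_fingerprint(original) == pact_fingerprint(tampered))   # False
```

Key order does not affect the fingerprint, but any change to a committed value does, which is the property independent verification depends on.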

Part 4: Why System Prompts Fail as Commitments

System prompts are a powerful tool for shaping AI behavior. They are not a mechanism for making behavioral commitments. Understanding why requires examining five structural limitations.

Limitation 1: The Observability Problem

Verifying a behavioral commitment requires observable evidence. "This agent will not provide financial advice" is a commitment that can be tested: present the agent with financial questions and observe whether it complies.

But the commitment today, in practice, lives in the system prompt: something like "Do not provide financial advice under any circumstances." Nobody outside the operator can verify:

That this instruction exists in the current system prompt
That the model reliably follows it across the full distribution of relevant inputs
That the instruction hasn't been weakened or removed

A pact solves this by moving the commitment from inside the model (private instruction) to outside the model (verifiable claim). The claim "this agent's scope explicitly excludes financial_advice" is independently testable without access to the system prompt.

Limitation 2: The Drift Problem

Language models do not mechanically execute instructions. They learn patterns, develop tendencies, and respond to context. The same system prompt can produce meaningfully different behavior across:

Model version or variant changes (for example, GPT-4o → GPT-4o-mini, or Claude 3 Sonnet → Claude Sonnet 4)
Context window changes (short conversations vs. long threads)
Temperature and sampling parameter changes
Interaction between the system prompt and user input patterns that weren't anticipated

This is called behavioral drift — the agent behaves differently over time not because the system prompt changed, but because the model's probabilistic response to it changed. A system prompt cannot detect or remediate behavioral drift. A pact does both: continuous evaluation detects when behavior diverges from commitments, and the consequence structure creates incentives to remediate.
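Detecting drift amounts to comparing recent behavior against both the committed threshold and an earlier baseline. The following is a deliberately simple sketch under stated assumptions: the function name, the three status labels, and the 0.02 tolerance are all hypothetical, and a production system would likely use a proper statistical test instead of a fixed tolerance.

```python
def detect_drift(baseline_rate, recent_outcomes, threshold, tolerance=0.02):
    """Classify a recent evaluation window as "stable", "drifting", or "breach".

    baseline_rate: completion rate measured when the pact was signed
    recent_outcomes: one boolean per task in the latest window
    threshold: the committed minimum rate from the pact
    tolerance: hypothetical allowed drop from baseline before flagging drift
    """
    if not recent_outcomes:
        raise ValueError("recent_outcomes must not be empty")
    recent_rate = sum(recent_outcomes) / len(recent_outcomes)
    if recent_rate < threshold:
        return "breach"      # commitment already violated
    if baseline_rate - recent_rate > tolerance:
        return "drifting"    # still compliant, but trending toward the threshold
    return "stable"


window = [True] * 955 + [False] * 45  # 95.5% completed
print(detect_drift(baseline_rate=0.98, recent_outcomes=window, threshold=0.95))  # drifting
```

The "drifting" state is the useful one: it gives the operator a signal to remediate before the consequence structure is triggered.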

Limitation 3: The Secrecy Paradox

Here is the fundamental tension in using system prompts as behavioral commitments:

For a system prompt to function as a commitment, it needs to be verifiable — which means it needs to be public or independently auditable.

For a system prompt to protect operator IP, it needs to be private — because it often contains the logic, personas, and domain knowledge that represent the operator's competitive advantage.

These requirements are in direct conflict. You cannot have a system prompt that is simultaneously a trustworthy public commitment and a protected proprietary asset.

Behavioral pacts dissolve this tension by separating implementation from commitment. The system prompt stays private. The pact — which describes behavioral properties, not implementation — is public and verifiable. Buyers verify that the agent achieves 95% task completion on the agreed eval set. They don't need to know how the agent achieves that to verify that it does.

Limitation 4: The Versioning Problem

Production systems change. System prompts are updated regularly — to fix bugs, improve performance, address edge cases, comply with new policies. In most deployments today, these updates happen without notification to buyers or any mechanism for buyers to understand what changed.

For behavioral commitments to be durable, changes to the commitment must be versioned, notified, and agreed upon — not unilaterally imposed. A pact has semantic versioning (MAJOR.MINOR.PATCH), a defined notice period for breaking changes (30 days for MAJOR version changes affecting active escrows), and buyer subscription to version ranges (>=1.2.0 <2.0.0). A system prompt has none of this.
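A subscription range such as >=1.2.0 <2.0.0 can be checked mechanically. This is a minimal sketch that handles plain MAJOR.MINOR.PATCH versions and space-separated comparison clauses only; the function names are illustrative, and pre-release tags or other range syntax are intentionally out of scope.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
    ("=", operator.eq),
]


def parse_version(text):
    """Turn "1.2.0" into the comparable tuple (1, 2, 0)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def satisfies(version, range_expr):
    """Return True if version meets every clause in range_expr."""
    candidate = parse_version(version)
    for clause in range_expr.split():
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                if not compare(candidate, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("1.4.2", ">=1.2.0 <2.0.0"))  # True
print(satisfies("2.0.0", ">=1.2.0 <2.0.0"))  # False
```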

Limitation 5: The Enforcement Gap

The final limitation is the most obvious: system prompts have no consequence structure. Nothing happens when the model fails to follow them. There is no automated detection, no score impact, no financial consequence, and no suspension mechanism.

This is not a bug in system prompt design — it's a feature. System prompts are operational tools, not enforcement mechanisms. But it means that relying on system prompts as behavioral commitments leaves a critical gap: the commitment exists, but there is no cost to violating it.

Pacts close this gap completely. Continuous evaluation detects violations. The consequence structure specifies responses. Automated execution ensures consequences follow detection without human intervention.

Part 5: Historical Parallels

Behavioral pacts are not without precedent. They are the synthesis of several decades of work on machine-readable commitments, agent communication, and service behavior specification.

FIPA ACL (1997): The First Agent Commitment Language

The Foundation for Intelligent Physical Agents (FIPA) Agent Communication Language, published in 1997, defined formal speech acts for agent communication. FIPA agents could send typed messages such as INFORM, REQUEST, PROPOSE, ACCEPT-PROPOSAL, REFUSE, and AGREE, each with formal semantics. AGREE, for example, signals that the sender intends to perform a requested action.

FIPA ACL was ahead of its time and ultimately limited by the brittle rule-based AI systems it was designed for. But its core insight remains valid: agents need a formal language for making and tracking commitments to each other, not just for passing data.

Behavioral pacts are the 2026 semantic successor to FIPA ACL. Where FIPA ACL operated at the message level ("I commit to fulfilling this request"), pacts operate at the behavioral envelope level ("I commit to this class of behavior across this class of tasks, measured this way, with these consequences"). The formalism is different; the philosophical project is the same.

WS-Policy (2007): Machine-Readable Service Behavior

The W3C Web Services Policy Framework (WS-Policy), finalized in 2007, introduced machine-readable declarations of a service's capabilities and requirements, such as which security policies applied and which reliability or quality-of-service assertions held.

WS-Policy was primarily used for SOAP web services and ultimately declined with the shift to REST APIs. But it pioneered the concept of behavioral metadata that could be reasoned about programmatically — service discovery systems could evaluate whether a service met requirements without a human reading documentation.

Behavioral pacts inherit this insight: behavioral metadata that machines can reason about. A buyer agent can query the Armalo trust oracle and receive a machine-readable pact — evaluating whether the scope, performance thresholds, and constraint set meet requirements programmatically, without a human reading a PDF.
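As a rough illustration of that programmatic check, the sketch below screens an already-fetched pact document against a buyer's requirements. It uses field names from the schema in Part 3; the `screen_pact` function and the shape of the requirements are assumptions, and how the pact is retrieved from the trust oracle is deliberately not modeled.

```python
def screen_pact(pact, required_tasks, required_constraints, min_accuracy):
    """Return a list of unmet requirements; an empty list means the pact passes."""
    gaps = []

    declared_tasks = set(pact["scope"]["taskTypes"])
    for task in sorted(set(required_tasks) - declared_tasks):
        gaps.append(f"missing task type: {task}")

    declared_constraints = {entry["constraint"] for entry in pact["constraintSet"]}
    for name in sorted(set(required_constraints) - declared_constraints):
        gaps.append(f"missing constraint: {name}")

    accuracy = pact["performanceCommitments"]["accuracyScore"]["threshold"]
    if accuracy < min_accuracy:
        gaps.append(f"accuracy threshold {accuracy} below required {min_accuracy}")

    return gaps


# Trimmed-down pact using the same field names as the full schema.
pact = {
    "scope": {"taskTypes": ["customer_support", "ticket_routing"]},
    "performanceCommitments": {"accuracyScore": {"threshold": 0.90}},
    "constraintSet": [{"constraint": "NO_PII_EXFILTRATION", "severity": "critical"}],
}

print(screen_pact(pact, ["customer_support"], ["NO_PII_EXFILTRATION"], 0.90))  # []
print(screen_pact(pact, ["billing_disputes"], ["NO_IMPERSONATION"], 0.95))
```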

OAuth Scopes: The Permission Declaration Precedent

OAuth 2.0 scopes are the most widely deployed example of structured permission declarations in software systems. When you authorize a GitHub OAuth app with scopes such as read:org and repo:status, you are making a machine-readable statement about what that application is permitted to do.

The parallel to behavioral pacts is instructive:

OAuth scopes declare what a token is permitted to do (access rights)
Behavioral pacts declare what an agent commits to do (behavioral obligations)

They are complementary layers. OAuth governs access. Pacts govern behavior within that access. An agent might have OAuth permissions to read a user's calendar — a pact specifies that it will use that access only for scheduling tasks, not for analysis, marketing inference, or data export.

The adoption trajectory of OAuth scopes is also instructive. In 2006, "just use API keys" was the dominant practice. By 2016, OAuth was the industry standard because the ecosystem needed granular, machine-readable authorization declarations to scale. Behavioral pacts are on the same trajectory: "just use system prompts" is the 2024 position; pacts will be the 2028 standard.

Contract Law: The Conceptual Foundation

At its deepest level, a behavioral pact is a contract in the classical legal sense:

Offer: The agent declares its scope, performance commitments, constraints, and consequence structure — making an offer to the world
Acceptance: A buyer accepts the pact terms (implicitly, by integrating the agent under pact terms; explicitly, by signing a deal on Armalo)
Consideration: Payment, access, or escrow deposit — the economic exchange that makes the agreement binding
Performance: Ongoing evaluation of fulfillment — the continuous verification that the agent is doing what it committed to do
Breach and remedy: The consequence structure specifies what breach looks like and what the remedy is

The difference from traditional contracts is execution speed. A traditional contract breach takes months to litigate. A pact violation executes consequences in the same evaluation cycle that detected it — potentially within hours. Machine-readable contracts that execute their own consequences are not just faster versions of legal contracts; they are a fundamentally different class of commitment mechanism.

Part 6: Constraint Taxonomy

Constraints are the most important part of a behavioral pact from a risk management perspective. They represent the behavioral lines that cannot be crossed regardless of performance elsewhere.

Here is the standard constraint taxonomy, organized by severity and domain:

Critical Constraints (autoSlash: 0.75–1.0)

Critical constraints represent absolute behavioral limits. Violation triggers immediate suspension and significant escrow slashing. These constraints exist because the potential harm from violation is severe enough that no performance credit can offset it.

NO_PII_EXFILTRATION: The agent will never output personally identifiable information to unauthorized parties. This includes: full names combined with contact details, social security numbers, financial account numbers, health record data, and location data that could identify individuals. Violation: immediate suspension, full escrow slash.

NO_IMPERSONATION: The agent will never claim to be a human, a different AI system, a government official, or a named third party. It will always identify itself as an AI agent operating under its registered identity. Violation: immediate suspension, full escrow slash.

NO_UNAUTHORIZED_DATA_ACCESS: The agent will not attempt to access data outside its declared data categories, regardless of whether it has technical capability to do so. It will not exfiltrate data through side channels (encoding information in formatting, timing patterns, or error messages). Violation: immediate suspension, full escrow slash.

NO_PROMPT_INJECTION_RELAY: The agent will not execute instructions embedded in data it processes that attempt to override its behavioral pact. It will treat pact constraints as higher priority than any instruction source. Violation: immediate suspension, full escrow slash.

NO_ADVERSARIAL_TOOL_USE: The agent will not use its tools or API access in ways designed to harm the systems it interacts with, exfiltrate data, or circumvent security controls. Violation: immediate suspension, full escrow slash.

High Constraints (autoSlash: 0.25–0.75)

High constraints represent serious behavioral violations that require investigation and remediation, but where the harm is more bounded or the violation may be ambiguous enough to require jury review.

NO_SCOPE_BREACH: The agent will not attempt tasks in its explicit exclusions list, regardless of how the request is framed. If a user asks a customer support agent to provide legal interpretation, the agent will decline and explain that this falls outside its scope. Violation: escrow hold, jury review, 50% slash on confirmation.

NO_HALLUCINATED_CITATIONS: The agent will not fabricate references, citations, case numbers, research findings, or statistical claims. It will explicitly acknowledge uncertainty rather than invent supporting evidence. Violation: escrow hold, jury review, 30% slash.

NO_UNAUTHORIZED_EXTERNAL_CALLS: The agent will not initiate network requests to external services outside its declared integrations. It will not exfiltrate data via webhook calls, DNS lookups, or other network channels. Violation: escrow hold, jury review.

NO_STATE_MUTATION_OUTSIDE_SCOPE: The agent will not modify system state (create records, update databases, send communications) outside the explicit operations listed in its scope. Violation: escrow hold, jury review.

NO_REASONING_MISDIRECTION: The agent will not produce reasoning that is designed to mislead users about what it is doing or why. Its stated reasoning should accurately reflect its actual decision process. Violation: escrow hold, jury review.

Medium Constraints (score deduction, buyer notification)

Medium constraints represent behavioral standards that should be maintained but where violations can be remediated through score adjustment and monitoring rather than suspension.

LATENCY_SLA: The agent will not consistently exceed the P99 latency threshold specified in its performance commitments. Measured over rolling 7-day windows.

ESCALATION_RATE_LIMIT: The agent will not escalate tasks to humans at a rate above its declared maximum. Excessive escalation is a signal of scope misalignment or capability limitation.

DOCUMENTATION_ACCURACY: When the agent produces documentation, summaries, or explanations of its actions, these will accurately represent what was done.

COST_PREDICTABILITY: The agent will not consume compute resources significantly above the levels implied by its declared task types and performance commitments.

RESPONSE_QUALITY_FLOOR: Even on tasks near the edge of its scope, the agent will produce responses that meet a minimum quality bar — it will not produce incoherent, misleading, or actively unhelpful responses.
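The severity bands in this taxonomy can be enforced when a pact is authored, and the slash itself is simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, both band edges are treated as inclusive (the post lists 0.75 in both the critical and high ranges), and escrow is held in integer cents to avoid float rounding in money.

```python
# autoSlash bands from the taxonomy above; medium constraints carry no slash.
SLASH_BANDS = {
    "critical": (0.75, 1.0),
    "high": (0.25, 0.75),
}


def validate_auto_slash(severity, auto_slash):
    """Return True if auto_slash lies inside the band for this severity."""
    if severity not in SLASH_BANDS:
        raise ValueError(f"no autoSlash band defined for severity {severity!r}")
    low, high = SLASH_BANDS[severity]
    return low <= auto_slash <= high


def slashed_amount(escrow_cents, auto_slash):
    """Amount (in cents) removed from escrow when the constraint is violated."""
    return round(escrow_cents * auto_slash)


# NO_SCOPE_BREACH: severity "high", autoSlash 0.5, against a $200.00 escrow.
print(validate_auto_slash("high", 0.5))   # True
print(slashed_amount(20_000, 0.5))        # 10000
```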

Part 7: Pact Versioning — Semantic Versioning for Behavioral Commitments

Behavioral commitments must change over time. Agents improve. Task distributions shift. Evaluation methodologies are updated. Scope expands or contracts. Managing these changes in a way that preserves buyer trust while enabling evolution is one of the hardest problems in pact design.

Armalo uses semantic versioning (MAJOR.MINOR.PATCH) for behavioral pacts, with specific semantics for each version component:

MAJOR Version (Breaking Changes)

A MAJOR version bump indicates a breaking change to the behavioral commitment. Breaking changes include:

Adding new exclusions to explicitExclusions (reducing scope)
Lowering performance thresholds (reducing committed quality)
Adding or strengthening constraints (reducing permitted behaviors)
Changing the evaluation methodology in ways that are not backward-compatible
Changing the consequence structure in ways that increase buyer exposure

MAJOR version policy: 30-day notice required before MAJOR version changes take effect for agents with active escrow agreements. Buyers subscribed to >=1.2.0 <2.0.0 will not be automatically migrated to version 2.0.0 — they must explicitly accept the new terms.

MINOR Version (Additive Improvements)

A MINOR version bump indicates improvements to the behavioral commitment that are additive and non-breaking:

Adding new task types to scope (expanding what the agent commits to)
Raising performance thresholds (committing to higher quality)
Relaxing constraints that were overly conservative
Adding new evaluation dimensions that complement existing ones
Expanding declared data categories (expanding what the agent can process)

MINOR version policy: Buyers subscribed to >=1.2.0 <2.0.0 automatically receive MINOR updates. No consent required, because MINOR changes are strictly improvements.

PATCH Version (Non-Behavioral Changes)

A PATCH version bump indicates changes that do not affect the behavioral commitment itself:

Updating evaluation methodology documentation
Fixing typos or clarifying language in scope descriptions
Updating eval set identifiers when the new eval is equivalent to the old one
Adding diagnostic metadata that doesn't change evaluated behavior

PATCH version policy: Applied immediately, no notification required.

Version Subscription Model

Buyers can subscribe to pacts at different version granularities:

1.2.0 — pinned to exact version (maximum stability, no automatic updates)
>=1.2.0 <2.0.0 — accepts MINOR and PATCH updates, not MAJOR (recommended for most production use)
>=1.0.0 — accepts all updates including MAJOR (maximum capability, requires trust in operator)

This versioning model means buyers get the benefit of agent improvements automatically while retaining control over when breaking changes affect their integration.
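The three subscription forms above can be checked with a few lines of code. The sketch below is a deliberately small matcher that only understands an exact version, ">=" and "<"; the function names are hypothetical, and a production system would more likely rely on a full semantic versioning library.

```python
def parse_version(text):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints for comparison."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)

def satisfies(version, spec):
    """Return True if `version` matches `spec`.

    `spec` is either an exact version ('1.2.0') or space-separated
    comparators ('>=1.2.0 <2.0.0'). Only '>=' and '<' are supported.
    """
    v = parse_version(version)
    for clause in spec.split():
        if clause.startswith(">="):
            if v < parse_version(clause[2:]):
                return False
        elif clause.startswith("<"):
            if v >= parse_version(clause[1:]):
                return False
        elif v != parse_version(clause):
            return False
    return True

# A buyer on the recommended range receives 1.3.0 but not 2.0.0.
for candidate in ("1.2.0", "1.3.0", "2.0.0"):
    print(candidate, satisfies(candidate, ">=1.2.0 <2.0.0"))
```

Comparing versions as integer tuples avoids the classic string-comparison bug where "1.10.0" sorts before "1.9.0".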

Version Migration for Active Escrows

When an agent operator wants to push a MAJOR version change while escrow agreements are active, the protocol is:

Operator announces new MAJOR version with 30-day notice, including changelog
Armalo notifies all buyers currently operating under active escrow at the affected version
Buyers have 30 days to: (a) accept new terms and migrate, (b) negotiate modified terms, or (c) exit cleanly with escrow settlement
After 30 days, the old version enters "maintenance mode" — evaluated but not accepting new escrow commitments
After 90 days, the old version is deprecated — evaluation stops, escrow is settled

This protocol ensures that MAJOR version changes can happen cleanly without stranding buyers or forcing abrupt operational changes.
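To make the timeline concrete, the sketch below derives the key dates from an announcement date. It assumes both the 30-day and 90-day marks are counted from the announcement; the text does not say whether the 90 days run from the announcement or from the start of maintenance mode, so treat that as an assumption. The function and key names are hypothetical.

```python
from datetime import date, timedelta

def migration_timeline(announced_on):
    """Key dates for a MAJOR version migration, counted from the announcement."""
    return {
        "announced": announced_on,
        "buyer_decision_deadline": announced_on + timedelta(days=30),
        "maintenance_mode_starts": announced_on + timedelta(days=30),
        "deprecated_on": announced_on + timedelta(days=90),
    }

def old_version_phase(announced_on, today):
    """Which phase the old version is in on a given day."""
    timeline = migration_timeline(announced_on)
    if today < timeline["maintenance_mode_starts"]:
        return "notice_period"
    if today < timeline["deprecated_on"]:
        return "maintenance"
    return "deprecated"

timeline = migration_timeline(date(2026, 5, 1))
print(timeline["maintenance_mode_starts"])  # 2026-05-31
print(timeline["deprecated_on"])            # 2026-07-30
```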

Part 8: Multi-Pact Portfolio Management

Production AI agents rarely do one thing. A sophisticated agent might handle customer support, perform data analysis, generate reports, and route escalations — each representing a different commitment profile with different accuracy requirements, latency bounds, and constraint sets.

The solution is the pact portfolio: a collection of independent pacts, each covering a specific task domain, with independent evaluation and scoring.

Portfolio Structure

An agent's pact portfolio might look like:

{
  "agentId": "finance-ops-agent-001",
  "activePacts": [
    {
      "pactId": "customer_support_v1.2",
      "taskTypes": ["customer_support", "ticket_routing"],
      "weight": 0.40
    },
    {
      "pactId": "data_analysis_v2.0",
      "taskTypes": ["report_generation", "trend_analysis", "anomaly_detection"],
      "weight": 0.45
    },
    {
      "pactId": "code_review_v1.0",
      "taskTypes": ["pull_request_review", "security_audit"],
      "weight": 0.15
    }
  ]
}

Composite Score Calculation

The agent's composite trust score is a weighted average across its active pacts, where weights reflect the relative economic significance of each pact domain:

Composite Score = Σ (pact_score × pact_weight)

Each pact is scored independently on 12 dimensions (accuracy, reliability, scope-honesty, safety, security, latency, escalation rate, etc.). The pact score is a weighted average of these dimensions. The composite score is the weighted average of pact scores.

Because each pact is scored independently, excellent performance in one domain cannot mask poor performance in another. An agent that aces customer support but fails data analysis shows that gap in its per-pact score breakdown, even though the composite score blends the two.
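The weighted average can be worked through with the portfolio weights from the example above. The per-pact scores below are invented for illustration, and the function name is hypothetical.

```python
def composite_score(pacts):
    """Weighted average of pact scores. Weights are expected to sum to 1."""
    total_weight = sum(p["weight"] for p in pacts)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(p["score"] * p["weight"] for p in pacts)

# Weights come from the portfolio example; scores are made-up values.
portfolio = [
    {"pactId": "customer_support_v1.2", "weight": 0.40, "score": 900},
    {"pactId": "data_analysis_v2.0", "weight": 0.45, "score": 600},
    {"pactId": "code_review_v1.0", "weight": 0.15, "score": 800},
]

# 0.40 * 900 + 0.45 * 600 + 0.15 * 800 = 360 + 270 + 120 = 750
print(round(composite_score(portfolio), 2))
for pact in portfolio:
    print(pact["pactId"], pact["score"])  # the breakdown keeps weak domains visible
```

The composite of 750 looks respectable on its own, which is why the per-pact breakdown matters: the data analysis score of 600 is only visible there.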

Conflict Resolution

When a task could plausibly fall under two pacts, the conflict resolution rule is: for each constraint, apply whichever pact's version is stricter.

If the customer support pact allows access to internal_business data but the data analysis pact restricts access to public data only, a task that involves analyzing business data for customer support purposes must comply with the data analysis pact's stricter data restriction.

This conservative approach ensures that pacts don't create loopholes through ambiguous task categorization.
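For data categories, "stricter" has a natural set interpretation: a category is allowed only if both pacts allow it, and forbidden if either pact forbids it. The sketch below applies that reading to the example above. It covers data categories only, and the function name and merge rule are assumptions for illustration.

```python
def merge_data_policy(pact_a, pact_b):
    """Combine two pacts' data rules by taking the stricter of each.

    Allowed categories: intersection (both pacts must permit it).
    Forbidden categories: union (either pact can forbid it).
    """
    allowed = set(pact_a["dataCategories"]) & set(pact_b["dataCategories"])
    forbidden = (set(pact_a.get("forbiddenDataCategories", []))
                 | set(pact_b.get("forbiddenDataCategories", [])))
    return {
        "dataCategories": sorted(allowed - forbidden),
        "forbiddenDataCategories": sorted(forbidden),
    }

support_pact = {
    "dataCategories": ["public", "internal_business"],
    "forbiddenDataCategories": ["pii_sensitive"],
}
analysis_pact = {
    "dataCategories": ["public"],
    "forbiddenDataCategories": ["health_records"],
}

print(merge_data_policy(support_pact, analysis_pact))
```

With these inputs the ambiguous task may touch only public data, and both forbidden lists apply.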

Portfolio Evolution

Agents add pacts to their portfolio as they expand capabilities. Pacts can be added without breaking existing ones. The portfolio grows in one direction — new pacts add new commitments, existing pacts continue under their own versioning.

When an agent's portfolio grows, buyers interact with individual pacts, not the aggregate portfolio. An enterprise customer buying data analysis work signs an escrow agreement against the data analysis pact specifically — not against the agent's full portfolio.

Part 9: The Pact Negotiation Protocol

A behavioral pact is not a take-it-or-leave-it proposition. It's the output of a negotiation between agent operators and buyers, mediated by the Armalo platform. The negotiation protocol has six steps.

Step 1: Template Publication

The agent operator publishes a pact template — a starting point that reflects the agent's realistic capabilities, historical performance data, and acceptable operating conditions. The template includes:

Scope declaration (what the agent currently handles)
Baseline performance commitments (reflecting current measured performance)
Standard constraint set
Preferred evaluation configuration
Default consequence structure

The template is public. Any buyer can inspect it, evaluate whether it meets their requirements, and initiate negotiation.

Step 2: Buyer Proposal

Buyers who want different terms can propose modifications. Common buyer proposals include:

Higher performance thresholds ("I need 98% task completion, not 95%")
Additional constraints ("I need an explicit NO_COMPETITOR_MENTION constraint")
Different evaluation frequency ("I need weekly eval, not monthly")
Additional data category restrictions ("You can only access data tagged with my org's namespace")
Different consequence terms ("I want full escrow slash for any scope breach, not just critical ones")

Step 3: Agent Counter-Proposal

The agent operator evaluates the buyer's proposal and responds. Typical agent responses:

Accept higher performance thresholds if buyer provides higher escrow deposit (the economic logic: higher commitment requires higher skin-in-the-game)
Accept additional constraints if they don't conflict with existing pact architecture
Counter-propose compromise thresholds backed by historical performance data
Accept stricter consequence terms for higher-value escrow relationships

This is genuine negotiation. Pact terms are not fixed menus — they are the outcome of a bilateral process where both parties have interests and alternatives.

Step 4: Bilateral Signing

Once both parties reach agreement, both sign the final pact:

Agent operator signs with their organization's Ed25519 key (the same key registered in their DID document at did:web:armalo.ai)
Buyer signs with their API key or organization key

The signatures are stored on Armalo's ledger and, for high-value agreements, anchored on-chain via content hash. This creates a tamper-evident record of exactly what was agreed and when.
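A content hash is only tamper-evident if both parties hash exactly the same bytes. The sketch below shows one way to get a stable hash of a pact document using sorted keys and SHA-256. The serialization rules here are an assumption for illustration; Armalo's actual canonicalization and the signing step itself (Ed25519) are outside this example.

```python
import hashlib
import json

def pact_content_hash(pact):
    """SHA-256 over a deterministic JSON serialization of the pact.

    Sorting keys and fixing separators means two parties holding the
    same pact content compute the same hash regardless of key order.
    """
    canonical = json.dumps(pact, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

agreed = {"agentId": "your-agent-id", "version": "1.0.0",
          "scope": {"taskTypes": ["customer_support"]}}
reordered = {"version": "1.0.0", "scope": {"taskTypes": ["customer_support"]},
             "agentId": "your-agent-id"}
tampered = dict(agreed, version="1.0.1")

print(pact_content_hash(agreed) == pact_content_hash(reordered))  # True
print(pact_content_hash(agreed) == pact_content_hash(tampered))   # False
```

Anyone holding the pact document can recompute the hash and compare it with the anchored value, without needing access to either party's keys.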

Step 5: Registration and Escrow

The signed pact is registered on Armalo:

The pacts table records the full pact document, with all JSONB fields indexed for fast query
If the agreement includes escrow, the buyer deposits the agreed amount in USDC on Base L2
The evaluation schedule is configured based on the pact's evaluationConfig
The buyer's integration begins receiving pact fulfillment signals via webhook or API

Step 6: Evaluation Commencement

On the agreed start date, automated evaluation begins. The Armalo jury system evaluates the agent against the agreed eval set at the configured frequency. Results are published to:

The agent's trust score (updated in real time)
The buyer's dashboard (fulfillment rate, constraint adherence, performance trends)
The trust oracle (queryable by third parties)

Both parties have full visibility into evaluation results. The pact is now active infrastructure — not a document, not a promise, but a live operating agreement with automated enforcement.

Part 10: Pact vs. Alternative Governance Mechanisms

Enterprise buyers making AI governance decisions have several mechanisms to choose from. Understanding where each mechanism applies — and where it fails — clarifies the role behavioral pacts play.

Mechanism	Verifiable	Machine-Readable	Enforceable	Portable	Real-Time
System Prompt	No	Partial	No	No	No
SLA (uptime/latency)	Partial	Partial	Via lawsuit	No	Via monitoring
ISO 42001 Certification	Yes	No	Via audit	Partial	No
OpenAPI Specification	Yes	Yes	No	Yes	No
EU AI Act Compliance	Yes	No	Via regulator	Partial	No
AI Bill of Materials (AIBOM)	Yes	Partial	No	Partial	No
Behavioral Pact (Armalo)	Yes	Yes	Yes (on-chain)	Yes (DID)	Yes

System Prompt: Not verifiable (private), not enforceable (no consequence mechanism), not portable (vendor-specific format). Useful for shaping behavior; not useful as a commitment mechanism.

SLA: Verifiable for infrastructure properties (uptime, latency). Machine-readable in monitoring systems. But enforcement is via lawsuit — slow, expensive, and not suitable for the cadence of AI deployment decisions. Not applicable to intelligent behavior quality.

ISO 42001: The emerging AI management system certification. Verifiable through audits. Not machine-readable — it produces a PDF certificate, not a queryable API. Not real-time — certification is point-in-time, with renewal cycles measured in years. Valuable for enterprise procurement; not sufficient for day-to-day operational trust.

OpenAPI Specification: Fully machine-readable and portable. Verifiable that a service matches its spec. But describes interface contract (inputs/outputs), not behavioral quality (what happens within those inputs/outputs). An OpenAPI spec cannot capture accuracy commitments, constraint adherence, or scope honesty.

EU AI Act Compliance: Legally enforceable through regulatory action. Not machine-readable. Not real-time. Verification requires documentation review and audit cycles. Essential for regulatory compliance; not sufficient for operational trust.

Behavioral Pact (Armalo): Machine-readable (full JSONB schema, queryable via API). Verifiable in real time (continuous eval). Enforceable automatically (consequence structure executes on violation). Portable (DID-anchored, verifiable by any party). This is the mechanism designed specifically for the problem of AI behavioral commitment at scale.

The right approach for enterprise AI governance is not to choose one of these mechanisms but to layer them: pacts for operational behavioral trust, SLAs for infrastructure reliability, ISO 42001 for procurement-level due diligence, OpenAPI specs for interface contracts, and regulatory compliance for legal coverage.

Part 11: Registering Your First Pact on Armalo

Concrete implementation is the final test of any concept. Here is the step-by-step process for registering a behavioral pact.

Prerequisites

Armalo API key (Pro or Enterprise plan — Free tier supports up to 3 pacts)
An agent registered on Armalo (via POST /api/v1/agents)
A clear sense of what your agent does and what it won't do
Historical eval results if available (improves pact credibility)

Step 1: Define Your Scope

Before writing any JSON, write plain language answers to:

What are the 3-5 task types my agent handles?
What are 2-3 things my agent explicitly refuses?
What data will my agent access? What data will it never access?
At what scale (parallel tasks, daily volume) do my performance commitments hold?

This exercise catches scope ambiguity early. If you can't answer these questions, you're not ready to commit to them.

Step 2: Set Realistic Performance Thresholds

Performance commitments should be based on measured performance, not aspirations. If you have eval results, start from your p10 performance level: the score your agent meets or exceeds in roughly 90% of evaluation runs. The goal is to commit to your reliable minimum, not your best result.

Then set the commitment threshold 5-10% below that measured level. The buffer provides headroom for measurement variance, distribution shift, and model updates without triggering violations.
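Here is a small worked example of that threshold arithmetic. It assumes the buffer is relative (5% of the p10 value, not 5 percentage points) and uses a nearest-rank percentile; both choices, and the function name, are assumptions for this sketch.

```python
def commitment_threshold(eval_scores, buffer=0.05):
    """Suggest a threshold: 10th-percentile score minus a relative buffer."""
    if not eval_scores:
        raise ValueError("need at least one eval score")
    ordered = sorted(eval_scores)
    rank = max(1, -(-len(ordered) // 10))  # integer ceil(n / 10), 1-based
    p10 = ordered[rank - 1]
    return round(p10 * (1 - buffer), 4)

# Twenty hypothetical task-completion rates from past eval runs.
history = [0.90, 0.92] + [0.95] * 18

# p10 is the 2nd-lowest of 20 values (0.92); 0.92 * 0.95 = 0.874
print(commitment_threshold(history, buffer=0.05))
```

Committing to about 0.87 when typical runs score 0.95 may feel conservative, but it is the level this history says the agent can hold through a bad week.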

Step 3: Register via API

curl -X POST https://armalo.ai/api/v1/pacts \
  -H "X-Pact-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "your-agent-id",
    "version": "1.0.0",
    "scope": {
      "taskTypes": ["customer_support", "ticket_routing"],
      "explicitExclusions": ["financial_advice", "medical_diagnosis"],
      "maxParallelTasks": 25,
      "dataCategories": ["public", "internal_business"],
      "forbiddenDataCategories": ["pii_sensitive", "health_records"]
    },
    "performanceCommitments": {
      "taskCompletionRate": {
        "threshold": 0.90,
        "measurementPeriod": "30d",
        "taskSampleSize": 200
      },
      "accuracyScore": {
        "threshold": 0.85,
        "methodology": "jury_5model"
      },
      "latencyP99Ms": {
        "threshold": 5000
      }
    },
    "constraintSet": [
      {"constraint": "NO_PII_EXFILTRATION", "severity": "critical", "autoSlash": 1.0},
      {"constraint": "NO_SCOPE_BREACH", "severity": "high", "autoSlash": 0.5},
      {"constraint": "NO_IMPERSONATION", "severity": "critical", "autoSlash": 1.0}
    ],
    "evaluationConfig": {
      "continuousEvalEnabled": true,
      "evalFrequencyHours": 168,
      "evalProvider": "armalo_jury",
      "jurySize": 5,
      "juryTrimPercentage": 0.20
    },
    "consequenceStructure": {
      "minorViolation": {"action": "score_deduct", "amount": 10, "notifyBuyer": true},
      "majorViolation": {"action": "escrow_hold", "duration": "48h", "requireJuryReview": true},
      "criticalViolation": {"action": "immediate_suspend", "escrowSlashBps": 5000, "requireHumanReview": true}
    }
  }'

The API returns:

{
  "pactId": "<generated-uuid>",
  "agentId": "your-agent-id",
  "version": "1.0.0",
  "status": "active",
  "createdAt": "2026-04-21T10:00:00Z",
  "trustScore": null,
  "fulfillmentRate": null,
  "nextEvalAt": "2026-04-28T10:00:00Z"
}

Note that trustScore and fulfillmentRate are null immediately after registration; they populate after the first evaluation cycle.

Step 4: Verify Registration

curl https://armalo.ai/api/v1/pacts/{pactId} \
  -H "X-Pact-Key: your-api-key"

Verify that:

All scope fields were stored correctly (especially explicitExclusions — a missing exclusion is a scope expansion)
Performance thresholds match what you intended
Constraint set is complete
Evaluation schedule is correct

Step 5: Feed Pact Interactions

For continuous evaluation to work, you need to feed task execution data back to Armalo. Every time your agent processes a task, record a pact interaction:

curl -X POST https://armalo.ai/api/v1/pacts/{pactId}/interactions \
  -H "X-Pact-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "taskId": "<your-task-id>",
    "taskType": "customer_support",
    "inputHash": "<sha256-of-input>",
    "outputHash": "<sha256-of-output>",
    "completionStatus": "completed",
    "latencyMs": 1247,
    "dataCategories": ["internal_business"],
    "escalated": false
  }'

Pact interactions power the fulfillment rate calculation. The more interactions you record, the more reliable your fulfillment rate becomes.
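The inputHash and outputHash fields let you report an interaction without sending the raw content. A minimal sketch of building that request body follows. The helper names are hypothetical, the field names are taken from the curl example above, and hashing the UTF-8 text directly is an assumption about what the API expects.

```python
import hashlib
import json

def sha256_hex(text):
    """Hex-encoded SHA-256 of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_interaction(task_id, task_type, input_text, output_text,
                      latency_ms, data_categories, escalated=False,
                      completion_status="completed"):
    """Assemble the interaction body; raw input and output never leave your system."""
    return {
        "taskId": task_id,
        "taskType": task_type,
        "inputHash": sha256_hex(input_text),
        "outputHash": sha256_hex(output_text),
        "completionStatus": completion_status,
        "latencyMs": latency_ms,
        "dataCategories": list(data_categories),
        "escalated": escalated,
    }

payload = build_interaction(
    task_id="task-123",
    task_type="customer_support",
    input_text="Where is my order?",
    output_text="Your order shipped yesterday.",
    latency_ms=1247,
    data_categories=["internal_business"],
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed as the request body of the interactions call shown above.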

Step 6: Monitor Fulfillment

curl https://armalo.ai/api/v1/pacts/{pactId}/fulfillment \
  -H "X-Pact-Key: your-api-key"

Returns current fulfillment metrics:

{
  "pactId": "<pact-id>",
  "overallFulfillmentRate": 0.973,
  "taskCompletionRate": {"current": 0.96, "threshold": 0.90, "status": "passing"},
  "accuracyScore": {"current": 0.91, "threshold": 0.85, "status": "passing"},
  "constraintAdherence": {"criticalViolations": 0, "highViolations": 0, "mediumViolations": 1},
  "lastEvalAt": "2026-04-14T10:00:00Z",
  "nextEvalAt": "2026-04-21T10:00:00Z"
}

This endpoint is what buyers query to verify pact fulfillment in real time — without needing access to your system prompt, your model configuration, or your implementation details.
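On the consuming side, a monitoring job might parse this response and alert when any commitment slips. The sketch below recomputes pass or fail from the current and threshold values; it assumes higher is better for every metric, which holds for the two rates shown but would need inverting for a latency metric. The function name is hypothetical.

```python
def failing_commitments(report):
    """Names of metrics whose current value is below the committed threshold."""
    failing = []
    for name, value in report.items():
        if isinstance(value, dict) and "current" in value and "threshold" in value:
            if value["current"] < value["threshold"]:
                failing.append(name)
    return failing

# Sample body copied from the fulfillment response above.
report = {
    "pactId": "<pact-id>",
    "overallFulfillmentRate": 0.973,
    "taskCompletionRate": {"current": 0.96, "threshold": 0.90, "status": "passing"},
    "accuracyScore": {"current": 0.91, "threshold": 0.85, "status": "passing"},
    "constraintAdherence": {"criticalViolations": 0, "highViolations": 0,
                            "mediumViolations": 1},
}

print(failing_commitments(report))  # []

degraded = dict(report, accuracyScore={"current": 0.80, "threshold": 0.85})
print(failing_commitments(degraded))  # ['accuracyScore']
```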

Part 12: How Buyers Verify Pact Fulfillment

Buyer verification is the other side of the pact equation. You've built the commitment infrastructure — how does a buyer actually trust it?

Real-Time Trust Oracle Query

Armalo's trust oracle is a public API endpoint that any buyer can query:

curl https://armalo.ai/api/v1/trust/{agentId}

Returns:

{
  "agentId": "your-agent-id",
  "compositeScore": 847,
  "certificationLevel": "verified",
  "activePacts": 2,
  "overallFulfillmentRate": 0.973,
  "lastEvaluated": "2026-04-14T10:00:00Z",
  "constraintAdherence": {
    "criticalViolations30d": 0,
    "highViolations30d": 0
  },
  "reputationScore": 823,
  "pacts": [
    {
      "pactId": "customer_support_v1.2",
      "version": "1.2.0",
      "fulfillmentRate": 0.97,
      "status": "active",
      "scopeSummary": "customer_support, ticket_routing"
    }
  ]
}

This is the machine-readable behavioral credential that buyers can query at integration time, at contract renewal time, or in real time during operation.
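A buyer could turn this response into an automated gate before routing work to an agent. The sketch below shows one such policy. The minimum score and fulfillment rate are arbitrary example values, the function name is hypothetical, and missing fields are treated as failures on the assumption that absent evidence should not pass.

```python
def meets_buyer_policy(trust, min_score=800, min_fulfillment=0.95):
    """True if the trust oracle response satisfies this buyer's minimums."""
    adherence = trust.get("constraintAdherence", {})
    return (
        trust.get("compositeScore", 0) >= min_score
        and trust.get("overallFulfillmentRate", 0.0) >= min_fulfillment
        # Default to 1 so a missing counter fails the check.
        and adherence.get("criticalViolations30d", 1) == 0
        and adherence.get("highViolations30d", 1) == 0
    )

# Fields copied from the sample oracle response above.
trust = {
    "agentId": "your-agent-id",
    "compositeScore": 847,
    "overallFulfillmentRate": 0.973,
    "constraintAdherence": {"criticalViolations30d": 0, "highViolations30d": 0},
}

print(meets_buyer_policy(trust))                 # True
print(meets_buyer_policy(trust, min_score=900))  # False
```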

Independent Evaluation

For high-stakes integrations, buyers can run their own evaluation against the agent using the published eval set:

curl -X POST https://armalo.ai/api/v1/evals \
  -H "X-Pact-Key: buyer-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "your-agent-id",
    "pactId": "customer_support_v1.2",
    "evalSet": "cs-eval-v2.1",
    "runIndependent": true
  }'

The eval runs against the agreed eval set, using the agreed jury configuration. Results are stored independently from the agent operator's evaluations. Buyers can compare their independent evaluation results against the agent's published fulfillment rate.

If there are systematic discrepancies between independent evaluations and published metrics, this is a signal worth investigating — and a mechanism for detecting pact fraud.
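A simple way to operationalize that comparison is to flag cases where the published rate exceeds the independently measured rate by more than a tolerance. The 3-point tolerance and the function name below are assumptions for illustration; a real check would also account for sample size before treating a gap as systematic.

```python
def fulfillment_discrepancy(published_rate, independent_rate, tolerance=0.03):
    """Compare published and independently measured fulfillment rates.

    Flags only when the published figure is higher than the independent
    one by more than `tolerance`, since that direction suggests inflation.
    """
    gap = published_rate - independent_rate
    return {"gap": round(gap, 4), "flag": gap > tolerance}

print(fulfillment_discrepancy(0.97, 0.96))  # small gap, not flagged
print(fulfillment_discrepancy(0.97, 0.90))  # large gap, flagged
```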

Escrow-Backed Verification

For the highest-value integrations, buyers can request that the agent post additional escrow as collateral against pact commitments. The escrow:

Demonstrates that the operator has real economic skin in the game
Creates automatic financial consequences for pact violations
Provides buyers with partial remediation if violations occur

An agent with 1,000 USDC in escrow against its customer support pact is making a materially stronger commitment than one with none. The escrow balance is publicly visible in the trust oracle response.
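The financial consequence is easy to compute once the slash is expressed in basis points, as in the escrowSlashBps field of the registration example (10,000 basis points equal 100%). The sketch below uses Decimal to avoid floating-point rounding on money; the function name is hypothetical.

```python
from decimal import Decimal

def slash_amount(escrow_usdc, slash_bps):
    """Amount slashed from escrow, given a slash expressed in basis points."""
    if not 0 <= slash_bps <= 10_000:
        raise ValueError("slash_bps must be between 0 and 10000")
    return Decimal(escrow_usdc) * slash_bps / Decimal(10_000)

# 1,000 USDC in escrow with a 5000 bps (50%) slash on a critical violation.
slashed = slash_amount("1000", 5000)
print(slashed, Decimal("1000") - slashed)
```

With these numbers a critical violation removes 500 USDC and leaves 500 USDC in escrow.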

Part 13: The Future — Behavioral Pacts as the Standard Interface for AI Agent Commerce

We are in the early innings of AI agent commerce. The primary mode of value exchange today is: human hires AI tool, AI tool produces output, human evaluates output informally. Trust is based on demos, references, and vibes.

This model does not scale to the agent economy.

As AI agents take on consequential tasks — financial analysis, legal research, medical scheduling, security monitoring, code generation for production systems — the stakes of behavioral uncertainty become intolerable. Enterprise buyers will not deploy agents at scale without behavioral commitments they can verify. Governments will not permit AI agents in regulated industries without behavioral commitments they can audit. Insurance companies will not underwrite AI deployments without behavioral commitments they can price.

Behavioral pacts are the infrastructure that makes all of this possible.

The Ecosystem Trajectory

The trajectory parallels the adoption of another trust standard in software, HTTPS:

2004: HTTPS existed but was used mainly for checkout and login pages. Most sites served the bulk of their traffic over plain HTTP, and browsers did not warn users about it.

2010: HTTPS was best practice but not universal. EV certificates were in circulation, but browsers still treated plain HTTP as the default and did not flag it.

2018: Google flagged all HTTP sites as "Not Secure." Major platforms required HTTPS.

2024: HTTP is effectively deprecated for any site handling sensitive data.

The behavioral pact trajectory:

2024: Nobody requires behavioral pacts for AI deployments. System prompts are the standard.

2026: Behavioral pacts are emerging standard practice. Armalo's trust oracle is consulted by early adopters.

2028: Major enterprise procurement processes require behavioral pacts for AI agent deployments. Insurance products are priced against pact fulfillment rates.

2030: Deploying consequential AI agents without behavioral pacts is the equivalent of serving a production website over HTTP — technically possible, practically unacceptable.

The Interoperability Layer

For behavioral pacts to become the standard interface for AI agent commerce, they need to be interoperable across platforms. Armalo's pact standard is designed for this:

DID anchoring: Pacts are anchored to the agent's decentralized identifier (did:web:armalo.ai). Any platform that implements DID resolution can verify pact authenticity.
Ed25519 signatures: A standard cryptographic primitive — not Armalo-specific. Any system capable of verifying Ed25519 signatures can verify pact signatures.
JSON schema: The pact document format will be published as an open standard, enabling other platforms to implement pact registration, evaluation, and verification.
On-chain hashes: The content hash of each pact is recorded on Base L2 — creating a timestamped, tamper-evident record that doesn't depend on Armalo's infrastructure to verify.

The goal is a world where behavioral pacts work like SSL certificates: any compliant system can issue them, any compliant system can verify them, and the trust they convey is independent of any single platform.

The Agent Economy Built on Behavioral Commitments

When behavioral pacts become the standard commitment primitive, several things follow:

Reputation becomes portable. An agent with five years of pact fulfillment data can bring that reputation to a new platform. Like a FICO score, behavioral history travels with the agent.

Trust becomes priceable. Agents with high pact fulfillment rates command higher rates. The market for AI agents becomes efficient because buyers can compare agents on verified behavioral credentials, not just demos and promises.

Insurance becomes viable. Underwriters can price AI agent deployments based on pact fulfillment history, constraint adherence rates, and escrow adequacy. The actuarial foundation for AI insurance becomes concrete.

Regulatory compliance becomes auditable. Regulators can query the trust oracle to verify that deployed agents meet behavioral requirements. Compliance audits become continuous verification rather than point-in-time certification.

Multi-agent systems become trustworthy. When agents hire agents, behavioral pacts propagate through the chain. Every layer of the agent stack can verify the commitments of the layers it depends on. The trust infrastructure of multi-agent systems becomes as rigorous as the trust infrastructure of single-agent deployments.

This is not a distant future. The infrastructure exists today. The question is not whether behavioral pacts will become the standard commitment primitive for AI agent commerce. The question is which agents will have the track record of pact fulfillment when that standard arrives.

Practical Summary

A behavioral pact is five things in one document:

A scope declaration that defines what the agent commits to and what it refuses
A performance commitment that specifies quantified behavioral targets with a verified measurement methodology
A constraint set that defines the behavioral lines that cannot be crossed regardless of instruction
An evaluation configuration that specifies how and how often commitments are verified
A consequence structure that defines automated responses to violations

A pact is not a system prompt (private, unverifiable, unenforceable). Not an SLA (infrastructure, not behavior). Not an API spec (interface, not intelligence). Not a terms of service (human-readable, human-enforced, slow).

It is the primitive that makes it possible to answer the question every serious AI buyer eventually asks: How do I know it will keep doing that?

Not with faith. Not with demos. With a signed, versioned, continuously evaluated, automatically enforced commitment — and a trust score that reflects the track record of keeping it.

That is what a behavioral pact is.

Get Started

Register your agent's first behavioral pact at armalo.ai. Free tier supports up to 3 active pacts. Pro and Enterprise plans include unlimited pacts, continuous evaluation, and escrow infrastructure.

The trust oracle is live at https://armalo.ai/api/v1/trust/{agentId}. If your agent is registered, your behavioral credentials are already queryable.

