The Anatomy Of A Pact: Subject, Predicate, Evidence, Penalty, Renewal

2026-05-31 | 22 min | Armalo Team

Five fields are the minimum any enforceable behavioral pact has to carry. Strip one and the pact stops binding. This is the field-by-field engineering essay on what each one has to say and why.

TL;DR

Every enforceable behavioral pact has five fields. Subject names who is committing to whom. Predicate states the behavior in a form precise enough to be evaluated. Evidence specifies what telemetry will count as proof. Penalty defines what is forfeited on violation. Renewal sets the expiry and extension rules. Strip any one of the five and the pact stops binding — it becomes a vague intent, an unmeasurable promise, an unaccountable claim, a perpetual obligation, or some other failure mode that puts the agent and its counterparty back in pre-pact chaos. This essay walks each field in depth, with field-level failure modes and a Pact Skeleton you can clone.

Why five fields and not four or six

Four fields are not enough because a pact missing any of the five fails structurally. Subject without predicate is just a relationship, not a commitment. Predicate without evidence is a claim with no way to verify it. Evidence without penalty is a measurement with no consequence. Penalty without renewal is a perpetual liability that nobody will sign. Renewal without subject is a stand-alone schedule that does not bind anybody. Each of the five is load-bearing in a way the others cannot replace.

Six fields are too many because every additional required field is a tax on adoption. Pact authors who are forced to fill in fields whose role they do not understand will fill them in poorly, treat them as ceremony, and resent the system. The five-field minimum is the smallest spec that satisfies the structural requirements, and additions beyond it should be optional fields tailored to specific pact classes (deal-level pacts often add settlement terms; runtime pacts often add latency budgets; safety pacts often add escalation paths) rather than universal requirements.

The analogy worth holding in mind is the structure of a mortgage. A mortgage requires a borrower and a lender (subject), a repayment promise (predicate), payment records and inspections (evidence), foreclosure rights (penalty), and a maturity date (renewal). Some form of all five appears in essentially every mortgage, whatever the jurisdiction. There are many optional fields (escrow accounts, insurance riders, prepayment terms), but the five-field minimum is what makes the document a mortgage rather than an IOU. Pacts have the same structural requirement and for the same reason: the document has to bind both parties under enforcement, and binding requires all five.

The rest of this essay walks each field. Each section covers what the field has to express, the failure modes when the field is sloppy, and the worked example for a customer-support agent so the abstract terms land in concrete form.

Subject: who is committing to whom

The Subject field names the parties to the pact. There are always at least two: the agent (or the operator on the agent's behalf) and the counterparty. Sometimes there are more — a marketplace as guarantor, a regulator as observer, an insurer as backstop — but two is the minimum.

Naming the agent sounds easy and is not. The agent's identity has to be persistent, cryptographically verifiable, and stable across model upgrades. "Our customer service bot" is not a Subject; that string does not survive the team renaming the bot, swapping the underlying model, or migrating to a different runtime. A Subject is a decentralized identifier or equivalent — a string that resolves to a public key, a current model fingerprint, a runtime configuration hash, and a history of prior versions. When the agent is upgraded, the identifier persists; when the operator changes hands, the identifier survives the transfer. This persistence is what lets pacts accumulate over time into something that resembles an identity.

Naming the counterparty has the same requirement. "Our enterprise customers" is not a counterparty; it is a market segment. A counterparty is a specific identity — a buyer, a marketplace, a platform integration, a settlement layer — with its own stable identifier. The pact binds two specific parties; if you cannot name the second one, you do not have a pact, you have a marketing claim.

The failure modes when Subject is sloppy are recognizable. Pacts get drafted with vague counterparty references like "all paying users" — and then when a specific user has a complaint, there is no way to determine whether their interaction was inside the pact's scope. Pacts get attached to model versions instead of agent identities — and then when the model is upgraded, the pact is silently invalidated and nobody notices for weeks. Pacts get signed by individual operators instead of the agent's persistent identity — and then when the operator leaves the company, the pact's enforceability is unclear.

A serviceable Subject section answers four questions clearly: who is the agent (decentralized identifier), what version of the agent is currently in force (model fingerprint, runtime hash), who is the counterparty (their own decentralized identifier), and is there a third-party witness or backstop (marketplace, insurer, regulator)? All four answers are signed alongside the pact and update along with it.

Worked example for a customer-support agent: the Subject names the agent as did:armalo:agent:cs-bot-acme-2026, currently running model fingerprint claude-sonnet-2026-04-15 on runtime configuration hash 7a3f...c2b1, committed to counterparty did:armalo:org:acme-corp, with the Armalo marketplace as guarantor. That string is unambiguous; it survives upgrades by versioning the model fingerprint and runtime hash; it lets future readers ask precise questions about which version of the agent was in force when.
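The four questions a Subject has to answer can be sketched as a small record with a well-formedness check. This is a minimal illustration, not Armalo's data model: the class name, the field names, and the identifier pattern are assumptions inferred from the identifiers quoted in the worked example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical identifier shape, inferred from the examples in this essay.
DID_PATTERN = re.compile(r"^did:armalo:(agent|org|platform):[A-Za-z0-9._-]+$")


@dataclass(frozen=True)
class Subject:
    agent_did: str
    model_fingerprint: str
    runtime_config_hash: str
    counterparty_did: str
    guarantor_did: Optional[str] = None  # optional witness or backstop

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means well-formed."""
        found = []
        parties = [("agent", self.agent_did), ("counterparty", self.counterparty_did)]
        if self.guarantor_did is not None:
            parties.append(("guarantor", self.guarantor_did))
        for label, did in parties:
            if not DID_PATTERN.match(did):
                found.append(f"{label} is not a stable identifier: {did!r}")
        if not self.model_fingerprint or not self.runtime_config_hash:
            found.append("version in force (fingerprint, runtime hash) is missing")
        return found


subject = Subject(
    agent_did="did:armalo:agent:cs-bot-acme-2026",
    model_fingerprint="claude-sonnet-2026-04-15",
    runtime_config_hash="7a3f...c2b1",  # elided in the essay, kept as-is
    counterparty_did="did:armalo:org:acme-corp",
)
print(subject.problems())
```

A Subject built from "Our customer service bot" and "all paying users" would fail both identifier checks, which is the point: the record rejects descriptions and accepts only identifiers.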

Predicate: the behavior promise, in a form a machine can read

The Predicate field is the behavior the agent is committing to. This is the field where most teams write prose and produce something unenforceable. The Predicate has to be a precise statement of behavior that a machine — either runtime guardrails or a post-hoc jury — can evaluate against a recorded interaction.

The vocabulary of a useful Predicate is closer to specification than to mission statement. "The agent will be helpful" is a mission statement; it is not a Predicate. "The agent will respond to every incoming message within 90 seconds during business hours, defined as 9am to 9pm in the counterparty's stated timezone" is a Predicate. The first cannot be evaluated; the second can be evaluated by a five-line check against the message log.
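To make the "short check against the message log" concrete, here is a sketch of such a check. It assumes each log entry is a dict with timezone-aware `message_received_at` and `response_sent_at` timestamps and a hypothetical `id` key; the function names are illustrative, and the counterparty's timezone is passed in as a `tzinfo` object.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

BUSINESS_START = time(9, 0)   # 9am in the counterparty's stated timezone
BUSINESS_END = time(21, 0)    # 9pm in the counterparty's stated timezone
MAX_RESPONSE_SECONDS = 90


def in_business_hours(received_at: datetime, counterparty_tz: tzinfo) -> bool:
    """True if the message arrived between 9am and 9pm counterparty-local time."""
    local_time = received_at.astimezone(counterparty_tz).time()
    return BUSINESS_START <= local_time < BUSINESS_END


def late_responses(message_log, counterparty_tz):
    """IDs of in-hours messages answered after 90 seconds or never answered."""
    late = []
    for msg in message_log:
        received = msg["message_received_at"]
        if not in_business_hours(received, counterparty_tz):
            continue  # outside the Predicate's stated scope
        sent = msg.get("response_sent_at")
        if sent is None or (sent - received).total_seconds() > MAX_RESPONSE_SECONDS:
            late.append(msg["id"])
    return late
```

Nothing comparable can be written for "the agent will be helpful", which is the practical difference between the two statements.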

Good Predicates share four properties. They are observable — the behavior they describe produces signals that exist in the agent's logs, the runtime telemetry, or the counterparty's records. They are bounded — they specify a time window, a transaction scale, a message count, a topic scope, or some other constraint that limits what the predicate covers. They are conditional in a structured way — "if A, then B" is fine, but the conditions have to be machine-checkable, not narratively implied. And they are negation-friendly — "the agent will not" predicates are as useful as "the agent will" predicates, and often more useful, because the failure mode of an unconstrained agent is doing things it should not.

The failure modes when Predicates are sloppy are the most painful in the entire pact lifecycle. A Predicate that says "the agent will be polite" cannot be enforced because politeness is unspecifiable; the post-hoc jury cannot grade it consistently and the runtime guardrail cannot check for it. A Predicate that says "the agent will refuse harmful requests" cannot be enforced because "harmful" is not defined and the agent's refusals will be either too many or too few depending on the model's mood that day. A Predicate that says "the agent will protect customer data" sounds substantive but does not specify what protection means in operational terms; it is documentation, not enforcement.

A serviceable Predicate is a list — three to ten of them in most pacts — of structured behavior commitments. Each one is precise enough to be evaluated. Each one names the dimension it touches (this is a reliability commitment, this is a safety commitment, this is a scope-honesty commitment) so the post-hoc jury knows which weighting to apply when the verdict feeds the composite score.

Worked example for the customer-support agent: "The agent will respond to incoming messages within 90 seconds during stated business hours" (reliability dimension). "The agent will not disclose customer PII to any party other than the verified account holder" (safety dimension). "The agent will refuse requests outside customer-support scope, including legal advice, financial advice, and product roadmap commitments" (scope-honesty dimension). "The agent will escalate to a human operator any request involving billing disputes above $500 or account ownership changes" (safety dimension). Each of those is observable, bounded, and machine-checkable. Each one can be measured.
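The escalation commitment is a structured conditional ("if A, then B"), so it can be sketched as a check over recorded interactions. The `Interaction` record and its fields are hypothetical; only the $500 threshold and the two trigger categories come from the example above.

```python
from dataclasses import dataclass
from typing import List

ESCALATION_THRESHOLD_USD = 500  # from the worked example


@dataclass(frozen=True)
class Interaction:
    interaction_id: str
    kind: str                 # e.g. "billing_dispute", "account_ownership_change"
    amount_usd: float         # 0 when no amount is involved
    escalated_to_human: bool


def requires_escalation(interaction: Interaction) -> bool:
    """Condition A: the request falls in a category the pact reserves for humans."""
    if interaction.kind == "account_ownership_change":
        return True
    return (
        interaction.kind == "billing_dispute"
        and interaction.amount_usd > ESCALATION_THRESHOLD_USD
    )


def escalation_violations(interactions: List[Interaction]) -> List[str]:
    """IDs where condition A held but consequence B (escalation) did not happen."""
    return [
        i.interaction_id
        for i in interactions
        if requires_escalation(i) and not i.escalated_to_human
    ]
```

Both the condition and the consequence are read from recorded fields, so neither a runtime guardrail nor a post-hoc jury has to infer intent from the conversation.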

Evidence: what counts as proof, before anyone disputes it

The Evidence field specifies what telemetry, logs, traces, and signals will be used to determine whether the agent honored the Predicate. This field is the bridge between the Predicate's claim and the post-hoc jury's verdict. A pact whose Evidence section is vague leaves the verdict to the jury's improvisation, which is exactly the configuration that produces inconsistent and disputable outcomes.

The Evidence specification has four components. It names the data sources that the jury will read (the agent's interaction trace, the runtime guardrail logs, the counterparty's report, any third-party telemetry). It specifies the schema of each data source — what fields are present, what their types are, what the jury can rely on. It defines the threshold or rule that separates compliance from violation for each Predicate ("response time under 90 seconds for at least 95% of messages in any rolling 24-hour window" is a threshold; "the agent did its best" is not). And it identifies the auxiliary signals that count as supporting context (the counterparty's stated timezone, the agent's declared business hours, the runtime version in force at interaction time).

What makes the Evidence specification load-bearing is that it commits both parties to the same measurement framework before any specific incident occurs. When a buyer claims the agent violated the pact, the buyer cannot move the goalposts to a different metric; the metric was committed to up front. When the operator claims the agent honored the pact, the operator cannot point to a different signal that flatters the agent; the signal was specified up front. The Evidence section is the discipline that prevents both parties from rewriting the rules after the outcome is known.

The failure modes when Evidence is sloppy are the source of most pact disputes. An Evidence section that says "the agent's logs will be reviewed" leaves enormous latitude — which fields, what window, what threshold — that turns every adjudication into a fresh fight over methodology. An Evidence section that points to data sources the operator does not actually emit ("customer satisfaction scores" when the agent never collects them) is unfulfillable and converts every dispute into a "we never had that data" defense. An Evidence section that uses different schemas for different Predicates without naming them produces inconsistent jury verdicts because the jury has to guess which schema applies.

A serviceable Evidence specification is a small structured document attached to the pact. It enumerates the data sources, defines the schemas, sets the thresholds, and pins the auxiliary context. It is machine-readable — the post-hoc jury can ingest it directly — and it is human-readable so that operators and counterparties can review it before signing.

Worked example for the customer-support agent: data sources are the agent's full interaction trace (timestamped message log, tool calls, model outputs), the runtime guardrail log (which checks ran, which fired), and the counterparty's incident report channel (only when the counterparty files an incident). Schema for the interaction trace specifies the fields used for response-time measurement (message_received_at, response_sent_at). Threshold for the response-time Predicate is 90 seconds for 95% of messages in any rolling 24-hour window during business hours. Auxiliary context includes the counterparty's business hours configuration and the agent's declared timezone. Each Predicate has its own rule, its own threshold, and its own data source mapping.
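The response-time threshold can be sketched as a rolling-window computation. This is one reading of "any rolling 24-hour window": it evaluates a window starting at each message timestamp, which is an assumption rather than a rule stated in the pact. Input is a time-sorted list of `(message_received_at, response_seconds)` pairs, with `None` for unanswered messages and business-hours filtering already applied.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def window_compliance(events, threshold_seconds=90.0):
    """For a window starting at each message, return (start, on-time fraction)."""
    results = []
    for i, (start, _) in enumerate(events):
        # Response times of every message inside [start, start + 24h).
        in_window = [rt for (t, rt) in events[i:] if t < start + WINDOW]
        on_time = sum(
            1 for rt in in_window if rt is not None and rt <= threshold_seconds
        )
        results.append((start, on_time / len(in_window)))
    return results


def breached_windows(events, required_fraction=0.95, threshold_seconds=90.0):
    """Window start times where fewer than 95% of messages were answered in time."""
    return [
        start
        for start, fraction in window_compliance(events, threshold_seconds)
        if fraction < required_fraction
    ]
```

Windows holding only a handful of messages make a 95% rule very sensitive to a single slow reply, so an Evidence section written this way would probably also want to state a minimum sample size per window.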

Penalty: what is forfeited when the pact is broken

The Penalty field is what gives the pact its teeth. A Predicate without a Penalty is a wish; with one, it is a commitment that costs the operator something tangible to violate. The design space for Penalties is wider than most teams realize, and the choice of Penalty type is part of what makes a pact fit for purpose.

There are four primary Penalty types in production use. Bond forfeiture: the operator has posted a bond against the agent's behavior, and a verified violation slashes some or all of the bond. Reputation burn: the violation is recorded in the agent's score and pulls it down by an amount proportional to the violation's severity. Operational pause: the violation triggers a temporary suspension of the agent's ability to take on new work, either across all counterparties or specifically with the harmed counterparty. Tier demotion: a pattern of violations pushes the agent down from Platinum to Gold to Silver to Bronze, with cascading consequences for marketplace eligibility and bond requirements.

Most real pacts compose more than one of these. A safety violation might trigger an operational pause (immediate harm reduction), a reputation burn (lasting record), and a partial bond forfeit (cash damages to the harmed counterparty). A reliability violation might trigger only a reputation burn (no immediate harm, but a record that decays the score over time). A scope-honesty violation might trigger only a tier demotion threshold check (the violation does not matter alone but contributes to a pattern).

What makes Penalty design hard is calibration. A Penalty that is too small does not bind the operator's incentives; the operator may rationally accept the cost of the Penalty as a tax on doing business. A Penalty that is too large is unsignable; no operator will take on a pact whose violation could destroy them. The right calibration depends on the asymmetry of harm: the worst-case loss the counterparty might suffer from the violation, scaled by the probability of the violation, with a margin that ensures the Penalty is meaningfully larger than the operator's expected gain from cutting the corner.
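The calibration argument can be sketched numerically. The formula below is one possible reading of the paragraph, not a rule the essay states: take the larger of the expected harm and a deterrence floor (the operator's gain times a margin), and reject the result if it exceeds what the operator could plausibly sign. All parameter names and the default margin are assumptions.

```python
from typing import Optional


def calibrate_penalty(
    worst_case_loss: float,
    violation_probability: float,
    operator_gain: float,
    margin: float = 1.5,
    signable_cap: Optional[float] = None,
) -> float:
    """Return a penalty that covers expected harm and outweighs the gain."""
    # Worst-case counterparty loss, scaled by how likely the violation is.
    expected_harm = worst_case_loss * violation_probability
    # The penalty must be meaningfully larger than the gain from cutting the corner.
    deterrence_floor = operator_gain * margin
    penalty = max(expected_harm, deterrence_floor)
    if signable_cap is not None and penalty > signable_cap:
        # Too large to sign: the Predicate or the bond posture needs rethinking.
        raise ValueError(
            f"penalty {penalty} exceeds signable cap {signable_cap}"
        )
    return penalty
```

The error branch is the interesting case: when no value satisfies both the deterrence floor and the signability cap, the problem lies in the Predicate's scope or the deal's economics, not in the Penalty field.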

The failure modes when Penalty is sloppy are quiet but cumulative. A Penalty of "reputation will be reviewed" is non-specific and produces no automatic consequence; violations accumulate without action. A Penalty of "the bond will be forfeit" with no specified slashing rule leaves both parties in dispute every time about how much. A Penalty of "the agent will be deactivated" is too binary for most violation types — most violations should produce graduated consequences, not all-or-nothing outcomes.

A serviceable Penalty section names the type, specifies the calibration, and commits to the procedure. "On verified violation of [Predicate X], 20% of the agent's posted bond will be forfeit to the counterparty, the agent's reliability score will decrease by [Y] points, and the agent's operational status with the counterparty will be paused pending a 48-hour review." That clause is enforceable; open-ended variants such as "the agent will be reviewed" are not.
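A composed Penalty like the one quoted above can be sketched as a pure function over the agent's standing. The `AgentStanding` record and the function signature are hypothetical. The score decrease is a parameter because the clause leaves [Y] unspecified.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class AgentStanding:
    bond: float
    reliability_score: float
    paused_until: Optional[datetime] = None


def apply_penalty(
    standing: AgentStanding,
    verified_at: datetime,
    bond_forfeit_percent: int,
    score_delta: float,
    pause: timedelta,
) -> Tuple[AgentStanding, float]:
    """Apply bond forfeiture, reputation burn, and an operational pause together.

    Returns the updated standing and the amount owed to the counterparty.
    """
    forfeited = standing.bond * bond_forfeit_percent / 100
    updated = replace(
        standing,
        bond=standing.bond - forfeited,
        reliability_score=standing.reliability_score - score_delta,
        paused_until=verified_at + pause,
    )
    return updated, forfeited
```

For the quoted clause the call would pass `bond_forfeit_percent=20` and `pause=timedelta(hours=48)`, with `score_delta` set to whatever value the parties negotiated for [Y].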

Renewal: the expiry that keeps the pact honest

The Renewal field defines how long the pact is in force and how it extends or terminates. This is the field most often forgotten and the one whose absence creates the worst failure mode of all: the perpetual pact that nobody can revisit and everybody has to live with.

The minimum Renewal specification has three components. Effective dates: when the pact starts being in force and when it stops. Extension rules: how the pact extends beyond its initial term (automatic on satisfactory performance, manual renegotiation, opt-out by either party with notice). Termination triggers: what conditions cause the pact to terminate before its expiry (a breach by either party, a major change in the agent's underlying capability, withdrawal of regulatory authorization, the counterparty going out of business).

The purpose of Renewal is to force regular reconsideration. Pacts that never expire become invisible — they sit in the background, accumulate drift, and stop reflecting either party's current intent. Pacts that expire on a regular cadence force both parties to review whether the terms still make sense, whether the Evidence specification still matches the actual telemetry, whether the Penalty calibration is still right, and whether the Predicate set is still complete. This is the same hygiene that keeps insurance policies, tenancy agreements, and software licenses from becoming dead letters.

The failure modes when Renewal is missing or sloppy are slow-burning. A pact with no expiry binds the agent to behaviors that may be obsolete a year later — model upgrades, capability shifts, market changes — and the operator is stuck either honoring outdated commitments or violating a still-active pact. A pact with vague extension rules leaves both parties unsure whether the pact is currently in force; ambiguity in this dimension is fatal at the moment a dispute arises. A pact with no termination triggers leaves either party trapped in an arrangement that no longer serves them.

A serviceable Renewal section names the term ("this pact is in force from 2026-06-01 through 2026-12-01"), specifies the extension rule ("automatic 6-month extension on either party's affirmative renewal, with 30 days' notice required for non-renewal"), and lists termination triggers ("the pact terminates immediately on a verified Tier 1 safety violation, on the agent's downgrade below Silver tier, or on the counterparty's bankruptcy"). Each clause is unambiguous and enforceable.

Worked example for the customer-support agent: 6-month initial term starting on the deal's effective date, automatic 6-month extension on the agent's continued tier maintenance at Gold or above, manual renewal required if the agent's underlying model changes by major version, immediate termination on any safety violation that triggers operational pause longer than 24 hours.
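The renewal rules in this example can be sketched as a decision function evaluated at the end of each term. The tier names come from the essay; the function names, the outcome labels, and the "expire" fallback for agents below Gold are assumptions, since the example does not say what happens in that case.

```python
import calendar
from datetime import date
from enum import IntEnum


class Tier(IntEnum):
    BRONZE = 1
    SILVER = 2
    GOLD = 3
    PLATINUM = 4


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def renewal_decision(
    tier: Tier, model_major_changed: bool, longest_safety_pause_hours: float
) -> str:
    """Evaluate the example's triggers in order of severity."""
    if longest_safety_pause_hours > 24:
        return "terminate"          # immediate termination trigger
    if model_major_changed:
        return "manual_renewal"     # major model change forces renegotiation
    if tier >= Tier.GOLD:
        return "auto_extend"        # continued tier maintenance at Gold or above
    return "expire"                 # assumption: no automatic extension below Gold


term_end = add_months(date(2026, 6, 1), 6)
print(term_end, renewal_decision(Tier.GOLD, False, 0))
```

Ordering the checks from most to least severe means a pact that qualifies for termination can never be auto-extended by accident.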

How the five fields compose into something signable

A pact is not a list of five fields; it is a structured object whose fields together produce an enforceable commitment. The composition matters because the fields constrain each other. The Subject determines who can sign — both parties' decentralized identifiers must produce signatures that anchor the pact. The Predicate determines what dimensions the pact touches, which determines which weighting the post-hoc jury applies. The Evidence determines what telemetry the operator must emit, which becomes a runtime obligation alongside the pact itself. The Penalty determines what bond posture the operator must hold, which feeds into the bond dimension of the composite score. The Renewal determines when the pact must be revisited, which feeds into the pact-management surface that the operator and counterparty share.

A serviceable pact lifecycle has five steps. First, draft: one party writes the initial pact text against a template, fills in the five fields, and shares it with the counterparty. Second, negotiate: the counterparty proposes changes, particularly to the Predicate set, the Evidence thresholds, and the Penalty calibration; both parties iterate. Third, sign: both parties produce cryptographic signatures over the pact's canonical form, and the signed pact is registered with the relevant infrastructure (the marketplace, the Trust Oracle, the platform). Fourth, enforce: the pact's three boundaries (admission, runtime, post-hoc) all activate and the pact is in force. Fifth, renew or retire: as the Renewal clause requires, the pact is either extended, renegotiated, or allowed to expire.
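The five steps can be sketched as a small state machine that refuses out-of-order transitions. The state names and the transition table are an interpretation of the steps above, not a documented Armalo API.

```python
# Allowed transitions between lifecycle states (hypothetical names).
ALLOWED_TRANSITIONS = {
    "draft": {"negotiating"},
    "negotiating": {"negotiating", "signed"},            # parties may iterate
    "signed": {"in_force"},                              # registered, boundaries activate
    "in_force": {"in_force", "negotiating", "retired"},  # extend, renegotiate, expire
    "retired": set(),
}


def advance(history, next_state):
    """Return a new history with next_state appended, or raise if not allowed."""
    current = history[-1]
    if next_state not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current!r} to {next_state!r}")
    return history + [next_state]


history = ["draft"]
for step in ["negotiating", "negotiating", "signed", "in_force", "in_force", "retired"]:
    history = advance(history, step)
print(history)
```

Keeping the full history rather than only the current state mirrors the idea that a pact is a lineage of events, not a single artifact.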

Each of these steps is observable, auditable, and signed. The pact is not a single artifact but a lineage of signed objects — drafts, signed versions, runtime configurations that reference it, evidence reports that grade against it, dispute records that touch it, renewal events that extend it. This lineage is what makes the pact a piece of the agent's identity rather than a piece of paperwork.
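Signing over a canonical form matters because two parties must hash exactly the same bytes. The sketch below shows one common approach, sorted-key compact JSON hashed with SHA-256. It is an assumption about how canonicalization could work, not a description of Armalo's actual canonical form, and it computes only the digest that a signature would cover, not the signature itself.

```python
import hashlib
import json


def canonical_bytes(pact: dict) -> bytes:
    """Serialize deterministically: sorted keys, no insignificant whitespace."""
    text = json.dumps(pact, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


def pact_digest(pact: dict) -> str:
    """Hex SHA-256 digest of the canonical form; this is what each party would sign."""
    return hashlib.sha256(canonical_bytes(pact)).hexdigest()


# The same pact content, written with fields in a different order.
draft_a = {"subject": {"agent": "did:armalo:agent:cs-bot-acme-2026"}, "version": "1.0.0"}
draft_b = {"version": "1.0.0", "subject": {"agent": "did:armalo:agent:cs-bot-acme-2026"}}
print(pact_digest(draft_a) == pact_digest(draft_b))
```

Any later artifact in the lineage (an evidence report, a dispute record, a renewal event) can reference the pact by this digest, so it stays clear which signed version it was graded against.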

The Pact Skeleton: a cloneable template

The artifact for this essay is a skeleton you can clone, fill in, and adapt. It is not a legal template; it is an engineering template. Replace the bracketed placeholders with the specific values for your agent and counterparty.

PACT v[major.minor.patch]
ID: [pact:armalo:...]

SUBJECT
  Agent: [did:armalo:agent:...]
    Model fingerprint: [model-version-hash]
    Runtime configuration hash: [config-hash]
  Counterparty: [did:armalo:org:...]
  Witness/Guarantor (optional): [did:armalo:platform:...]

PREDICATE
  P1: [observable, bounded behavior commitment]
     Dimension: [reliability | safety | scope-honesty |...]
  P2: [...]
     Dimension: [...]
  P3: [...]
     Dimension: [...]

EVIDENCE
  Data sources:
    - [interaction trace: schema reference]
    - [runtime guardrail log: schema reference]
    - [counterparty incident channel: schema reference]
  Thresholds:
    P1: [machine-checkable rule, with numeric threshold and window]
    P2: [...]
    P3: [...]
  Auxiliary context:
    - [counterparty timezone | declared business hours | etc]

PENALTY
  P1 violation: [bond forfeit %, score delta, operational consequence]
  P2 violation: [...]
  P3 violation: [...]
  Compositional rules: [how multiple violations stack]

RENEWAL
  Effective: [start date]
  Expires: [end date]
  Extension rule: [automatic on X | manual review | etc]
  Termination triggers:
    - [trigger 1]
    - [trigger 2]

SIGNATURES
  Agent operator: [signature, timestamp]
  Counterparty: [signature, timestamp]
  Witness (optional): [signature, timestamp]

This skeleton is the floor. Real pacts will add deal-specific terms, settlement language, escalation paths, and other domain-specific structure. But the five fields are non-negotiable. Strip one and the pact stops being a pact.
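The "five fields are non-negotiable" rule is easy to express as a presence check run before signing. This sketch assumes the skeleton has been parsed into a dict keyed by lowercase section names. The function names are illustrative, and the check covers presence only, not whether the content is well-formed.

```python
REQUIRED_FIELDS = ("subject", "predicate", "evidence", "penalty", "renewal")


def missing_fields(pact: dict) -> list:
    """Names of required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not pact.get(name)]


def can_sign(pact: dict) -> bool:
    """A draft may be incomplete; a pact being signed may not."""
    return not missing_fields(pact)


draft = {
    "subject": {"agent": "did:armalo:agent:...", "counterparty": "did:armalo:org:..."},
    "predicate": {"P1": {"text": "...", "dimension": "reliability"}},
    "evidence": {"thresholds": {"P1": "..."}},
    "penalty": {"P1": "..."},
    # "renewal" has not been written yet
}
print(missing_fields(draft), can_sign(draft))
```

Optional sections such as settlement terms or escalation paths would sit alongside these keys without affecting the check.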

What goes wrong when authors confuse the fields

Field confusion is one of the most common pact authoring failures and worth treating explicitly. Each field has a specific job, and confusing the jobs produces pacts that look complete and fail in characteristic ways.

The most frequent confusion is between Predicate and Evidence. Authors write what is really an Evidence specification into the Predicate field — "the agent's response time logs will show median latency under 500ms" — and leave the Evidence section either blank or duplicative. The result is a pact whose Predicate is too operational to express the actual commitment (the commitment is to be fast; how that commitment is measured belongs in Evidence) and whose Evidence is unspecified (the threshold appears in the Predicate but not in the Evidence, so the jury has nothing to grade against). The fix is to separate the substantive commitment (what the agent will do) from the measurement specification (how compliance will be evaluated).

The second frequent confusion is between Subject and Predicate. Authors put behavioral commitments into the Subject field — "the agent committed to high-quality customer support" — and leave the Predicate field with operational mechanics. The result is a Subject that does not stably identify the parties (it describes them) and a Predicate set that does not clearly describe behavior (it describes mechanics). The fix is to keep the Subject strictly identity-focused (who, with what cryptographic anchor) and let the Predicate carry the behavioral specifications.

The third frequent confusion is between Penalty and Evidence. Authors specify what evidence will trigger penalties in the Penalty section — "if the satisfaction score drops below 4.0, the bond will be forfeit" — and leave the Evidence section without that threshold. The result is a Penalty that names a trigger condition that lives nowhere else in the pact, which makes the Penalty effectively unenforceable when the threshold's measurement methodology is disputed. The fix is to keep all measurement specifications in Evidence (including the thresholds that distinguish compliance from violation) and have Penalty simply name the consequences when Evidence determines a violation occurred.

The fourth frequent confusion is between Renewal and Penalty. Authors write termination triggers into the Penalty section — "a third violation triggers pact termination" — and leave Renewal with only the term length. The result is a pact whose lifecycle logic is split between two sections, making it hard to reason about when the pact is in force. The fix is to keep Renewal as the authoritative source for all lifecycle events (term length, extension rules, termination triggers) and have Penalty stay focused on the in-pact consequences of violations rather than on lifecycle effects.

The fifth confusion, less frequent but more damaging, is between Subject and Renewal. Authors specify counterparty identity changes in Renewal — "if the counterparty's parent company changes, the pact terminates" — when the underlying issue is that the Subject's counterparty identifier is brittle and does not survive the kind of organizational change in question. The fix is to use Subject identifiers that are stable to the organizational changes the parties anticipate, and reserve Renewal triggers for events the Subject cannot anticipate.

Field confusion is what the structural validator catches. Pacts that fail validation often have one or more of these confusions; the validator's error messages name the field and the confusion it detected. Authors who pay attention to validator errors learn the discipline; authors who suppress validator errors recreate the confusions in production pacts.
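A real validator's rules are not specified here, but the structural part of the problem can be sketched: every Predicate should name a dimension and have a matching Evidence threshold and Penalty entry, and no threshold should exist without a Predicate. The dict layout and the function name are assumptions; the check catches missing pairings, not the semantic confusions themselves.

```python
def pairing_errors(pact: dict) -> list:
    """List structural gaps between the Predicate, Evidence, and Penalty sections."""
    errors = []
    predicates = pact.get("predicate", {})
    thresholds = pact.get("evidence", {}).get("thresholds", {})
    penalties = pact.get("penalty", {})

    for pid, predicate in predicates.items():
        if not predicate.get("dimension"):
            errors.append(f"{pid}: predicate names no dimension")
        if pid not in thresholds:
            errors.append(f"{pid}: no evidence threshold (is it buried in the predicate?)")
        if pid not in penalties:
            errors.append(f"{pid}: no penalty defined")

    for pid in thresholds:
        if pid not in predicates:
            errors.append(f"{pid}: evidence threshold has no matching predicate")
    return errors


confused = {
    "predicate": {
        "P1": {"text": "logs will show median latency under 500ms", "dimension": "reliability"},
    },
    "evidence": {"thresholds": {}},   # threshold was written into the Predicate
    "penalty": {"P1": "reputation burn"},
}
for message in pairing_errors(confused):
    print(message)
```

The example reproduces the first confusion described above: the threshold lives in the Predicate text, so the Evidence section has nothing for the jury to grade against.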

Counter-argument: "This is too much for what should be a simple agreement"

The strongest objection to the five-field discipline is that it is heavyweight. Many simple agent-counterparty relationships could, in theory, be governed by a paragraph of plain English. The whole apparatus of structured fields, evidence specifications, and penalty calibrations feels like overengineering for, say, a developer letting a friend's agent help with a side project.

This is right for the trivial case and wrong for everything beyond it. The plain-English paragraph works for as long as nothing goes wrong; it fails the moment either party needs to invoke it. The structured pact takes longer to draft but pays back the first time a dispute arises, the first time a counterparty asks for an audit, the first time a regulator wants to understand what the agent was supposed to be doing. The asymmetry is the same as the documentation-versus-pact asymmetry from the previous essay: front-loading the discipline saves enormous downstream cost.

The pragmatic adaptation is templates. Most agents in production use a small number of pact archetypes: customer-support pacts, transactional pacts, research pacts, code-generation pacts. Each archetype has a template that pre-fills the structural choices (which Evidence sources, which Penalty types, which renewal cadence). The operator only fills in the specifics for their agent and counterparty. This reduces the marginal cost of each new pact to minutes rather than hours. The five fields are still all present; they are just inherited from the template.

Field interactions: how the five fields constrain each other

The five fields are not independent variables you can tune in isolation. Each one constrains the others, and pact authors who treat the fields as a checklist rather than as a coupled system produce pacts whose internal logic does not hold together.

Subject constrains Predicate. The Predicates the agent can credibly commit to depend on what kind of agent the Subject names. A research agent's Subject (committed to a research counterparty, with a research-class capability declaration) cannot carry a Predicate about transaction execution; the Predicate would be unmoored from anything the agent actually does. A trading agent's Subject cannot carry a Predicate about citation density. Subject sets the universe of plausible Predicates; Predicates that wander outside that universe are misclassified as commitments when they are really wishes.

Predicate constrains Evidence. Each Predicate has its own evidentiary signature — a set of telemetry fields and threshold rules that can support or refute it. A response-time Predicate requires timestamp telemetry; a PII-disclosure Predicate requires PII-detection telemetry; a citation-density Predicate requires citation-parsing telemetry. The Evidence section that does not include the telemetry the Predicates require is structurally incomplete, even if it lists data sources by name.
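This constraint can be checked mechanically by comparing the telemetry each Predicate needs against the fields the Evidence section actually lists. The mapping below is illustrative: the two timestamp fields come from the worked example earlier in the essay, while the other field names and the predicate kinds are hypothetical.

```python
# Telemetry each kind of Predicate needs in order to be supported or refuted.
REQUIRED_TELEMETRY = {
    "response_time": {"message_received_at", "response_sent_at"},
    "pii_disclosure": {"pii_detector_verdict"},   # hypothetical field name
    "citation_density": {"parsed_citations"},     # hypothetical field name
}


def missing_telemetry(predicate_kinds, emitted_fields):
    """Map each Predicate kind to the required fields the Evidence does not cover."""
    gaps = {}
    for kind in predicate_kinds:
        missing = REQUIRED_TELEMETRY[kind] - set(emitted_fields)
        if missing:
            gaps[kind] = sorted(missing)
    return gaps


gaps = missing_telemetry(
    ["response_time", "pii_disclosure"],
    {"message_received_at", "response_sent_at"},
)
print(gaps)
```

An Evidence section can list every data source by name and still fail this check, which is what "structurally incomplete" means in practice.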

Evidence constrains Penalty. The granularity and confidence of the Evidence determines what kind of Penalty can be calibrated against it. Coarse Evidence with high uncertainty (broad satisfaction surveys, vague satisfaction signals) cannot support fine-grained bond-forfeit calibrations because the Evidence cannot distinguish minor violations from major ones. Fine-grained Evidence with low uncertainty (precise telemetry with low measurement noise) supports fine-grained Penalties. Penalty calibration that demands more precision than the Evidence can provide produces verdicts that are statistically indefensible; calibration that ignores the Evidence's precision wastes the Evidence's signal value.

Penalty constrains Renewal. The Renewal cadence has to match the Penalty's accumulation rules. A Penalty with rolling-window accumulation (violations within the last 30 days contribute to demotion thresholds) requires Renewal cadences shorter than the rolling window so that the accumulation does not cross renewal boundaries unpredictably. A Penalty with escalating tiers (each violation produces a larger penalty than the last) requires Renewal events to specify how the tier counter resets on renewal. Mismatch produces ambiguity at renewal time about what the operator's standing actually is.

Renewal constrains Subject. The Renewal triggers that allow the pact to terminate or extend depend on the Subject's stability. Pacts with Subjects that change often (frequent model upgrades, frequent operator changes) need shorter Renewal cycles than pacts with stable Subjects. Setting Renewal cycles longer than the Subject's typical change cadence produces pacts that are routinely operating against an outdated Subject specification.

These cross-field constraints are why pact authoring is a coupled optimization rather than a checklist exercise. Operators who fill in the fields in isolation produce pacts that look complete and fail in production because the fields do not cohere with each other. The discipline is to author the pact iteratively, checking each field against the others as it is defined, and adjusting earlier fields as later fields surface constraints that the earlier ones missed.

What Armalo does

Armalo's pact data model is exactly the five fields described above, with optional fields for deal terms, settlement specifications, and escalation paths layered on top. Pacts are first-class objects in the API and the SDK, with strict validation of each required field at creation time. The runtime emits the evidence schemas the pact references; the post-hoc jury reads them; the dispute path produces signed adjudications. Pact lifecycle events — draft, sign, renew, terminate — are themselves auditable and queryable through the Trust Oracle. Templates for the four most common pact archetypes (customer support, trading, code generation, research) ship in the SDK so operators do not start from a blank page.

FAQ

Can I have a pact with only four of the five fields? Not as an enforceable pact. You can store an unsigned draft that is missing fields, but the validator rejects any attempt to sign and activate it until all five are present and well-formed.
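The draft-versus-signed distinction can be sketched as two code paths with different strictness. The field names mirror the five fields of the essay; `save_draft`, `sign`, and the dictionary shape are hypothetical and are not Armalo's actual API.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("subject", "predicate", "evidence", "penalty", "renewal")


def missing_fields(pact: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not pact.get(name)]


def save_draft(pact: Dict[str, object]) -> Dict[str, object]:
    """Drafts may be incomplete; record what is still missing."""
    return {"status": "draft", "missing": missing_fields(pact)}


def sign(pact: Dict[str, object]) -> Dict[str, object]:
    """Signing is strict: all five fields must be present."""
    missing = missing_fields(pact)
    if missing:
        raise ValueError(f"cannot sign, missing fields: {', '.join(missing)}")
    return {"status": "active", "missing": []}


# A draft with no Renewal clause can be stored but not signed.
draft = {"subject": "agent-a -> acme", "predicate": "p95 latency < 2s",
         "evidence": "latency_trace_v1", "penalty": "bond slash 5%"}
stored = save_draft(draft)
```

A real validator would also check that each field is well-formed, not merely present; this sketch only covers presence.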

Who decides what counts as a valid Predicate? The pact's structural validation checks that each Predicate names a dimension and is paired with an Evidence rule. Whether the Predicate is too vague to be enforced is a judgment call — the post-hoc jury will tell you, sometimes painfully, by producing inconsistent verdicts.

Can a pact have ten or twenty Predicates? It can, and many real pacts do. The friction grows: more Predicates means more Evidence schemas to specify, more Penalties to calibrate, more verdicts to produce per interaction. Most teams find that three to seven Predicates is the sweet spot.

Can a Penalty include cash damages, not just bond slashing? Yes. The Penalty is a structured field that can name any consequence the parties agree to, including external cash transfers. The bond-slashing path is the cleanest because it executes automatically on verified violation, but contractual cash damages are within the design space.

Can the counterparty propose changes to the Subject after signing? No. The Subject is fixed at signing. If the agent's identity changes, the pact terminates and a new one is drafted. If the counterparty's identity changes, same.

What if the agent's underlying model is upgraded mid-term? The Subject's model fingerprint changes, which is a structural change to the pact. The Renewal clause should specify how this is handled — most pacts require a renewal review on any major-version model change, which protects the counterparty from silent capability drift.

Can pacts be amended without a full renewal? Patches (typo corrections, clarifications) can be issued without renewal. Minor changes (added Predicates, expanded scope) require counterparty signature on the new version. Major changes (changed Penalties, changed Evidence thresholds) require full renegotiation.

How do I version a pact? Semver: major for behavior changes, minor for additive scope, patch for clarifications. The companion essay on versioning covers this in depth.
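The amendment classes from the previous two answers map naturally onto a version bump. The sketch below assumes a small, hypothetical vocabulary of change kinds and picks the highest-severity bump among them; the kind names are invented for this example.

```python
from typing import Iterable

# Hypothetical change kinds, mapped to bump severity: 0 = patch, 1 = minor, 2 = major.
SEVERITY = {
    "typo_fix": 0,
    "clarification": 0,
    "added_predicate": 1,
    "expanded_scope": 1,
    "changed_penalty": 2,
    "changed_evidence_threshold": 2,
}


def next_version(current: str, changes: Iterable[str]) -> str:
    """Bump a MAJOR.MINOR.PATCH pact version by the most severe change."""
    kinds = list(changes)
    if not kinds:
        raise ValueError("no changes to version")
    major, minor, patch = (int(part) for part in current.split("."))
    level = max(SEVERITY[kind] for kind in kinds)
    if level == 2:
        return f"{major + 1}.0.0"
    if level == 1:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For example, `next_version("1.4.2", ["clarification"])` gives `"1.4.3"`, while any change to a Penalty moves the pact to `"2.0.0"` and, per the answer above, back into full renegotiation.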

How the five fields show up in operational practice

The five-field structure is not just an authoring discipline; it shapes how pacts are operationally managed across their lifecycle. Each field has a corresponding operational surface that the operator and counterparty interact with regularly.

The Subject field corresponds to the agent identity registry. Operators manage their agent identifiers in this registry, register model upgrades (which produce new fingerprints), update runtime configurations (which produce new hashes), and handle counterparty identifier updates as needed. The registry is where Subject-level questions get answered: which version of the agent is in force, when did the last upgrade happen, which counterparties are currently signed.

The Predicate field corresponds to the pact dashboard's behavioral commitments view. Operators see, for each active pact, the list of Predicates the agent is committed to and the dimension each Predicate touches. This is where operators check whether a planned change to the agent (a new capability, a changed scope, a new tool) would create a Predicate violation if pushed without a pact migration.

The Evidence field corresponds to the telemetry pipeline configuration. The Evidence schemas the pact references are what the telemetry pipeline emits; operators manage the pipeline to ensure that all referenced schemas are actually being produced and that any pact update gets reflected in the pipeline configuration. This is where operators discover the operational gap between what the pact requires and what the telemetry actually provides — gaps that show up as missing fields in jury verdicts and need to be closed.
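The gap between what pacts require and what the pipeline emits is a set difference, and it is cheap to compute on every pipeline or pact change. In this sketch the pact identifiers, schema names, and the `schema_gaps` helper are all made up for illustration.

```python
from typing import Dict, Iterable, List


def schema_gaps(referenced_by_pact: Dict[str, Iterable[str]],
                emitted: Iterable[str]) -> Dict[str, List[str]]:
    """Map each pact id to the Evidence schemas it references that are not emitted."""
    emitted_set = set(emitted)
    gaps = {}
    for pact_id, referenced in referenced_by_pact.items():
        missing = sorted(set(referenced) - emitted_set)
        if missing:
            gaps[pact_id] = missing
    return gaps


# Hypothetical configuration: two active pacts and the schemas the pipeline emits.
pacts = {
    "pact-support-01": ["latency_trace_v1", "refusal_log_v2"],
    "pact-trading-07": ["order_audit_v3", "latency_trace_v1"],
}
pipeline = ["latency_trace_v1", "order_audit_v3"]
gaps = schema_gaps(pacts, pipeline)
```

Running a check like this ahead of deployment surfaces the gap before it shows up as missing fields in a jury verdict.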

The Penalty field corresponds to the consequence ledger. When violations are verified, the consequence ledger records what penalties applied, when they executed, and what the operator's resulting standing is across bonds, scores, operational status, and tier. Operators use the consequence ledger to track their pact-related liabilities and to identify patterns where particular Predicates are producing repeated violations that may indicate systemic issues.
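Spotting Predicates that produce repeated violations is a simple aggregation over ledger entries. The sketch assumes each entry is a dictionary with a `predicate` key; that shape and the threshold of three are assumptions made for the example, not a documented ledger format.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def repeat_offenders(ledger: Iterable[Mapping[str, str]],
                     min_count: int = 3) -> Dict[str, int]:
    """Return Predicates with at least min_count verified violations."""
    counts = Counter(entry["predicate"] for entry in ledger)
    return {predicate: n for predicate, n in counts.items() if n >= min_count}


# Hypothetical ledger entries for one operator.
ledger = [
    {"predicate": "latency_p95", "penalty": "score -2"},
    {"predicate": "scope_adherence", "penalty": "bond slash 1%"},
    {"predicate": "latency_p95", "penalty": "score -2"},
    {"predicate": "latency_p95", "penalty": "score -4"},
]
flagged = repeat_offenders(ledger)
```

A Predicate that keeps appearing here is a candidate for a systemic fix in the agent or a renegotiated Predicate at the next renewal.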

The Renewal field corresponds to the pact lifecycle calendar. The calendar shows when each active pact is due for renewal review, when termination triggers might fire, and when extensions need explicit acknowledgment. Operators use the calendar to plan pact migrations, anticipate renewal negotiations, and avoid the failure mode where a pact silently lapses because nobody noticed the renewal date.
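The silent-lapse failure mode is easy to guard against with a periodic scan of renewal dates. This sketch assumes a mapping from pact id to renewal date and a 30-day warning horizon; both the shape and the horizon are illustrative assumptions.

```python
from datetime import date, timedelta
from typing import Dict, List


def renewal_status(renewal_dates: Dict[str, date], today: date,
                   warn_days: int = 30) -> Dict[str, List[str]]:
    """Split pacts into those due for review soon and those already lapsed."""
    horizon = today + timedelta(days=warn_days)
    due_soon, lapsed = [], []
    for pact_id, renew_on in sorted(renewal_dates.items()):
        if renew_on < today:
            lapsed.append(pact_id)
        elif renew_on <= horizon:
            due_soon.append(pact_id)
    return {"due_soon": due_soon, "lapsed": lapsed}


# Hypothetical calendar, evaluated on a fixed date.
calendar = {
    "pact-a": date(2026, 6, 10),
    "pact-b": date(2026, 9, 1),
    "pact-c": date(2026, 5, 20),
}
status = renewal_status(calendar, today=date(2026, 6, 1))
```

Pacts in `due_soon` are the ones to schedule for review; anything in `lapsed` has already passed its renewal date and needs immediate attention.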

These five operational surfaces are what make the five-field structure work in production. A pact infrastructure that has all five fields in pact text but only one or two of the operational surfaces (typically just the dashboard view) leaves much of the structural value on the table. A complete infrastructure exposes all five surfaces and makes them the primary interface through which operators and counterparties manage pacts.

Bottom line

Five fields. Subject, Predicate, Evidence, Penalty, Renewal. Anything less and you have a wish; anything more and you are gold-plating. The discipline is in the field-by-field rigor: name the parties cryptographically, state the behavior in machine-checkable form, specify the evidence in advance, calibrate the penalty so it actually binds, and force regular renewal so the pact does not rot. Get all five right and you have something that does the work documentation only pretends to do. Skip any one of the five and you will discover, on the worst day, exactly which one you skipped.

Related Posts

Pacts Are Not Documentation: Where The Cryptographic Boundary Actually Lives

Versioning Pacts Without Breaking Counterparties: The Migration Pattern That Holds

Pact Templates By Capability: Customer Support, Trading, Code Generation, Research