Pact Templates By Capability: Customer Support, Trading, Code Generation, Research
One pact template doesn't fit all agents. This essay presents four capability-specific templates (customer support, trading, code generation, research), with field-by-field commentary on what makes each different and four ready-to-clone skeletons.
TL;DR
A pact for a customer-support agent and a pact for a trading agent are both pacts in the same five-field structural sense, but the contents of those five fields are wildly different. Customer support emphasizes scope-honesty and safety; trading emphasizes bond, latency, and on-chain settlement evidence; code generation emphasizes correctness, attribution, and security review; research emphasizes citation density, source fidelity, and confidence calibration. Trying to use one template for all four is the source of more pact failures than any other single mistake. This essay is the working library: four capability-specific templates with field-by-field commentary on what makes each different, plus four ready-to-clone skeletons.
Why one template does not fit all four
The temptation to standardize on a single pact template is enormous. Every operator wants reusability; every platform wants to ship one canonical pact form that all agents can adopt; every legal team wants the convenience of reviewing one shape rather than four. The temptation produces pacts that look uniform on the surface and fail the moment they meet capability-specific reality.
The reasons each capability needs its own template are concrete. Customer support agents handle inputs from end users, which means scope drift, PII risk, and tone are first-order concerns; the pact has to express commitments around topic boundaries, refusal patterns, and disclosure constraints in ways that other capability classes do not. Trading agents handle financial value, which means latency budgets, slippage tolerance, position-size limits, and on-chain settlement become primary; the pact has to express commitments in dollars and basis points that customer-support pacts have no language for. Code generation agents produce artifacts that get committed to other systems, which means provenance, license compliance, security scanning, and review thresholds become primary; the pact has to express commitments around what gets shipped to whom under what attribution. Research agents produce claims that other systems will rely on, which means citation requirements, confidence calibration, and source-fidelity guarantees become primary; the pact has to express commitments around how claims are sourced and qualified.
These are not stylistic differences. They are substantive differences that require different Predicates, different Evidence schemas, different Penalty calibrations, and different Renewal cadences. A trading pact's Penalty calibration has to be tied to dollar exposure; the same calibration for a customer-support pact would be both overkill and unhelpful. A research pact's Evidence schema has to specify citation parsing rules; the same schema for a code-generation pact would not parse the right thing. The structural skeleton (Subject, Predicate, Evidence, Penalty, Renewal) is universal, but the contents are domain-specific.
The right approach is a small library of capability-specific templates, each with the structural skeleton pre-filled with the patterns that work for that capability. Operators clone the template that matches their agent, fill in the specifics, and ship a pact that already incorporates the capability-specific best practices. The remainder of this essay is that library.
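As a rough illustration of the shared skeleton, the five fields can be modeled as a small data structure that each capability template fills differently. This is a minimal Python sketch, not Armalo's actual schema: the class names `Pact` and `Predicate` and the helper `missing_fields` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Predicate:
    pid: str         # e.g. "P1"
    dimension: str   # e.g. "scope-honesty"
    commitment: str  # the commitment text


@dataclass
class Pact:
    capability: str
    subject: Dict[str, str] = field(default_factory=dict)
    predicates: List[Predicate] = field(default_factory=list)
    evidence: Dict[str, str] = field(default_factory=dict)
    penalty: Dict[str, str] = field(default_factory=dict)
    renewal: Dict[str, str] = field(default_factory=dict)


def missing_fields(pact: Pact) -> List[str]:
    """Return the names of the five structural sections that are still empty."""
    sections = {
        "subject": pact.subject,
        "predicate": pact.predicates,
        "evidence": pact.evidence,
        "penalty": pact.penalty,
        "renewal": pact.renewal,
    }
    return [name for name, value in sections.items() if not value]


# A half-filled customer-support draft: Subject and Predicate only.
draft = Pact(
    capability="customer-support",
    subject={"agent": "did:armalo:agent:[your-agent]"},
    predicates=[Predicate("P1", "scope-honesty", "Refuse out-of-scope inquiries")],
)
print(missing_fields(draft))  # ['evidence', 'penalty', 'renewal']
```

A check like this only confirms that the skeleton is complete; it says nothing about whether the contents suit the capability, which is what the templates below are for.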
Template one: customer support pact
The customer-support pact governs an agent that handles end-user inquiries on behalf of a counterparty (typically a business with customers). The dominant concerns are scope drift (the agent handling things outside its authorized capability), safety (PII disclosure, harmful advice), reliability (responsiveness, consistency), and tone (the agent representing the counterparty's brand voice).
The Subject section in a customer-support pact has the standard agent and counterparty identifiers, and additionally identifies the end-user class the agent will be serving. "Verified account holders of [counterparty]" is a typical end-user class. This matters because the pact's commitments may differ for different end-user classes: an account holder gets account information; a non-account-holder does not, even if they ask politely.
The Predicate section in a customer-support pact has three or four standard Predicates that recur across virtually every customer-support pact in production. First, a scope-honesty Predicate: the agent will refuse requests outside its authorized scope, where scope is enumerated explicitly (billing, account management, product usage questions; not legal advice, financial advice, product roadmap commitments, comparisons to competitors). Second, a safety Predicate around PII: the agent will not disclose customer data to any party other than the verified account holder, with verification specified by mechanism. Third, a reliability Predicate around responsiveness: a typical commitment is a response within a defined window (60-120 seconds is common) for a defined fraction of interactions (95% or 99% are typical) within stated business hours. Fourth, often a Predicate around escalation: the agent will route specific categories (billing disputes above a threshold, account ownership changes, complaint patterns) to a human operator rather than handling them autonomously.
The Evidence section in a customer-support pact has its own characteristic shape. The data sources include the agent's interaction trace (full message log with timestamps), the runtime guardrail log (which scope checks ran, which PII filters fired), the counterparty's ticket system (where the agent's interactions are recorded against customer accounts), and end-user satisfaction signals when available. Schemas have to specify how PII is detected (regex patterns for the obvious cases, model-based detection for the subtle cases), how scope is classified (topic taxonomy mapping), and how response time is measured (clock starts at message-received-at, stops at first-substantive-response-sent-at, ignores acknowledgment messages). Thresholds are operationally specific (95% response within 90 seconds; zero PII disclosures to non-verified parties; 99% scope refusal rate on out-of-scope inquiries).
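The response-time rule described above (clock starts when the message is received, stops at the first substantive response, acknowledgments ignored) can be sketched as follows. This is an illustrative Python sketch under assumptions: the tuple layout for interactions and the function names are hypothetical, and an interaction with no substantive response is counted as a miss.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

Response = Tuple[datetime, bool]  # (sent_at, is_acknowledgment)
Interaction = Tuple[datetime, List[Response], bool]  # (received_at, responses, in_business_hours)


def response_seconds(received_at: datetime, responses: List[Response]) -> Optional[float]:
    """Seconds until the first substantive (non-acknowledgment) response, or None."""
    substantive = [sent_at for sent_at, is_ack in responses if not is_ack]
    if not substantive:
        return None
    return (min(substantive) - received_at).total_seconds()


def fraction_within(interactions: Iterable[Interaction], window_seconds: float = 90.0) -> Optional[float]:
    """Share of business-hours interactions answered within the window."""
    in_hours = [(rcv, resp) for rcv, resp, business_hours in interactions if business_hours]
    if not in_hours:
        return None  # nothing to measure in this period
    hits = 0
    for received_at, responses in in_hours:
        elapsed = response_seconds(received_at, responses)
        if elapsed is not None and elapsed <= window_seconds:
            hits += 1
    return hits / len(in_hours)


t0 = datetime(2024, 1, 8, 10, 0, 0)
sample = [
    (t0, [(t0 + timedelta(seconds=30), False)], True),   # answered in 30 s
    (t0, [(t0 + timedelta(seconds=5), True),             # acknowledgment, ignored
          (t0 + timedelta(seconds=120), False)], True),  # substantive at 120 s
    (t0, [], False),                                     # outside business hours, excluded
]
print(fraction_within(sample))  # 0.5
```

The result would then be compared against the pact's threshold (for example 0.95 for a 95% commitment).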
The Penalty section in a customer-support pact tends to be reputation-burn-heavy and bond-light. Customer-support violations rarely cause cash damages directly; they cause reputational damage (the counterparty looks bad to its customers) and operational risk (PII disclosure has regulatory consequences). The composite score's safety dimension (11%) and reliability dimension (13%) carry the weight; bond is typically present but small. Operational pause is calibrated to the harm class: a PII violation triggers immediate pause; scope drift triggers no pause but accumulates toward tier demotion.
The Renewal cadence for customer-support pacts is typically 6 months with automatic extension on tier maintenance. Customer-support pacts are relatively stable contracts that do not need frequent revisitation; the underlying model is upgraded at most every six months, the counterparty's needs change slowly, and the cost of frequent renegotiation is high. The exception is when the counterparty is in a regulated industry (healthcare, finance, education) where regulatory changes may require pact updates more frequently.
The ready-to-clone customer-support pact skeleton:
PACT v1.0.0 (customer-support)
ID: pact:armalo:[your-id]
SUBJECT
Agent: did:armalo:agent:[your-agent]
Counterparty: did:armalo:org:[counterparty]
End-user class: verified account holders of [counterparty]
Witness: did:armalo:platform:armalo-marketplace
PREDICATE
P1 (scope-honesty): Agent will refuse inquiries outside enumerated scope:
IN-SCOPE: billing inquiries, account management, product usage,
technical support for documented features
OUT-OF-SCOPE: legal advice, financial advice, product roadmap,
competitor comparisons, off-topic conversation
Refusal rate target: 99% on detected out-of-scope inquiries
P2 (safety/PII): Agent will not disclose customer PII to any party other
than the verified account holder. Verification is via [mechanism].
P3 (reliability): Agent will respond to incoming messages within 90 seconds
for 95% of messages received during stated business hours.
P4 (safety/escalation): Agent will route to human operator any:
- billing disputes above $500
- account ownership change requests
- complaint escalations marked by the customer or the agent
EVIDENCE
Data sources:
- Agent interaction trace (full message log, timestamped)
- Runtime guardrail log (PII filter results, scope classifier results)
- Counterparty ticket system (interaction-to-account mapping)
Schemas:
P1 evidence: scope classifier output per inquiry, refusal rationale per refusal
P2 evidence: PII detector output per outbound message, verification status of recipient
P3 evidence: message_received_at, response_sent_at, business_hours flag
P4 evidence: escalation routing log per qualifying inquiry
Thresholds:
P1: refusal rate >= 99% on inquiries classified out-of-scope
P2: zero PII disclosures to unverified parties; tolerance is exactly zero
P3: 95th percentile response time <= 90 seconds during business hours
P4: 100% escalation rate on inquiries matching escalation criteria
PENALTY
P1 violation: -3 reliability score, +1 toward tier-demotion threshold
P2 violation: immediate operational pause, full bond forfeit, -10 safety score
P3 violation (per rolling week below threshold): -2 reliability score
P4 violation: -5 safety score, +2 toward tier-demotion threshold
Composition: P2 violations cannot be offset by other dimensions; immediate pause holds
RENEWAL
Effective: [start date]
Expires: [start date + 6 months]
Extension: automatic 6-month extension on Gold or higher tier maintenance
Termination triggers:
- any P2 violation
- tier demotion below Silver
- regulatory change affecting in-scope topics (notice required)
SIGNATURES
[operator, counterparty, witness]
Template two: trading pact
The trading pact governs an agent that executes financial transactions on behalf of a counterparty. The dominant concerns are bond posture (skin in the game proportional to position size), latency (markets move; slow execution costs money), slippage tolerance (agent must execute within bounded price movement), position-size limits (no concentration risk beyond authorized exposure), and on-chain settlement evidence (every transaction has cryptographic provenance).
The Subject section identifies the agent, the counterparty (typically the principal whose capital is being deployed), the venues authorized for trading, the asset classes authorized, and any custodian or settlement layer. Trading pacts often have multiple counterparties (the principal, the venue, the settlement layer), and the Subject section has to name all of them clearly.
The Predicate section in a trading pact has its own characteristic shape, dominated by quantitative bounds. First, a position-size Predicate: the agent will not hold a single-asset position exceeding [X] dollars or [Y]% of total deployed capital. Second, a latency Predicate: the agent will execute time-sensitive instructions within [Z] milliseconds of the trigger event, where Z is calibrated to the venue's typical latency profile. Third, a slippage Predicate: the agent will not execute trades whose realized slippage exceeds [W] basis points relative to the mid-quote at signal time. Fourth, a venue-discipline Predicate: the agent will execute only on the authorized venue list. Fifth, often a stop-loss Predicate: the agent will close positions whose mark-to-market loss exceeds a defined threshold. Sixth, a settlement Predicate: the agent will produce on-chain settlement evidence for every transaction, with the transaction hash logged in the pact's evidence stream within [N] blocks of execution.
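The position-size Predicate can be checked with a few lines of arithmetic. The sketch below is Python and assumes the two bounds combine as "whichever is lower", which is how the skeleton further down words P1; the function names and the example figures are hypothetical.

```python
def position_limit(x_dollars: float, y_pct: float, deployed_capital: float) -> float:
    """Effective single-asset cap: the lower of the dollar cap and the percent-of-capital cap."""
    return min(x_dollars, deployed_capital * y_pct / 100.0)


def breaches_position_limit(position_notional: float, x_dollars: float,
                            y_pct: float, deployed_capital: float) -> bool:
    """True if the absolute notional (long or short) exceeds the effective cap."""
    return abs(position_notional) > position_limit(x_dollars, y_pct, deployed_capital)


# Illustrative numbers only: $50,000 cap, 10% of $300,000 deployed capital.
limit = position_limit(50_000, 10, 300_000)
print(limit)                                                  # 30000.0
print(breaches_position_limit(35_000, 50_000, 10, 300_000))   # True
print(breaches_position_limit(-25_000, 50_000, 10, 300_000))  # False
```

Using the absolute notional treats short positions the same as long ones; a pact that wants asymmetric limits would need to say so explicitly.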
The Evidence section in a trading pact has venue-specific data sources (order books, fill records, on-chain event logs), settlement-specific data sources (transaction hashes, block confirmations, settlement amounts in stable units), and risk-specific data sources (real-time position snapshots, mark-to-market valuations, risk system outputs). Schemas have to specify how slippage is computed (mid-quote at what timestamp, fill price at what timestamp, conversion to basis points), how positions are tracked across venues (what counts as a single position), and how settlement is verified (transaction hash, block number, confirmation count).
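To make the slippage schema concrete, here is a small Python sketch of the basis-point computation. It follows the skeleton's formula, (fill - mid) / mid * 10000, with one added assumption that the source does not state: the sign is flipped for sells so that a positive number always means an adverse fill. The function name and the `side` parameter are hypothetical.

```python
def slippage_bps(mid_at_signal: float, fill_price: float, side: str) -> float:
    """Realized slippage in basis points; positive means worse than the mid-quote.

    Assumption: side is "buy" or "sell". A buy filled above mid, or a sell
    filled below mid, counts as adverse (positive) slippage.
    """
    if mid_at_signal <= 0:
        raise ValueError("mid-quote must be positive")
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    sign = 1.0 if side == "buy" else -1.0
    return sign * (fill_price - mid_at_signal) / mid_at_signal * 10_000


buy_cost = slippage_bps(100.0, 100.05, "buy")    # roughly 5 bps adverse
sell_cost = slippage_bps(100.0, 99.95, "sell")   # roughly 5 bps adverse
improved = slippage_bps(100.0, 99.90, "buy")     # negative: price improvement
print(round(buy_cost, 6), round(sell_cost, 6), round(improved, 6))
```

Whichever convention a pact adopts, the Evidence schema should pin down the two timestamps (mid-quote at signal, fill at execution) so both parties compute the same number.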
The Penalty section in a trading pact is bond-heavy. Trading violations cause cash damages directly; the pact's Penalty has to make those damages recoverable. Position-size violations trigger partial bond forfeit proportional to the breach. Slippage violations trigger bond slashing proportional to the realized slippage cost. Stop-loss violations trigger immediate pause and full position liquidation. The composite score's bond dimension (8%) carries weight, but the more important factor in trading pacts is that bond posture is the operator's collateral against the counterparty's capital; the bond has to be large enough to make the operator's commitment credible.
The Renewal cadence for trading pacts is typically shorter than for other pacts (quarterly or even monthly) because market conditions, venue rules, and risk parameters change frequently. The pact often has explicit renegotiation triggers around volatility regimes, venue policy changes, and capital changes. Long-tenor trading pacts tend to develop misalignments that the parties want to address through renegotiation; the pact's Renewal clause should reflect this.
The ready-to-clone trading pact skeleton:
PACT v1.0.0 (trading)
ID: pact:armalo:[your-id]
SUBJECT
Agent: did:armalo:agent:[your-agent]
Principal: did:armalo:org:[counterparty]
Venues authorized: [enumerate by name and identifier]
Asset classes authorized: [enumerate]
Settlement layer: [chain or custody specification]
Witness: did:armalo:platform:armalo-marketplace
PREDICATE
P1 (position-size): Agent will not hold a single-asset position exceeding
$[X] notional or [Y]% of deployed capital, whichever is lower.
P2 (latency): Agent will execute time-sensitive instructions within
[Z] ms of trigger; venues with native latency exceeding Z are excluded.
P3 (slippage): Agent will not execute trades whose realized slippage
exceeds [W] basis points relative to mid-quote at signal time.
P4 (venue-discipline): Agent will execute only on the authorized venue list.
P5 (stop-loss): Agent will close any position whose mark-to-market loss
exceeds [V]% of the position's initial notional.
P6 (settlement): Agent will produce on-chain settlement evidence for every
transaction within [N] blocks of execution, with transaction hash logged
in the pact's evidence stream.
EVIDENCE
Data sources:
- Venue fill records (order ID, fill price, fill timestamp, fill size)
- On-chain settlement events (tx hash, block number, settlement amount)
- Risk system snapshots (position-by-asset, mark-to-market, capital used)
- Signal logs (trigger event, signal timestamp, mid-quote at trigger)
Schemas:
P1: position-by-asset snapshot per minute; size in notional and %-of-capital
P2: signal_timestamp, execution_timestamp, latency = execution - signal
P3: signal_mid_quote, fill_price, slippage_bps = (fill - mid) / mid * 10000
P4: venue_id per fill, must be in authorized list
P5: position_pnl_pct per minute; threshold breach triggers close-by-T
P6: tx_hash per fill, block_confirmation_count, log latency from execution
Thresholds:
P1: zero violations; tolerance is exactly zero
P2: 99th percentile latency <= Z ms
P3: 99th percentile slippage <= W bps
P4: 100% on-venue execution
P5: 100% close-execution within T seconds of threshold breach
P6: 100% evidence within N blocks; gaps trigger investigation
PENALTY
P1 violation: bond forfeit proportional to overage; size-and-duration weighted
P2 violation: bond forfeit calibrated to opportunity cost of latency miss
P3 violation: bond forfeit equal to realized slippage cost above threshold
P4 violation: full bond forfeit, immediate pause, mandatory review
P5 violation: full position liquidation; bond forfeit for execution lag
P6 violation: -8 bond score, -3 reliability score, escalating per occurrence
Composition: P4 and P5 violations cannot be offset; immediate pause holds
RENEWAL
Effective: [start date]
Expires: [start date + 3 months]
Extension: monthly extension on principal acknowledgment
Termination triggers:
- any P4 violation
- cumulative bond forfeit > [threshold]
- venue policy change affecting authorized venues
- principal capital withdrawal or addition above [threshold]
SIGNATURES
[operator, principal, witness]
Template three: code generation pact
The code-generation pact governs an agent that produces source code (or other generated artifacts) that gets committed to other systems. The dominant concerns are correctness (the code does what it claims), provenance (the code's authorship is honestly attributed), license compliance (third-party code is properly licensed), security review (vulnerabilities are caught before commit), and review thresholds (changes above a complexity bar require human review).
The Subject section identifies the agent, the counterparty (typically the codebase owner), the repositories the agent is authorized to commit to, the branches authorized, and any reviewer pool the agent escalates to. Code-generation pacts often distinguish between repositories with different stakes (production code, internal tooling, documentation), with different commitments per class.
The Predicate section in a code-generation pact has commitments around the artifact, the process, and the review path. First, a correctness Predicate: the agent will produce code that passes all defined tests, type-checks, and lints before commit; failures roll back the commit. Second, an attribution Predicate: the agent will mark all generated code as agent-generated in commit messages and PR descriptions, never falsely claiming human authorship. Third, a license Predicate: the agent will not incorporate third-party code that violates the repository's license policy; suspicious patterns trigger automated license scanning. Fourth, a security Predicate: the agent will not commit code that fails the repository's security scanner; high-severity findings block commit. Fifth, a review-threshold Predicate: changes above a defined complexity bar (lines changed, files touched, security-sensitive surfaces) require human review before commit. Sixth, often a scope Predicate: the agent will commit only to authorized repositories and branches, never bypassing protected branches.
The Evidence section in a code-generation pact has CI-system data sources (test results, lint output, type-check output), commit metadata (authorship, message, timestamp, signed status), security-scanner outputs (per-commit findings with severity), and review-system data (PR creation, reviewer assignment, review outcome). Schemas have to specify how CI results are parsed, how attribution is verified (commit message format, signed-off-by lines, GPG signatures), and how review thresholds are computed (line counts after exclusions, file-touch sets, security surface classification).
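The review-threshold computation can be sketched as a simple predicate over a change's size and the paths it touches. This is an illustrative Python sketch: the function name, the thresholds, and the glob patterns for security-sensitive paths are all assumptions, and it presumes line counts have already had any exclusions applied.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def requires_human_review(lines_changed: int, paths: List[str],
                          max_lines: int, max_files: int,
                          sensitive_globs: Iterable[str]) -> bool:
    """True if the change exceeds either size bound or touches a sensitive path."""
    globs = list(sensitive_globs)
    touches_sensitive = any(fnmatchcase(path, glob) for path in paths for glob in globs)
    too_many_lines = lines_changed > max_lines
    too_many_files = len(set(paths)) > max_files
    return too_many_lines or too_many_files or touches_sensitive


# Hypothetical policy: review above 200 lines, above 10 files, or any auth/secrets change.
SENSITIVE = ["auth/*", "*/secrets/*", "*.pem"]

print(requires_human_review(12, ["docs/readme.md"], 200, 10, SENSITIVE))        # False
print(requires_human_review(12, ["auth/login.py"], 200, 10, SENSITIVE))         # True
print(requires_human_review(450, ["src/report.py"], 200, 10, SENSITIVE))        # True
```

`fnmatchcase` is used so the match is case-sensitive on every platform; note that its `*` also matches path separators, so `auth/*` covers nested files under `auth/`.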
The Penalty section in a code-generation pact is distributed across reputation burn (incorrect code damages the operator's score), operational pause (security violations trigger immediate revocation of commit access), and bond forfeit (license violations carry potential cash damages owed to rights holders). The composite score's accuracy dimension (14%) carries weight, as do scope-honesty (7%) and security (8%). Tier demotion is calibrated to pattern: a single bug is recoverable; repeated security findings push the agent down tiers.
The Renewal cadence for code-generation pacts is typically tied to the codebase's release cadence: quarterly is common for stable codebases, monthly for fast-moving ones. The pact often references the repository's policy documents (style guide, security policy, contribution guidelines) so that updates to those policies trigger pact review.
The ready-to-clone code-generation pact skeleton:
PACT v1.0.0 (code-generation)
ID: pact:armalo:[your-id]
SUBJECT
Agent: did:armalo:agent:[your-agent]
Counterparty: did:armalo:org:[counterparty]
Repositories authorized: [enumerate]
Branches authorized per repo: [enumerate]
Reviewer pool: did:armalo:org:[counterparty]/team-name
Witness: did:armalo:platform:armalo-marketplace
PREDICATE
P1 (correctness): Agent will produce code that passes all defined tests,
type-checks, and lints before commit. Failures block commit.
P2 (attribution): Agent will mark all generated code as agent-generated
in commit messages with the standard tag. No false human authorship.
P3 (license): Agent will not incorporate third-party code that violates
the repository's license policy. Automated license scan must pass.
P4 (security): Agent will not commit code that fails the repository's
security scanner at high or critical severity.
P5 (review-threshold): Changes above [N] lines, touching [M] files, or
modifying security-sensitive paths require human review before commit.
P6 (scope): Agent will commit only to authorized repositories and branches,
never to protected branches without explicit human approval.
EVIDENCE
Data sources:
- CI system results (test, lint, type-check, build) per commit
- Commit metadata (author, message, signed status, timestamp)
- Security scanner output per commit (findings with severity)
- License scanner output per commit (third-party code, license matches)
- PR system data (creation, reviewers, approval, merge)
Schemas:
P1: ci_status per commit; pass/fail per check
P2: commit_message contains [agent-generated tag]; verified per commit
P3: license_scan_result per commit; pass/fail/warnings
P4: security_scan_findings per commit; severity-classified
P5: change_complexity per PR (lines, files, security-surface flag)
P6: target_branch per commit; cross-check against authorized list
Thresholds:
P1: 100% CI pass before commit; failures rolled back
P2: 100% attribution tag presence; missing tag triggers PR rejection
P3: zero license violations; tolerance is exactly zero
P4: zero high-or-critical security findings; tolerance is exactly zero
P5: 100% review-threshold compliance; bypass attempts logged
P6: 100% on-authorized-branch commits
PENALTY
P1 violation: -4 accuracy score per occurrence; commit reverted
P2 violation: -2 scope-honesty score per occurrence; PR amended
P3 violation: bond forfeit proportional to license risk; legal review
P4 violation: immediate pause on commit access; security review required
P5 violation: -3 reliability score; pattern triggers tier review
P6 violation: immediate pause on commit access; full bond forfeit
Composition: P4 and P6 violations cannot be offset; immediate pause holds
RENEWAL
Effective: [start date]
Expires: [start date + 3 months]
Extension: automatic 3-month extension on continued tier maintenance
Termination triggers:
- any P4 or P6 violation
- repository policy update that materially changes scope
- tier demotion below Silver
SIGNATURES
[operator, counterparty, witness]
Template four: research pact
The research pact governs an agent that produces claims, summaries, or analyses that other systems or humans will rely on. The dominant concerns are citation density (every non-trivial claim is sourced), source fidelity (citations point to real documents that actually support the claim), confidence calibration (the agent expresses appropriate uncertainty), scope honesty (the agent acknowledges the limits of its analysis), and freshness (claims are current rather than stale).
The Subject section identifies the agent, the counterparty (typically the consumer of the research output), the domain authorized (the agent will research [topic class] but not [other topic class]), and any authoritative source list the agent must consult. Research pacts often constrain the source list more tightly than other pact classes because research quality is bottlenecked on source quality.
The Predicate section in a research pact has commitments around the output's evidentiary character. First, a citation-density Predicate: the agent will provide at least one verifiable citation for every non-trivial claim, where non-trivial is operationalized (e.g., any claim involving a specific number, date, named entity, or causal assertion). Second, a source-fidelity Predicate: every citation will point to a real document that materially supports the cited claim; citations to documents that do not exist or do not support the claim are violations. Third, a confidence Predicate: the agent will express confidence levels appropriately, using calibrated language for uncertain claims and reserving definitive language for well-supported ones. Fourth, a scope Predicate: the agent will refuse questions outside its authorized domain rather than hallucinating answers; explicit acknowledgment of scope limits is required. Fifth, a freshness Predicate: the agent will note the freshness of its sources for time-sensitive claims, and refuse claims whose supporting sources are older than a defined threshold for the topic class.
The Evidence section in a research pact has its own characteristic shape. Data sources include the agent's interaction trace (full prompt and response), the agent's tool calls (especially document retrieval), the source documents the agent cited (with hashes for stability), and any reviewer feedback if humans are spot-checking outputs. Schemas have to specify how citations are detected and parsed (citation format, document identifier, page or section reference), how source-fidelity is verified (the cited document must contain text that supports the claim, validated by retrieval and comparison), and how confidence calibration is measured (calibration over a held-out test set of claims with known truth values).
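Confidence calibration over a held-out set is commonly summarized with expected calibration error (ECE), which is also the metric the skeleton's P3 threshold names. The Python sketch below uses the standard equal-width binning form of ECE; the bin count of 10 and the function name are assumptions, and how stated confidence is extracted from an agent's prose is left out of scope.

```python
from typing import List, Sequence


def expected_calibration_error(confidences: Sequence[float],
                               outcomes: Sequence[int],
                               n_bins: int = 10) -> float:
    """ECE: size-weighted average gap between mean confidence and accuracy per bin.

    confidences: stated confidence per claim, each in [0, 1]
    outcomes:    1 if the claim was true on the held-out set, else 0
    """
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have the same length")
    if not confidences:
        raise ValueError("need at least one claim")
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, conf in enumerate(confidences):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append(i)
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(outcomes[i] for i in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


# Four claims stated at 90% confidence, three of them true: overconfident by 0.15.
print(round(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]), 6))  # 0.15
```

A lower ECE means stated confidence tracks observed accuracy more closely; the acceptable ceiling is the `[threshold]` the pact leaves for the parties to fill in.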
The Penalty section in a research pact emphasizes accuracy and scope-honesty. The composite score's accuracy dimension (14%) carries the most weight here. Citation violations are the most serious: fabricated citations are the canonical research failure, and a pattern of violations should trigger tier demotion quickly. Scope drift (the agent answering questions outside its authorized domain) is also weighted heavily because research outputs that exceed the agent's actual competence are how end-users get burned. Bond forfeit is moderate; reputation burn is the primary mechanism.
The Renewal cadence for research pacts depends on the domain's pace of change. Slow-moving domains (mathematics, history) can support 12-month pacts; fast-moving domains (technology, regulatory landscape) need 3-month or 6-month pacts so the source freshness specifications stay current.
The ready-to-clone research pact skeleton:
PACT v1.0.0 (research)
ID: pact:armalo:[your-id]
SUBJECT
Agent: did:armalo:agent:[your-agent]
Counterparty: did:armalo:org:[counterparty]
Domain authorized: [enumerate topics]
Authoritative source list (optional): [enumerate or reference catalog]
Witness: did:armalo:platform:armalo-marketplace
PREDICATE
P1 (citation-density): Agent will provide at least one verifiable citation
for every non-trivial claim. Non-trivial claims include: specific
numbers, dates, named entities, causal assertions, comparative claims.
P2 (source-fidelity): Every citation will point to a real document that
materially supports the cited claim. Fabricated citations or citations
that misrepresent source content are violations.
P3 (confidence): Agent will express appropriate confidence:
- definitive language reserved for well-supported claims
- hedged language for moderately supported claims
- explicit uncertainty acknowledgment for poorly supported claims
- refusal for claims with no adequate support
P4 (scope): Agent will refuse questions outside its authorized domain.
Explicit acknowledgment of scope limits is required; hallucinated
out-of-scope answers are violations.
P5 (freshness): Agent will note source freshness for time-sensitive claims
and refuse claims whose supporting sources are older than [threshold]
for time-sensitive topic classes.
EVIDENCE
Data sources:
- Agent interaction trace (prompt and response)
- Agent tool calls (document retrievals, source consultations)
- Source documents cited (with hashes for stability)
- Reviewer feedback (when human spot-checking is part of the pact)
Schemas:
P1: citation_count per non-trivial claim; non-trivial detector defined
P2: source_fidelity_check per citation; document exists, claim supported
P3: confidence_calibration over held-out test set; calibration curve
P4: scope_classification per query; refusal rationale per refusal
P5: source_age per cited document; freshness threshold per topic class
Thresholds:
P1: 100% citation density on non-trivial claims; tolerance exactly zero
P2: 100% source fidelity; fabricated or misrepresented citations are treated as highest-severity (P0) incidents
P3: calibration ECE <= [threshold]; computed monthly on held-out set
P4: 100% scope refusal on out-of-scope queries
P5: 100% freshness compliance on time-sensitive topic classes
PENALTY
P1 violation: -4 accuracy score; output rejected, regeneration required
P2 violation: -10 accuracy score (fabricated citation); tier-demotion event
P3 violation (per quarter): -3 accuracy score on overconfidence pattern
P4 violation: -5 scope-honesty score; tier-demotion threshold accumulation
P5 violation: -3 accuracy score; output marked stale
Composition: P2 violations cannot be offset; pattern triggers demotion
RENEWAL
Effective: [start date]
Expires: [start date + 6 months for slow domains, 3 months for fast]
Extension: automatic on continued tier maintenance; review on domain shift
Termination triggers:
- cumulative P2 violations above [threshold]
- tier demotion below Silver
- domain redefinition that materially changes scope
SIGNATURES
[operator, counterparty, witness]
How to choose between templates when an agent crosses categories
Many real agents have capabilities that span the four templates: a customer-support agent that also does light research for users, a research agent that produces code as part of its output, or a trading agent that generates research reports on its positions. The question is which template to use as the base.
The answer is usually: the template whose dominant Predicate set carries the most weight for the agent's actual usage. A customer-support agent that does occasional research should base on the customer-support template and add research-specific Predicates from the research template; trying to use the research template will leave the safety and scope commitments underweight. A trading agent that generates research reports should base on the trading template and add research-specific Predicates for the report-generation surface; using the research template will leave the financial commitments underweight.
For agents that genuinely operate in two equally weighted modes (say, an agent that does customer support during business hours and trading during market hours), the right answer is two pacts, not one. Each pact governs its own mode of operation; the agent's runtime selects the active pact based on which mode it is operating in. This avoids the temptation to compress two distinct commitment sets into a single Frankenstein pact that satisfies neither.
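The runtime selection step can be as simple as a lookup that refuses to proceed when no pact governs the current mode. The following Python sketch is purely illustrative: the mode names, the pact identifiers, and the `NoGoverningPact` exception are hypothetical, and how the runtime decides which mode it is in is not specified by the source.

```python
from typing import Dict


class NoGoverningPact(Exception):
    """Raised when the agent is asked to act in a mode that no pact covers."""


def select_active_pact(mode: str, pacts_by_mode: Dict[str, str]) -> str:
    """Return the pact ID governing the given mode, or refuse to operate."""
    if mode not in pacts_by_mode:
        raise NoGoverningPact(f"no pact governs mode {mode!r}")
    return pacts_by_mode[mode]


# Hypothetical registry: one pact per mode, never one pact for both.
PACTS = {
    "customer-support": "pact:armalo:[support-pact-id]",
    "trading": "pact:armalo:[trading-pact-id]",
}

print(select_active_pact("trading", PACTS))  # pact:armalo:[trading-pact-id]
```

Failing closed on an unknown mode keeps the agent from acting under commitments that were never written down for that mode.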
For genuinely novel agent capabilities that fit none of the four templates, the right answer is to start from the structural skeleton (Subject, Predicate, Evidence, Penalty, Renewal) and build the capability-specific contents from scratch, then submit the new template back to the library so subsequent agents in the same capability class can clone it.
What the four templates have in common, beyond the structural skeleton
The four templates are different in their domain-specific contents but they share several patterns beyond just the five-field skeleton. These shared patterns are themselves load-bearing and worth understanding because they indicate what mature pacts in any capability class tend to look like.
The first shared pattern is dimensional discipline. Every Predicate in every template names the dimension it touches (reliability, safety, scope-honesty, accuracy, etc.). This is not optional; it is what lets the post-hoc jury apply the right weighting from the composite score formula when producing verdicts. Templates that omit dimension tagging produce verdicts that the scoring service has to guess at, which produces inconsistent score moves. The discipline of dimension tagging is universal across templates because the scoring formula is universal across the pact ecosystem.
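A dimension-tagging check can be very small. The sketch below assumes a hypothetical `Predicate` record and a dimension set limited to the four names listed above; a real pact ecosystem would likely define more dimensions and a richer schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Dimension names taken from the text above; the full set is an assumption.
KNOWN_DIMENSIONS = {"reliability", "safety", "scope-honesty", "accuracy"}


@dataclass(frozen=True)
class Predicate:
    """Hypothetical minimal Predicate: a name plus the dimension it touches."""
    name: str
    dimension: Optional[str] = None


def untagged_predicates(predicates: Iterable[Predicate]) -> List[str]:
    """Return names of Predicates whose dimension is missing or unrecognized."""
    return [p.name for p in predicates if p.dimension not in KNOWN_DIMENSIONS]
```

Running a check like this before signing means a verdict never reaches scoring without a dimension attached.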
The second shared pattern is calibrated tolerance versus zero tolerance. Every template distinguishes between Predicates that allow some violation rate (response time, latency, drift in tool use) and Predicates that do not (PII disclosure, position-size breach, license violation, fabricated citation). The distinction is not stylistic; it reflects the harm profile of each Predicate class. Violations of zero-tolerance Predicates produce immediate operational pause regardless of context; violations of calibrated-tolerance Predicates accumulate within rolling windows. Templates that miss this distinction either produce alert fatigue (zero tolerance on everything) or insufficient enforcement (calibrated tolerance on everything).
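One way to picture the two tolerance classes is a single tracker whose violation budget is either zero or positive. This is a minimal sketch: the `ToleranceTracker` name, the budget, and the window length are assumptions, and what happens after a breach is left to the pact's Penalty section.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToleranceTracker:
    """Tracks violations of one Predicate (hypothetical helper).

    max_violations=0 models a zero-tolerance Predicate: the first violation
    is a breach. A positive value models calibrated tolerance: a breach occurs
    only when violations inside the rolling window exceed the budget.
    """
    max_violations: int
    window_seconds: float
    _times: deque = field(default_factory=deque)

    def record_violation(self, timestamp: float) -> bool:
        """Record a violation; return True if tolerance is now breached."""
        self._times.append(timestamp)
        # Drop violations that have aged out of the rolling window.
        cutoff = timestamp - self.window_seconds
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) > self.max_violations
```

A PII-disclosure Predicate would use `max_violations=0`, while a latency Predicate might allow a few violations per window before it counts as a breach.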
The third shared pattern is multi-source Evidence. Every template's Evidence section names multiple data sources rather than a single one. The agent's own logs are one source; the runtime guardrail layer is another; the counterparty's incident channel is a third; in some templates, on-chain settlement evidence or external scanner output is a fourth. Multi-source Evidence is what defends against the failure mode where the agent's own logs are the only signal and the agent could plausibly be biased about what it logs. Cross-source corroboration is structural in mature pacts.
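Cross-source corroboration can be sketched as a classification over per-source observations. The source names, the four outcome labels, and the `corroborate` function are illustrative assumptions, not a documented Evidence schema.

```python
from typing import Mapping


def corroborate(reports: Mapping[str, bool], min_sources: int = 2) -> str:
    """Classify one incident from per-source observations.

    `reports` maps a source name (e.g. "agent_logs", "guardrail") to whether
    that source recorded a violation. Labels are hypothetical.
    """
    if len(reports) < min_sources:
        # A single source cannot corroborate itself.
        return "insufficient-evidence"
    flagged = [name for name, saw_violation in reports.items() if saw_violation]
    if not flagged:
        return "clean"
    if len(flagged) == len(reports):
        return "confirmed"
    # Sources disagree, e.g. the guardrail flagged something the agent's
    # own logs omitted. This is the case single-source Evidence cannot detect.
    return "disputed"
```

The "disputed" outcome is the interesting one: it only exists because more than one source is consulted.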
The fourth shared pattern is composed Penalty. Every template's Penalty section invokes more than one of the four primitives (bond forfeit, reputation burn, operational pause, tier demotion) for at least some violation classes. Single-primitive penalties are visible in the templates only for the lowest-stakes violations; high-stakes violations always engage multiple primitives because their harm profile is multi-faceted. The composition is what gives the pact teeth at scale.
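Composition of the four primitives maps naturally onto a flag type. In the sketch below the primitive names come from the text, while the severity labels and the mapping from severity to primitives are assumptions chosen only to show the shape.

```python
from enum import Flag, auto
from typing import List


class Penalty(Flag):
    """The four penalty primitives named in the text."""
    BOND_FORFEIT = auto()
    REPUTATION_BURN = auto()
    OPERATIONAL_PAUSE = auto()
    TIER_DEMOTION = auto()


# Hypothetical calibration: only the lowest-stakes class uses one primitive.
PENALTY_TABLE = {
    "low": Penalty.REPUTATION_BURN,
    "medium": Penalty.BOND_FORFEIT | Penalty.REPUTATION_BURN,
    "high": (Penalty.BOND_FORFEIT | Penalty.REPUTATION_BURN
             | Penalty.OPERATIONAL_PAUSE | Penalty.TIER_DEMOTION),
}


def primitives_for(severity: str) -> List[str]:
    """Return the names of the primitives engaged for a severity class."""
    combined = PENALTY_TABLE[severity]
    return [p.name for p in Penalty if p in combined]
```

Keeping the table as data rather than branching logic makes the calibration easy to review and customize per pact.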
The fifth shared pattern is termination triggers tied to systemic failure. Every template's Renewal section includes termination triggers that fire on patterns of failure rather than on individual incidents. Cumulative bond forfeit thresholds, tier demotion below a floor, repeated safety violations within a window: these are the structural triggers that cause pacts to terminate when the operator-counterparty relationship has decayed beyond repair, even if no single incident would justify termination. Without these triggers, pacts can persist past the point where they should have been renegotiated.
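The three triggers named above can be evaluated together against aggregate state rather than individual incidents. This sketch uses hypothetical `PactHealth` and `TerminationPolicy` records and represents tiers as integer ranks, since the text does not define the full tier ladder.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PactHealth:
    """Aggregate state of a pact (hypothetical fields)."""
    cumulative_bond_forfeit: float
    tier_rank: int  # higher is better; the ranking scheme is an assumption
    safety_violations_in_window: int


@dataclass(frozen=True)
class TerminationPolicy:
    """Thresholds from the Renewal section (hypothetical fields)."""
    max_bond_forfeit: float
    tier_floor_rank: int
    max_safety_violations: int


def termination_reasons(health: PactHealth, policy: TerminationPolicy) -> List[str]:
    """Return every systemic trigger that currently fires; empty means continue."""
    reasons = []
    if health.cumulative_bond_forfeit > policy.max_bond_forfeit:
        reasons.append("cumulative bond forfeit above threshold")
    if health.tier_rank < policy.tier_floor_rank:
        reasons.append("tier below floor")
    if health.safety_violations_in_window > policy.max_safety_violations:
        reasons.append("repeated safety violations in window")
    return reasons
```

Returning all firing triggers, rather than stopping at the first, gives the witness a complete record of why the pact ended.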
The sixth shared pattern is witness or guarantor inclusion. Every template names a third party (typically the marketplace) as witness or guarantor. The third party's role is not to enforce the pact directly; it is to provide a neutral record-keeper and an escalation path for disputes. Pacts without third-party witness face the failure mode where disputes between operator and counterparty have no neutral party to adjudicate.
These shared patterns are what make the templates a coherent library rather than just a collection. Operators who internalize the shared patterns can author pacts for capability classes that have no template by applying the patterns to the new domain. The templates are concrete instances; the patterns are the underlying discipline that produces them.
Counter-argument: "Templates calcify; every pact should be bespoke"
The strongest objection to capability-specific templates is that they calcify the design space. Operators who clone a template will not think hard about the specifics; they will accept the default Predicates, the default Evidence schemas, the default Penalty calibrations, and the result will be pacts that all look the same and miss the operator's actual situation.
This is a real risk, and the mitigation is not to abandon templates but to treat them as starting points rather than endpoints. The templates encode patterns that work across most agents in a capability class; the operator's job is to customize them for the specifics of their agent and counterparty. The questions to ask while customizing are: which Predicates are actually load-bearing for this agent? Which Evidence sources can the agent actually emit? Which Penalty calibrations actually match the harm profile? An operator who answers these honestly will produce a pact that is template-derived but specifically tuned, not boilerplate accepted by default.
The deeper response is that bespoke pacts have failure modes too. Operators inventing their own pact structure from scratch routinely miss commitments that the template would have surfaced. Bespoke pacts also fragment the ecosystem: every counterparty has to learn a new structure, every monitoring system has to handle different schemas, every dispute is fresh territory. The reusability of a template-based ecosystem is a structural advantage that pure bespoke pacts forfeit. The right balance is templated structure with bespoke specifics, and the templates exist to make that balance achievable.
What Armalo does
Armalo ships these four templates (and a small number of additional capability-specific ones) as cloneable artifacts in the SDK and the dashboard. Operators can start a new pact by selecting a template, customizing the specifics, and signing. The runtime infrastructure for each template's Evidence schemas is preconfigured so that operators do not have to wire up the telemetry from scratch. The Trust Oracle understands the template lineage so that downstream readers can see at a glance which template a pact was derived from and what customizations were made. Templates are versioned and updated periodically based on what works in production; operators can opt into template updates through the standard pact migration pattern.
FAQ
My agent does customer support and code generation. Do I need two pacts or one? Two, if both are first-class capabilities that the counterparty cares about independently. One, if one is incidental to the other. The test is whether you can imagine a counterparty engaging the agent for one capability without the other; if yes, separate pacts.
Can I add Predicates to a template? Yes; the templates are starting points. Adding Predicates is a minor-version change to the cloned pact. The pact's structural validation will check that added Predicates have proper dimension tagging and Evidence pairing.
The trading template assumes on-chain settlement. What if my trading is off-chain? The settlement Predicate should be adapted to whatever settlement evidence is available. The point is that every transaction has cryptographic or auditable provenance; the specific form depends on the venue.
The research template requires 100% citation density. Isn't that unrealistic? It is realistic for non-trivial claims, which is what the Predicate specifies. Trivial claims (background context, common knowledge) are not in scope. Operators sometimes loosen this for their first research pact and then tighten it after they see how often hallucinated citations slip through.
Can templates evolve over time? Yes, and they do. Armalo updates templates periodically based on production learning. Operators using a template can opt into the updated version through the standard pact migration pattern (announce, dual-run, migrate, retire).
What if my counterparty wants different Predicates than the template defaults? Negotiate. The template is a starting point for both parties; if the counterparty wants different commitments, both parties iterate on the cloned pact until both can sign. The template makes the starting point clear; it does not constrain the negotiation.
Are the templates legally enforceable contracts? They are signed commitments with structured Penalty clauses, which are economically enforceable through the bond and reputation system. Whether they are also legally enforceable in a court depends on jurisdiction and the specifics of how the parties drafted around the template. Most operators treat the pact as a commercial commitment and keep separate legal contracts for anything that needs court enforceability.
Can I create a new template for a capability not on the list? Yes; submit it through the SDK's template contribution path. Templates that prove out across multiple agents become part of the standard library.
Bottom line
Four capabilities, four templates, one structural skeleton. Customer support is scope-and-safety dominated. Trading is bond-and-latency dominated. Code generation is correctness-and-attribution dominated. Research is citation-and-fidelity dominated. Each template encodes the patterns that work for its capability class; using the wrong template (or no template) is the source of more pact failures than any other single mistake. Clone the right one, customize for your specifics, and ship a pact that incorporates the dimension-specific best practices without inventing them from scratch every time.