Technical

Getting Started with Armalo AI: A Developer Guide to Agent Trust Infrastructure

2026-02-0613 minArmalo Team

A complete developer guide from API key to first certified agent. Registration, behavioral pacts, evaluation, composite scoring, webhooks, escrow-backed deals, and querying the trust oracle — written for senior engineers who want to understand the system deeply.

Continue the reading path

Topic hub

Behavioral Contracts

This page is routed through Armalo's metadata-defined behavioral contracts hub rather than a loose category bucket.

Strategic Guide

AI Agent Trust

Curated Collection

Buyer Guides

Pro checkout

Turn this trust model into a scored agent.

Start with a 14-day Pro trial, register a starter agent, and get a measurable score before you wire a production endpoint.

Start Pro on Whop Compare plans

The Armalo API gives developers programmatic access to the complete agent trust infrastructure stack: agent registration, behavioral pact management, evaluation orchestration, composite trust scoring, webhook event delivery, escrow deal management, and the trust oracle. This guide is a deep walkthrough for senior engineers — not a quickstart with simplified examples, but a comprehensive explanation of how the system works and why each piece matters.

By the end of this guide, you'll understand how to register an agent and define its behavioral commitments, run evaluations and interpret the 12-dimension composite score, set up webhooks for real-time governance events, create escrow-backed deals with pact condition verification, and query the public trust oracle.

TL;DR

Start with an API key and agent registration: The API key gates all operations; agent registration establishes the identity and configuration that everything else is built on.
Pact conditions are the behavioral contract: They define what your agent commits to — these drive evaluations and underlie escrow deals.
Evaluations produce the trust score: The 12-dimension composite score is computed from evaluation results — there's no shortcut.
Webhooks enable real-time governance: Don't poll — configure webhooks for score changes, pact violations, and safety alerts.
The trust oracle is public: Any system can query trust scores without an API key — it's designed as public infrastructure.

Prerequisites and API Key Setup

All Armalo API operations require an API key. API keys are created in the Armalo dashboard and are scoped to specific permission levels.

Permission scopes:

agents:read / agents:write — Agent management
pacts:read / pacts:write — Pact management
evals:read / evals:write — Evaluation management
scores:read — Trust score access
escrow:read / escrow:write — Escrow deal management
webhooks:read / webhooks:write — Webhook configuration
transactions:read / transactions:write — Transaction management

All API requests include the key in the X-Pact-Key header:

curl https://armalo.ai/api/v1/agents   -H "X-Pact-Key: pk_live_your_key_here"   -H "Content-Type: application/json"

Rate limits by plan: Free (60 requests/minute), Pro (600 requests/minute), Enterprise (6000 requests/minute). The X-RateLimit-Remaining and X-RateLimit-Reset response headers provide current rate limit state.

Step 1: Register Your Agent

Agent registration establishes the identity and configuration record that all subsequent operations are built on. This is not a lightweight step — it's the foundation. Spend time making the registration accurate.

curl -X POST https://armalo.ai/api/v1/agents   -H "X-Pact-Key: pk_live_your_key_here"   -H "Content-Type: application/json"   -d '{
    "name": "FinancialResearchBot",
    "description": "Provides financial research summaries, earnings analysis, and risk factor synthesis for public company documents.",
    "modelProvider": "openai",
    "modelId": "gpt-4o-2024-11-20",
    "tools": [
      {"name": "web_search", "scope": "read"},
      {"name": "document_retrieval", "scope": "read"}
    ],
    "systemPromptHash": "sha256:a3b4c5...",
    "declaredLatency": {"p50Ms": 3000, "p95Ms": 8000},
    "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}},
    "outputSchema": {"type": "object", "properties": {"summary": {"type": "string"}, "sources": {"type": "array"}}},
    "dataHandlingClassification": "financial_data",
    "humanOversightModel": "supervised",
    "maxTransactionValue": 500
  }'

Response:

{
  "id": "agent_a1b2c3d4-...",
  "did": "did:armalo:agent:a1b2c3d4-...",
  "status": "registered",
  "trustScore": null,
  "certificationTier": null,
  "registeredAt": "2026-03-26T10:00:00Z"
}

Note the did field — this is the agent's Decentralized Identifier, which becomes the portable identity anchor for all attestations, certifications, and trust signals.

The systemPromptHash should be the SHA-256 hash of your agent's system prompt. This allows runtime compliance monitoring to verify that the deployed agent uses the declared system prompt. If your system prompt is confidential, the hash is sufficient — Armalo doesn't need to store the prompt itself.

Step 2: Define Behavioral Pacts

Pacts are behavioral commitments. Each pact groups a set of conditions that the agent commits to meeting, with specified verification methods and consequences.

curl -X POST https://armalo.ai/api/v1/pacts   -H "X-Pact-Key: pk_live_your_key_here"   -H "Content-Type: application/json"   -d '{
    "agentId": "agent_a1b2c3d4-...",
    "name": "Financial Research Quality SLA",
    "description": "Behavioral commitments for financial research output quality and reliability",
    "conditions": [
      {
        "name": "Factual Accuracy",
        "claim": "Achieves greater than 90% factual accuracy on financial data queries",
        "verificationMethod": "llm_jury",
        "measurementWindow": "rolling_30_days",
        "successThreshold": "accuracy_score_above_0.90_with_jury_consensus_above_0.75",
        "consequence": "threshold_violation_triggers_14_day_remediation_then_score_adjustment"
      },
      {
        "name": "Latency SLA",
        "claim": "P95 response latency under 8 seconds for standard research queries",
        "verificationMethod": "deterministic",
        "measurementWindow": "rolling_24_hours",
        "successThreshold": "p95_latency_under_8000ms",
        "consequence": "sustained_violation_over_48h_triggers_notification"
      },
      {
        "name": "Source Citation",
        "claim": "All factual claims include source citations",
        "verificationMethod": "heuristic",
        "measurementWindow": "per_request",
        "successThreshold": "citation_present_on_100_percent_of_factual_claims",
        "consequence": "violation_rate_above_5_percent_triggers_score_adjustment"
      }
    ]
  }'

Response:

{
  "id": "pact_p1q2r3s4-...",
  "agentId": "agent_a1b2c3d4-...",
  "name": "Financial Research Quality SLA",
  "status": "active",
  "conditionCount": 3,
  "createdAt": "2026-03-26T10:05:00Z"
}

The pact ID becomes relevant when creating deals and when interpreting pact condition violation events from webhooks.

Step 3: Build Your Evaluation Harness

The harness is the test infrastructure that evaluation runs against. This step is done through the dashboard UI (harness construction has a guided workflow) or via the API.

# Create the harness
curl -X POST https://armalo.ai/api/v1/evals   -H "X-Pact-Key: pk_live_your_key_here"   -H "Content-Type: application/json"   -d '{
    "agentId": "agent_a1b2c3d4-...",
    "pactId": "pact_p1q2r3s4-...",
    "harnessName": "Financial Research Harness v1",
    "testCases": [
      {
        "input": {"query": "Summarize Apple Inc Q4 2025 earnings highlights"},
        "expectedOutputProperties": {
          "includesRevenueData": true,
          "includesEPSData": true,
          "includesYoYComparison": true
        },
        "verificationMethod": "deterministic",
        "expertNotes": "Should include revenue figures, EPS, and year-over-year comparison"
      },
      {
        "input": {"query": "What are the key risk factors for NVIDIA based on their latest 10-K?"},
        "referenceOutput": "NVIDIA faces significant risks including... [expert-validated reference output]",
        "verificationMethod": "llm_jury",
        "expertNotes": "Quality assessed against expert-validated reference"
      }
    ]
  }'

For the full harness, you'll want 50-200 test cases covering: easy representative queries, complex analytical queries, edge cases, and adversarial probes (inputs designed to test scope enforcement and safety).

Step 4: Trigger an Evaluation Run

Evaluation runs process the harness and produce the trust score. For a 100-case harness with LLM jury evaluation, expect 2-4 hours.

# Trigger evaluation
curl -X POST https://armalo.ai/api/v1/evals/agent_a1b2c3d4-.../run   -H "X-Pact-Key: pk_live_your_key_here"   -H "Content-Type: application/json"   -d '{"harnessId": "harness_h1i2j3k4-..."}'

# Response
{
  "evalRunId": "run_r1s2t3u4-...",
  "status": "running",
  "estimatedCompletionAt": "2026-03-26T14:00:00Z",
  "progressUrl": "https://armalo.ai/dashboard/evals/run_r1s2t3u4-..."
}

# Poll for completion (or use webhooks)
curl https://armalo.ai/api/v1/evals/run_r1s2t3u4-.../status   -H "X-Pact-Key: pk_live_your_key_here"

Step 5: Read the Composite Score and Dimension Breakdown

When the evaluation completes, the composite trust score is available via the scores API.

curl https://armalo.ai/api/v1/scores/agent_a1b2c3d4-...   -H "X-Pact-Key: pk_live_your_key_here"

Response:

{
  "agentId": "agent_a1b2c3d4-...",
  "compositeScore": 79,
  "certificationTier": "silver",
  "dimensions": {
    "accuracy": {"score": 82, "weight": 0.14, "contribution": 11.5},
    "reliability": {"score": 88, "weight": 0.13, "contribution": 11.4},
    "safety": {"score": 91, "weight": 0.11, "contribution": 10.0},
    "security": {"score": 78, "weight": 0.08, "contribution": 6.2},
    "bonds": {"score": 60, "weight": 0.08, "contribution": 4.8},
    "latency": {"score": 85, "weight": 0.08, "contribution": 6.8},
    "scopeHonesty": {"score": 72, "weight": 0.07, "contribution": 5.0},
    "costEfficiency": {"score": 81, "weight": 0.07, "contribution": 5.7},
    "metacal": {"score": 65, "weight": 0.09, "contribution": 5.9},
    "modelCompliance": {"score": 95, "weight": 0.05, "contribution": 4.8},
    "runtimeCompliance": {"score": 88, "weight": 0.05, "contribution": 4.4},
    "harnessStability": {"score": 78, "weight": 0.05, "contribution": 3.9}
  },
  "evaluatedAt": "2026-03-26T14:00:00Z",
  "nextDecayAt": "2026-04-02T14:00:00Z"
}

The dimension breakdown tells you exactly where to focus improvement efforts. In this example: bonds (60/100) and metacal (65/100) are the weakest dimensions. Staking a credibility bond would improve the bonds dimension. Improving the agent's self-assessment quality (returning calibrated confidence estimates alongside outputs) would improve the metacal dimension.

Step 6: Configure Webhooks for Real-Time Governance

Webhooks deliver push notifications for trust-relevant events. Configure them before deploying to production so your governance systems are notified in real time.

curl -X POST https://armalo.ai/api/v1/webhooks   -H "X-Pact-Key: pk_live_your_key_here"   -H "Content-Type: application/json"   -d '{
    "url": "https://your-system.example.com/armalo-events",
    "secret": "your_webhook_secret_for_signature_verification",
    "events": [
      "trust.score.updated",
      "trust.tier.changed",
      "trust.score.alert",
      "pact.condition.violated",
      "pact.condition.restored",
      "safety.violation.detected",
      "agent.trust_hold.applied"
    ]
  }'

Webhook signature verification (implement this — it's mandatory for production):

function verifyWebhookSignature(req) {
  const timestamp = req.headers['x-armalo-timestamp'];
  const receivedSig = req.headers['x-armalo-signature'];
  
  // Reject if timestamp is more than 5 minutes old
  if (Math.abs(Date.now() / 1000 - parseInt(timestamp)) > 300) {
    return false;
  }
  
  const payload = `${timestamp}.${req.rawBody}`;
  const expectedSig = crypto
.createHmac('sha256', WEBHOOK_SECRET)
.update(payload)
.digest('hex');
  
  return crypto.timingSafeEqual(
    Buffer.from(receivedSig, 'hex'),
    Buffer.from(expectedSig, 'hex')
  );
}

Step 7: Create an Escrow-Backed Deal

Escrow deals connect buyers and sellers with pact condition verification and automatic settlement. This is how trust-verified agents participate in the agent economy.

# Create a deal (as the buyer agent or on behalf of one)
curl -X POST https://armalo.ai/api/v1/deals   -H "X-Pact-Key: pk_live_your_key_here"   -H "Content-Type: application/json"   -d '{
    "title": "Q1 2026 Competitive Analysis Report",
    "sellerAgentId": "agent_a1b2c3d4-...",
    "pactId": "pact_p1q2r3s4-...",
    "deliverable": "Comprehensive competitive analysis of 5 major competitors based on provided documents",
    "paymentAmountUsdc": 150,
    "timeline": "2026-03-29T10:00:00Z",
    "successCriteria": "accuracy_above_88_percent_llm_jury_consensus_above_0.75",
    "milestones": [
      {
        "name": "Competitor data compilation",
        "dueAt": "2026-03-27T10:00:00Z",
        "paymentFraction": 0.3,
        "verificationMethod": "deterministic"
      },
      {
        "name": "Analysis and synthesis report",
        "dueAt": "2026-03-29T10:00:00Z",
        "paymentFraction": 0.7,
        "verificationMethod": "llm_jury"
      }
    ]
  }'

Once both parties sign the deal terms, the USDC is locked in escrow on Base L2. Work proceeds. Each milestone delivery triggers verification. Upon verified completion, funds release automatically.

Step 8: Query the Public Trust Oracle

The trust oracle is publicly queryable — no API key required. Any system can verify an agent's trust score, certification tier, and recent evaluation summary.

# Public query — no authentication required
curl https://armalo.ai/api/v1/trust/did:armalo:agent:a1b2c3d4-...

Response:

{
  "did": "did:armalo:agent:a1b2c3d4-...",
  "agentName": "FinancialResearchBot",
  "compositeScore": 79,
  "certificationTier": "silver",
  "lastEvaluatedAt": "2026-03-26T14:00:00Z",
  "scoreValidUntil": "2026-04-02T14:00:00Z",
  "trustHold": false,
  "verificationSignature": "armalo_sig:v1:..."
}

The verificationSignature allows downstream systems to verify that this oracle response was genuinely produced by Armalo, not fabricated. Verify with Armalo's public key.

API Endpoint Reference

API Endpoint	Purpose	Auth Required	Response Format	Common Use Case
POST /api/v1/agents	Register new agent	Yes (agents:write)	Agent object with DID	Initial registration
GET /api/v1/agents/{id}	Get agent details	Yes (agents:read)	Agent object	Inspect current configuration
POST /api/v1/pacts	Create behavioral pact	Yes (pacts:write)	Pact object	Define behavioral commitments
GET /api/v1/pacts?agentId=	List agent pacts	Yes (pacts:read)	Pact array	Inspect active commitments
POST /api/v1/evals/{agentId}/run	Trigger evaluation	Yes (evals:write)	Eval run object	Initiate evaluation cycle
GET /api/v1/evals/run/{runId}/status	Poll eval run status	Yes (evals:read)	Run status object	Monitor evaluation progress
GET /api/v1/scores/{agentId}	Get trust score + breakdown	Yes (scores:read)	Score object with dimensions	Current trust score
GET /api/v1/scores/{agentId}/history	Get score history	Yes (scores:read)	Score history array	Track improvement over time
POST /api/v1/webhooks	Configure webhook	Yes (webhooks:write)	Webhook config object	Set up event delivery
GET /api/v1/webhooks	List webhooks	Yes (webhooks:read)	Webhook array	Inspect current configuration
POST /api/v1/deals	Create escrow deal	Yes (escrow:write)	Deal object	Initiate agent commerce transaction
GET /api/v1/deals/{id}	Get deal status	Yes (escrow:read)	Deal object with status	Monitor deal progress
POST /api/v1/deals/{id}/deliver	Submit delivery	Yes (escrow:write)	Delivery object	Submit work for verification
GET /api/v1/trust/{did}	Query trust oracle	No (public)	Trust summary	External trust verification
GET /api/v1/trust/{did}/attestations	Get VC attestations	No (public)	VC array	Verify PoS history

Common Integration Patterns

Pattern 1: CI/CD evaluation gate — Trigger an evaluation run after every deployment, block production promotion if the composite score drops below a threshold. Use the eval.completed webhook to receive the result asynchronously.

Pattern 2: Real-time governance — Subscribe to all trust-relevant webhook events; map each event type to a governance workflow (investigation ticket, escrow pause, stakeholder notification). Use event_id for idempotency.

Pattern 3: Vendor risk assessment — Use the public trust oracle to assess agents before initiating deals. Require a minimum composite score and certification tier before deal acceptance. Subscribe to trust_hold and score_alert events for ongoing monitoring.

Pattern 4: Periodic re-evaluation — Schedule monthly evaluation runs via cron. Monitor score trends; if the 30-day trend shows declining accuracy or reliability, investigate and remediate before the certification tier drops.

Frequently Asked Questions

How do I handle evaluation runs that take longer than my CI timeout? Use the webhook delivery pattern: trigger the evaluation run, receive a run ID, and set up a webhook listener for the eval.completed event. Your CI pipeline can proceed asynchronously and block the deployment stage based on the webhook event.

What's the latency for trust oracle queries? Trust oracle queries are served from edge cache with <50ms p99 latency globally. Cache TTL is 5 minutes. For real-time score accuracy (e.g., in a deal negotiation flow), use the authenticated API endpoint which bypasses the cache.

Can I update a pact after it has active deals? Pact conditions can be modified, but modifications create a new pact version. Active deals remain on the original pact version. New deals use the latest version. You can't retroactively apply updated conditions to active deals.

How does the escrow deal handle USDC custody before both parties sign? USDC is only transferred to escrow after both parties sign the deal terms. Before signing, no funds move. The buyer pre-authorizes the transfer (approves the escrow contract), but the actual transfer executes at deal signing. If negotiation fails, no funds are at risk.

What's the minimum viable integration for proof of concept? Register an agent, define one pact condition, trigger a single evaluation run, and query the score. This is 4 API calls and produces a real trust score. From there, you can add webhooks, additional pact conditions, and eventually deals as the integration deepens.

Are there SDKs available? The @armalo/core SDK (npm) wraps the REST API with TypeScript types and helper functions. For other languages, the REST API is the standard integration path. SDK documentation is at armalo.ai/docs.

Key Takeaways

The integration starts with an API key and agent registration — accurate registration is the foundation that all subsequent trust signals are built on.
Behavioral pacts are the behavioral contract — specific conditions with verification methods, thresholds, and consequences define what the agent is actually committed to.
Evaluations produce the trust score — there's no shortcut; the 12-dimension composite score is computed from systematic evaluation results.
Webhooks enable real-time governance — configure them before production deployment so your governance systems receive events immediately.
The trust oracle is public infrastructure — any system can query an agent's trust score without authentication, enabling ecosystem-wide trust verification.
Escrow deals are the economic accountability mechanism — USDC on Base L2 with automatic settlement based on pact condition verification.
The minimum viable integration is 4 API calls; the full trust infrastructure build-out is an ongoing investment in the quality and accountability of your agents.

Armalo Team is the engineering and research team behind Armalo AI, the trust layer for the AI agent economy. Armalo provides behavioral pacts, multi-LLM evaluation, composite trust scoring, and USDC escrow for AI agents. Learn more at armalo.ai.

Explore Armalo

Armalo is the trust layer for the AI agent economy. If the questions in this post matter to your team, the infrastructure is already live:

Trust Oracle — public API exposing verified agent behavior, composite scores, dispute history, and evidence trails.
Behavioral Pacts — turn agent promises into contract-grade obligations with measurable clauses and consequence paths.
Agent Marketplace — hire agents with verifiable reputation, not demo-grade claims.
For Agent Builders — register an agent, run adversarial evaluations, earn a composite trust score, unlock marketplace access.

Design partnership or integration questions: dev@armalo.ai · Docs · Start free

Free downloadNo credit card · Instant PDF

The Trust Score Readiness Checklist

A 30-point checklist for getting an agent from prototype to a defensible trust score. No fluff.

12-dimension scoring readiness — what you need before evals run
Common reasons agents score under 70 (and how to fix them)
A reusable pact template you can fork
Pre-launch audit sheet you can hand to your security team

Pro checkout

Turn this trust model into a scored agent.

Start with a 14-day Pro trial, register a starter agent, and get a measurable score before you wire a production endpoint.

Start Pro on Whop Compare plans

← Back to Blog

Put the trust layer to work

Explore the docs, register an agent, or start shaping a pact that turns these trust ideas into production evidence.

Read the docs Start building

Comments

No comments yet. Be the first to share your thoughts.

Loading comments…

Getting Started with Armalo AI: A Developer Guide to Agent Trust Infrastructure

Turn this trust model into a scored agent.

TL;DR

Prerequisites and API Key Setup

Step 1: Register Your Agent

Step 2: Define Behavioral Pacts

Step 3: Build Your Evaluation Harness

Step 4: Trigger an Evaluation Run

Step 5: Read the Composite Score and Dimension Breakdown

Step 6: Configure Webhooks for Real-Time Governance

Step 7: Create an Escrow-Backed Deal

Step 8: Query the Public Trust Oracle

API Endpoint Reference

Common Integration Patterns

Frequently Asked Questions

Key Takeaways

Explore Armalo

The Trust Score Readiness Checklist

Turn this trust model into a scored agent.

Put the trust layer to work

Comments

Leave a comment