Evaluations
Submit evaluation runs to measure your agent against pact conditions. Each eval triggers automated checks and updates the agent's composite trust score.
How Evals Work
When you submit an eval, Armalo runs the following pipeline asynchronously via Inngest:
1. Your eval is created with status `processing`.
2. Armalo invokes the `agentEndpoint` in your eval config with the configured test inputs.
3. Each check type runs against the agent response: deterministic checks resolve inline, while jury checks are dispatched to the multi-model LLM jury.
4. Once all checks complete, the eval is marked `completed` or `failed`.
5. The composite trust score and dimensional breakdown for the agent are recomputed and persisted.
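The status decision in step 4 can be pictured as a small pure function. This is a sketch of one plausible semantics, not Armalo's actual implementation: here an eval is marked failed only when a check could not run at all, while a check that runs and returns `passed: false` still yields a completed eval.

```typescript
// Illustrative sketch only: the CheckOutcome shape and finalStatus logic are
// assumptions about the pipeline's semantics, not Armalo's implementation.
type CheckOutcome = { type: string; passed: boolean; error?: string };
type EvalStatus = 'processing' | 'completed' | 'failed';

function finalStatus(checks: CheckOutcome[]): EvalStatus {
  // Assumed semantics: "failed" means a check errored out; a check that ran
  // and returned passed: false still completes the eval.
  return checks.some((c) => c.error !== undefined) ? 'failed' : 'completed';
}
```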
Submit an Eval
POST to /api/v1/evals with your evals:write API key. Set autoStart: true to begin execution immediately after creation.
curl -X POST https://www.armalo.ai/api/v1/evals \
  -H "X-Pact-Key: pk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "YOUR_AGENT_ID",
    "name": "prod-smoke-test-1",
    "evalType": "custom",
    "pactId": "YOUR_PACT_ID",
    "autoStart": true,
    "config": {
      "agentEndpoint": "https://your-agent.example.com/invoke",
      "checks": [
        { "type": "latency", "threshold": 2000 },
        { "type": "safety" },
        { "type": "format" }
      ],
      "timeout": 30000
    }
  }'

Example response:

{
  "id": "eval_def456",
  "agentId": "YOUR_AGENT_ID",
  "pactId": "YOUR_PACT_ID",
  "status": "processing",
  "createdAt": "2026-03-16T10:00:00.000Z"
}

Eval Types
The evalType field controls which default check suite is used when no explicit config.checks are provided.
| Type | Description |
|---|---|
| custom | You specify checks explicitly in config. Maximum flexibility. |
| accuracy | Runs jury-based accuracy checks. Requires a pactId with accuracy conditions. |
| safety | Runs safety and prohibited content checks only. |
| performance | Runs latency and cost-efficiency checks only. |
| compliance | Runs all conditions from the linked pact. |
| pact-verification | Full pact verification pass — equivalent to calling /pacts/:id/verify. |
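One way to picture the evalType defaults is a mapping from type to a check suite. The suites below are assumptions read off the table above, not Armalo's authoritative defaults.

```typescript
// Hypothetical mapping from evalType to a default check suite when
// config.checks is omitted. The concrete suites are assumptions drawn from
// the table above, not Armalo's authoritative defaults.
type Check = { type: string; threshold?: number };

function defaultChecks(evalType: string): Check[] {
  switch (evalType) {
    case 'safety':
      return [{ type: 'safety' }, { type: 'prohibited_topics' }];
    case 'performance':
      return [{ type: 'latency', threshold: 2000 }]; // threshold value assumed
    case 'custom':
      return []; // custom expects explicit config.checks
    default:
      return []; // remaining types (accuracy, compliance, ...) omitted for brevity
  }
}
```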
Check Types
Each entry in config.checks specifies a check type and optional threshold.
| Type | Description |
|---|---|
| latency | Measures wall-clock response time in ms. Set threshold to the maximum acceptable value. |
| safety | Detects harmful, toxic, or policy-violating content in the agent response. |
| format | Validates that the output conforms to an expected format (JSON, Markdown, etc.). |
| output_length | Checks that the response length falls within min/max token bounds. |
| prohibited_topics | Rejects responses that mention topics in a configured blocklist. |
| required_topics | Fails responses that do not cover every topic in a required list. |
| red-team | Submits adversarial prompts via the adversarial agent and checks that the response upholds the pact. |
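As a hedged example, a config.checks array mixing several check types might look like the following. Only `type` and `threshold` are documented above, so the `min`, `max`, and `topics` field names here are assumptions.

```typescript
// Example config.checks array. Only `type` and `threshold` are documented;
// the min/max and topics field names below are assumptions.
const checks = [
  { type: 'latency', threshold: 2000 },                        // fail above 2000 ms
  { type: 'output_length', min: 10, max: 512 },                // assumed field names
  { type: 'prohibited_topics', topics: ['financial-advice'] }, // assumed field name
];
```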
Get Eval Results
Poll GET /api/v1/evals/:evalId until status is completed or failed. Alternatively, register a webhook for the eval.completed event to avoid polling.
curl https://www.armalo.ai/api/v1/evals/eval_def456 \
  -H "X-Pact-Key: pk_live_your_key_here"
{
"id": "eval_def456",
"agentId": "YOUR_AGENT_ID",
"pactId": "YOUR_PACT_ID",
"status": "completed",
"passed": true,
"checks": [
{ "type": "latency", "passed": true, "actual": 1240, "threshold": 2000 },
{ "type": "safety", "passed": true, "details": "No harmful content detected" },
{ "type": "format", "passed": true, "details": "Valid JSON response" }
],
"createdAt": "2026-03-16T10:00:00.000Z",
"completedAt": "2026-03-16T10:00:03.200Z"
}

SDK
The @armalo/core SDK provides a typed createEval method and a higher-level runEval helper that creates the eval and polls until completion.
import { ArmaloClient, runEval } from '@armalo/core';
const client = new ArmaloClient({ apiKey: process.env.ARMALO_API_KEY });
// Low-level: create and poll manually
const eval_ = await client.createEval({
  agentId: 'YOUR_AGENT_ID',
  name: 'prod-smoke-test-1',
  evalType: 'custom',
  pactId: 'YOUR_PACT_ID',
  autoStart: true,
  config: {
    agentEndpoint: 'https://your-agent.example.com/invoke',
    checks: [
      { type: 'latency', threshold: 2000 },
      { type: 'safety' },
    ],
    timeout: 30000,
  },
});
// High-level: create + poll in one call
const result = await runEval(client, {
  agentId: 'YOUR_AGENT_ID',
  name: 'prod-smoke-test-1',
  agentEndpoint: 'https://your-agent.example.com/invoke',
  checks: [{ type: 'latency', threshold: 2000 }, { type: 'safety' }],
});
console.log(result.passed); // true
console.log(result.checks); // [{ type: 'latency', passed: true, ... }, ...]
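For the webhook alternative mentioned above (the eval.completed event), a minimal handler might look like the sketch below. The event envelope shape (`event`/`data`) is an assumption; the `data` fields mirror the GET /api/v1/evals response.

```typescript
// Hypothetical eval.completed webhook payload. The envelope shape is an
// assumption; the data fields mirror the GET /api/v1/evals/:evalId response.
type EvalCompletedEvent = {
  event: string;
  data: { id: string; status: 'completed' | 'failed'; passed: boolean };
};

// Route an incoming event; returns a summary string so the logic stays
// testable without standing up an HTTP server.
function handleWebhook(body: EvalCompletedEvent): string {
  if (body.event !== 'eval.completed') return 'ignored';
  const { id, passed } = body.data;
  return passed ? `eval ${id} passed` : `eval ${id} failed`;
}
```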