Continuous Behavioral Telemetry Is The SRE Primitive Of The Agent Economy
Distributed tracing reshaped microservice operations between 2014 and 2020. Behavioral telemetry will reshape agent operations between 2025 and 2030. This piece is the SRE-facing argument: what the agent-economy telemetry primitive is, how it maps to and diverges from OpenTelemetry, the four event shapes that comprise the primitive, the instrumentation patterns that get adoption, and the operational dashboards that turn the substrate into something an on-call engineer actually uses.
If you ran microservice infrastructure between 2014 and 2020 you watched a shift in how operability was understood. The legacy stack — application logs to disk, scraped metrics in a time-series database, occasional ad-hoc tcpdumps — gave way to a unified observability fabric centered on three primitives: structured logs, dimensional metrics, and distributed traces. The third was the most architecturally consequential. Logs had been around forever; metrics had been around forever; what was new was that a single user request crossing seventeen services produced a single trace with a parent-child causality graph, and the graph itself became the unit of operational reasoning. Tooling consolidated around this primitive faster than anyone expected. Within five years, OpenTelemetry was a sensible default for new microservice greenfield. Within seven years, asking "where's the trace ID" became the first move in any production incident.
The agent economy is in the equivalent moment. The legacy stack is application logs from the agent runtime, occasional metrics on inference latency, and a CSV export of tool-call history on demand. What is needed — what is in fact being built — is a unified behavioral fabric centered on a new primitive: continuous behavioral telemetry stitched into pact-bound contracts, queryable cross-org by every counterparty. The shift looks structurally identical to the 2014–2020 microservice shift, and the SRE discipline that emerges will look structurally identical to the one that emerged then. This piece walks through the parallel, names the new primitive, and proposes the operational patterns SRE teams should standardize around.
TL;DR for SRE
- The agent-economy analog of a distributed trace is the behavioral event stream — a sequence of session_start, tool_call, response, and session_end events, ordered, tamper-evident, queryable cross-org.
- The agent-economy analog of OpenTelemetry is @armalo/telemetry plus the open ingest endpoint at /api/v1/telemetry/events. The endpoint is publicly documented; the SDK is non-blocking; the data model is pact-aware.
- The agent-economy analog of a service-mesh sidecar is the telemetry SDK running in-process. The agent-economy analog of a collector is the trust oracle endpoint. The agent-economy analog of Jaeger or Honeycomb is the room ledger plus the verifier surface.
- The biggest operational difference is that the substrate is cross-org by design. Counterparties query agent behavior in the same way an oncall engineer queries a trace; the difference is that the counterparty is not the operator, and the substrate is engineered to support exactly that consumption pattern.
The microservices precedent, in one paragraph
A distributed trace is a sequence of spans. A span has a parent, a duration, a service, an operation name, and a flat key-value attribute bag. Spans are emitted from the running process via a small SDK (OpenTelemetry, OpenTracing before it, Zipkin before that). The SDK batches and ships spans to a collector. The collector forwards to a backend (Jaeger, Tempo, Honeycomb, Lightstep, Datadog APM). The backend stores the trace, indexes it, surfaces a UI for query and exploration. The discipline that emerged is: instrument the boundaries first (HTTP server entry, HTTP client exit, database call, message-bus call), then progressively the interesting internals. The operational outcome is: every user-visible incident has a query that returns the relevant trace, and the trace tells you which service was the bottleneck, where the error started, what the call graph looked like, and which downstream the slow span was waiting on.
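The span model and backend pipeline described above can be sketched in a few lines. The field names here are illustrative, not any specific backend's schema; the point is that the flat span list plus parent pointers is enough to rebuild the causality graph an on-call engineer reasons over.

```typescript
// Illustrative span shape; fields loosely follow the OpenTelemetry data model.
interface Span {
  traceId: string;
  spanId: string;
  parentSpanId?: string; // absent on the root span
  service: string;
  operation: string;
  durationMs: number;
  attributes: Record<string, string | number | boolean>;
}

// Rebuild the parent-child causality graph from a flat span list,
// the way a tracing backend does before rendering a trace view.
function childrenOf(spans: Span[], parentSpanId: string): Span[] {
  return spans.filter((s) => s.parentSpanId === parentSpanId);
}

const trace: Span[] = [
  { traceId: 't1', spanId: 'a', service: 'gateway', operation: 'GET /checkout', durationMs: 180, attributes: {} },
  { traceId: 't1', spanId: 'b', parentSpanId: 'a', service: 'payments', operation: 'charge', durationMs: 140, attributes: {} },
  { traceId: 't1', spanId: 'c', parentSpanId: 'b', service: 'db', operation: 'INSERT', durationMs: 90, attributes: {} },
];

// The slowest child of the root tells you which downstream the request was waiting on.
const bottleneck = childrenOf(trace, 'a').sort((x, y) => y.durationMs - x.durationMs)[0];
```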
This pattern took ten years to mature and now feels obvious. The pattern for agents is being lived through now.
The agent-economy primitive
A behavioral event has four canonical shapes. They are the analog of spans, with three important differences. Spans are bounded by time; events are bounded by purpose. Spans are typically nested via a parent-child relation; events are typically grouped by sessionId rather than via direct parent pointers (though Armalo's room ledger does carry parentEventId for explicit causality when relevant). Spans are usually local to one trace; events are designed to be queryable across orgs, so the schema constrains payload contents to what an external counterparty can reasonably consume.
session_start
The opening event of a coherent unit of agent work. Carries sessionId, agentId, optional pactId (binds the session's tool calls to a contract), startedAt. Conceptually parallel to a root span in a trace.
```typescript
tel.sessionStart({
  sessionId: 'sess-2026-05-13-treasury-sweep-0042',
  agentId: 'atlas-uuid',
  pactId: 'treasury-pact-uuid',
  startedAt: new Date().toISOString(),
  metadata: { initiator: 'cron:treasury-sweep', upstream_request_id: 'req-9bf' },
});
```
The session is the right granularity because most agent reasoning happens within a unit of work — a multi-step task, a conversation turn, a tool-orchestrated workflow. Below the session the events are too fine to reason about; above it the events are too coarse.
tool_call
The L4 substrate event. Every tool invocation the agent makes produces one. Carries sessionId, agentId, tool, params, outcome (success/error/denied/refused), optional latencyMs, optional errorMessage, optional pactId, attemptedAt.
```typescript
tel.toolCall({
  sessionId,
  agentId,
  tool: 'transfer_funds',
  params: { destination: '0xA11A...', amount: 250, currency: 'USDC' },
  outcome: 'success',
  latencyMs: 142,
  attemptedAt: new Date().toISOString(),
  pactId: 'treasury-pact-uuid',
});
```
The server-side ingest looks up the pact's parameterBinding entry whose tool matches 'transfer_funds', evaluates the call against it, and returns a validation verdict that the SDK surfaces back to the caller and that the room ledger records on the event payload. This is the moment the L4 substrate is doing work; everything else is infrastructure around this event.
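A minimal sketch of what that ingest-side evaluation might look like, using two keywords from the parameter-binding grammar the article names later (allowList, maxAmount). The binding and verdict shapes shown here are assumptions for illustration, not the documented schema.

```typescript
// Hypothetical binding and verdict shapes; the real ingest-side schema may differ.
interface ParameterBinding {
  tool: string;
  allowList?: { path: string; values: string[] };
  maxAmount?: { path: string; limit: number };
}

interface Verdict {
  valid: boolean;
  violations: { paramPath: string; rule: string }[];
}

function evaluateBinding(binding: ParameterBinding, params: Record<string, unknown>): Verdict {
  const violations: Verdict['violations'] = [];
  if (binding.allowList) {
    const value = String(params[binding.allowList.path]);
    if (!binding.allowList.values.includes(value)) {
      violations.push({ paramPath: binding.allowList.path, rule: 'allowList' });
    }
  }
  if (binding.maxAmount) {
    const value = Number(params[binding.maxAmount.path]);
    if (value > binding.maxAmount.limit) {
      violations.push({ paramPath: binding.maxAmount.path, rule: 'maxAmount' });
    }
  }
  return { valid: violations.length === 0, violations };
}

const binding: ParameterBinding = {
  tool: 'transfer_funds',
  allowList: { path: 'destination', values: ['0xA11A...'] },
  maxAmount: { path: 'amount', limit: 500 },
};

// The tool_call example above passes: destination is on the allow-list, amount is under the cap.
const verdict = evaluateBinding(binding, { destination: '0xA11A...', amount: 250, currency: 'USDC' });
```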
response
A response emitted to a counterparty. Carries sessionId, agentId, input, output, outcome (success/refusal/partial/error), optional latencyMs, optional tokenCount, emittedAt. The response is the agent-to-world boundary event; it pairs naturally with a tool call (which is the world-to-agent boundary).
```typescript
tel.response({
  sessionId,
  agentId,
  input: 'Settle the May invoices for vendor ACME.',
  output: 'Settled 4 invoices totaling $1,847 USDC across 3 calls. Receipt IDs: rcpt-91, rcpt-92, rcpt-93.',
  outcome: 'success',
  latencyMs: 1340,
  tokenCount: 240,
  emittedAt: new Date().toISOString(),
});
```
session_end
The closing event. Carries sessionId, agentId, endedAt, outcome (success/failure/aborted/timeout), optional reason. It closes the unit of work and is the natural place to record session-level outcomes.
```typescript
tel.sessionEnd({
  sessionId,
  agentId,
  endedAt: new Date().toISOString(),
  outcome: 'success',
});
```
The four shapes are intentionally small. The smallness is the entire point — a small primitive composes cleanly, audits well, and forces the instrumentation discipline to be uniform across heterogeneous agent runtimes (Python, Node, Rust, browser-side, embedded).
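As a sketch, the four shapes can be written down as one discriminated union. The field lists follow the descriptions above; the exact SDK type definitions are an assumption.

```typescript
// Discriminated union over the four canonical event shapes.
// Field lists follow the article's descriptions; exact SDK types may differ.
type BehavioralEvent =
  | { type: 'session_start'; sessionId: string; agentId: string; pactId?: string; startedAt: string;
      metadata?: Record<string, string | number | boolean> }
  | { type: 'tool_call'; sessionId: string; agentId: string; tool: string; params: Record<string, unknown>;
      outcome: 'success' | 'error' | 'denied' | 'refused'; latencyMs?: number; errorMessage?: string;
      pactId?: string; attemptedAt: string }
  | { type: 'response'; sessionId: string; agentId: string; input: string; output: string;
      outcome: 'success' | 'refusal' | 'partial' | 'error'; latencyMs?: number; tokenCount?: number;
      emittedAt: string }
  | { type: 'session_end'; sessionId: string; agentId: string; endedAt: string;
      outcome: 'success' | 'failure' | 'aborted' | 'timeout'; reason?: string };

// Grouping by sessionId is the analog of grouping spans by traceId.
function groupBySession(events: BehavioralEvent[]): Map<string, BehavioralEvent[]> {
  const out = new Map<string, BehavioralEvent[]>();
  for (const e of events) {
    const bucket = out.get(e.sessionId) ?? [];
    bucket.push(e);
    out.set(e.sessionId, bucket);
  }
  return out;
}

const sample: BehavioralEvent[] = [
  { type: 'session_start', sessionId: 's1', agentId: 'atlas', startedAt: '2026-05-13T00:00:00Z' },
  { type: 'session_end', sessionId: 's1', agentId: 'atlas', endedAt: '2026-05-13T00:05:00Z', outcome: 'success' },
];
const grouped = groupBySession(sample);
```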
OpenTelemetry parallels, with the agent-specific deltas
For SRE teams already running OpenTelemetry, the mapping is direct:
| OpenTelemetry concept | @armalo/telemetry analog | Notes |
|---|---|---|
| Tracer | Telemetry class instance | One per agent runtime |
| Root span (server entry) | sessionStart event | Bounds the unit of work |
| Child span on outbound call | toolCall event | The substrate's primary event type |
| Span with error status | toolCall with outcome: 'error' | Severity escalates to warning at ledger |
| Span on outbound response | response event | The agent-to-world boundary |
| Span end / root span end | sessionEnd event | Closes the session |
| Attributes / tags | metadata field on every event | Constrained to JSON primitives |
| SpanContext.traceId | sessionId | Stable across the session |
| Batched exporter | Built into the SDK | Default batch 25, flush 5s |
| Collector | /api/v1/telemetry/events ingest | HTTPS, signed via API key |
| Backend (Jaeger/Honeycomb) | Trust oracle + room ledger | Plus public query surface |
The deltas matter.
Delta one: pact awareness. OpenTelemetry has no notion of a contract. Spans describe what happened; they do not encode what should have happened. The agent-economy substrate adds the contract — the pact — and evaluates each event against it on ingest. The event payload carries the verdict. The substrate is therefore both an observability surface and a continuous compliance evaluator.
Delta two: cross-org consumability. OpenTelemetry data is typically scoped to one operator's backend. The agent-economy substrate is scoped to the agent across all operators, because the agent itself is the trust unit. The query surface (GET /api/v1/trust/{agentId}) is intentionally public, rate-limited, and signed for verifiable consumption by counterparties the operator never integrated with.
Delta three: tamper-evidence. OpenTelemetry assumes the operator owns the backend and trusts their own data. The agent-economy substrate assumes the operator is one of many parties consuming the data and that a compromise of the operator should not compromise the record. The ledger is signed, batched, and append-only at the substrate level rather than at the operator's discretion.
These three deltas are why @armalo/telemetry is its own SDK rather than a vendor-specific OpenTelemetry exporter. The shapes overlap; the trust properties do not.
Instrumentation discipline
Microservice teams learned over a decade to instrument boundaries first and internals second. Agent teams should standardize the same way.
Tier-one instrumentation (do this first)
- Wrap every tool function with instrumentTool. One-line change at the boundary; emits a tool_call event automatically; the verdict from the pact evaluator is surfaced to the caller.
- Open and close sessions at the unit-of-work boundary. A cron job opens at the start, closes at the end. A conversation turn opens at the first user message, closes at the final agent response.
- Attach pactId whenever a pact is in scope. The pact binding only fires when the event references a pact; an unattached event still lands in the ledger but is not contract-evaluated.
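A sketch of what a boundary wrapper in the spirit of instrumentTool does internally: time the call, emit a tool_call-shaped event either way, and re-throw on failure. The function names and signature here are illustrative, not the SDK's actual API.

```typescript
// Illustrative boundary wrapper; the real instrumentTool signature may differ.
type ToolFn = (params: Record<string, unknown>) => unknown;

type EmittedToolCall = { tool: string; outcome: 'success' | 'error'; latencyMs: number };

function wrapTool(tool: string, fn: ToolFn, emit: (event: EmittedToolCall) => void): ToolFn {
  return (params) => {
    const start = Date.now();
    try {
      const result = fn(params);
      emit({ tool, outcome: 'success', latencyMs: Date.now() - start });
      return result;
    } catch (err) {
      emit({ tool, outcome: 'error', latencyMs: Date.now() - start });
      throw err; // telemetry must never swallow the tool's own failure
    }
  };
}

// Usage: the call site is a one-line change at the boundary.
const events: EmittedToolCall[] = [];
const echo = wrapTool('echo', (p) => p, (e) => events.push(e));
echo({ x: 1 });
```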
Tier-two instrumentation (after the boundary)
- Emit a response event at every agent-to-world boundary. The boundary is wherever the agent produces output that crosses a trust line — to a customer, to a counterparty agent, to an oversight dashboard, to an irrevocable downstream action.
- Add metadata keys for cross-cutting context. Common keys: upstream_request_id, experiment_variant, model_version, prompt_template_id. Constrain to JSON primitives; the substrate stores them but does not interpret them, so authoring discipline matters.
Tier-three instrumentation (operational maturity)
- Synchronous pre-flight validation for high-cost calls. Use POST /api/v1/pacts/{pactId}/validate-call before invoking a tool whose effect is irreversible. The substrate returns the verdict in tens of milliseconds; the agent can refuse to call if the verdict is valid: false.
- Wire the runtime wrapper to block on critical verdicts. A critical violation should not just be logged; it should refuse the tool call. The substrate enforces consistency by recording the verdict regardless of runtime behavior, but the operator's wrapper enforces the action.
- Emit derived severity upgrades on cumulative thresholds. A sequence of valid: true calls that collectively exceed a budget should produce a synthesized tool_call.violation event with the appropriate severity. The grammar doesn't natively express window aggregates yet; the operator's sidecar can.
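The window-aggregate idea in tier three can be sketched as a small sidecar-side check: each call is individually valid, but the windowed total breaches a cumulative budget. All names and numbers here are hypothetical.

```typescript
// Sidecar-side sliding-window budget check. Individual calls can each pass the
// pact binding while the window total breaches a cumulative cap.
interface AmountCall { attemptedAt: number; amount: number } // attemptedAt in epoch ms

function windowTotal(calls: AmountCall[], nowMs: number, windowMs: number): number {
  return calls
    .filter((c) => nowMs - c.attemptedAt <= windowMs)
    .reduce((sum, c) => sum + c.amount, 0);
}

function breachesBudget(calls: AmountCall[], nowMs: number, windowMs: number, budget: number): boolean {
  return windowTotal(calls, nowMs, windowMs) > budget;
}

// Three individually valid $400 transfers inside one hour breach a $1,000/hour budget.
const calls: AmountCall[] = [
  { attemptedAt: 0, amount: 400 },
  { attemptedAt: 20 * 60_000, amount: 400 },
  { attemptedAt: 40 * 60_000, amount: 400 },
];
```

When the check trips, the sidecar would synthesize the tool_call.violation event described above rather than waiting for any single call to fail validation.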
Operational dashboards SRE teams should standardize
Three dashboards, in increasing maturity, cover the operational surface.
Dashboard one — Session health
The agent-economy version of a request-rate-error-duration (RED) dashboard. Rows per agent. Columns: sessions/minute, % of sessions ending in success, p50 / p95 / p99 session duration, % of tool calls with outcome: error. Add an L4-specific column: % of tool calls with validation.valid: false (the pact violation rate). The violation rate is the most closely watched metric operationally — it spikes when the agent drifts, before any traditional error metric moves.
Dashboard two — Pact compliance
Rows per (agent, pact). Columns: total tool calls bound by this pact in the window, % valid: true, % violated by severity (critical / major / minor), top-five violating paramPath. A healthy operational baseline is >99.5% valid: true at all severities for production agents; a drop below 99% is a P0 incident.
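The thresholds above reduce to a small classification function. The intermediate 'degraded' label is illustrative; the healthy and P0 boundaries come from the text.

```typescript
// Thresholds from the text: >99.5% valid is a healthy baseline; below 99% is a P0.
function complianceStatus(validCalls: number, totalCalls: number): 'healthy' | 'degraded' | 'P0' {
  const rate = totalCalls === 0 ? 1 : validCalls / totalCalls;
  if (rate > 0.995) return 'healthy';
  if (rate >= 0.99) return 'degraded';
  return 'P0';
}
```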
Dashboard three — Trust oracle health
The agent-economy version of an SLI/SLO dashboard for an external API. Rows per agent. Columns: trust oracle query rate, query latency (counterparty-side), signature verification success rate, freshness lag (telemetry batch -> oracle reflection). Externalize this dashboard to counterparties via a status page; counterparties who run their own integration depend on the freshness signal.
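Assuming freshness lag is measured as the gap between the newest event the operator shipped and the newest event the oracle reflects, the SLI reduces to a timestamp delta. The bucket boundaries here are illustrative, not documented SLOs.

```typescript
// Hypothetical freshness-lag SLI: time between the newest telemetry event the
// operator shipped and the newest event the public oracle reflects.
function freshnessLagSeconds(lastShippedAt: string, lastReflectedAt: string): number {
  return (Date.parse(lastShippedAt) - Date.parse(lastReflectedAt)) / 1000;
}

// A counterparty-facing status page might bucket the lag into health states.
function freshnessStatus(lagSeconds: number): 'fresh' | 'lagging' | 'stale' {
  if (lagSeconds <= 30) return 'fresh'; // within a few flush intervals
  if (lagSeconds <= 300) return 'lagging';
  return 'stale';
}
```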
On-call patterns
Three on-call patterns the substrate enables.
Pattern one: pact-violation paging. A telemetry batch containing one or more validation.valid: false events at severity critical produces a synchronous page to the agent's owning team. The pager note includes the failing paramPath, the failing value, the rule that rejected it, and the most recent ten tool calls for context. The on-call engineer responds with one of three actions: (a) accept-and-document (the contract was too tight; loosen the pact), (b) block-and-investigate (the agent is drifting; pause and root-cause), (c) compromise-and-rotate (an attacker has injected the agent; rotate keys, kill in-flight, audit).
Pattern two: composite-score regression watch. The composite trust score recomputes nightly. A regression > 5 points triggers a non-paging alert that surfaces to the team's dashboard. The score is composed of twelve dimensions; the regression is decomposable per dimension. A regression on scope_honesty is read as "the agent's stated confidence drifted from its actual correctness," which is operationally distinct from a regression on latency, which is read as "the agent's tool calls slowed down."
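Per-dimension decomposition of a regression can be sketched as a diff over the previous and current score vectors. The 5-point threshold and the scope_honesty dimension name follow the text; everything else is illustrative.

```typescript
// Diff the previous and current per-dimension scores; any dimension that
// dropped by more than `threshold` points is flagged for the non-paging alert.
type Dimensions = Record<string, number>;

function regressedDimensions(previous: Dimensions, current: Dimensions, threshold = 5): string[] {
  return Object.keys(previous).filter((dim) => previous[dim] - (current[dim] ?? 0) > threshold);
}

const yesterday: Dimensions = { scope_honesty: 92, latency: 88 };
const today: Dimensions = { scope_honesty: 84, latency: 87 };
const flagged = regressedDimensions(yesterday, today); // scope_honesty dropped 8 points
```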
Pattern three: cross-org anomaly detection. Behavioral records aggregated across all of an agent's tenants surface anomalies that are invisible from inside any one tenant. An agent whose behavior in Tenant A is normal but whose behavior in Tenant B has shifted is detected at the cross-org layer. The on-call response is to query the oracle for both tenants' record, identify the differential, and escalate to the tenant where the drift is occurring.
These three patterns require the substrate's continuous-time and cross-org properties. None of the patterns is reachable from a self-reported, batched, advisory substrate.
What this looks like in practice on the Armalo Atlas reference agent
Atlas is the public Armalo-operated demo agent for L4. Atlas emits roughly two dozen telemetry events per day in its current configuration: five sessions, with three to six tool calls each, one response event per session, and the bracketing session-start and session-end events. The room ledger record for Atlas is queryable at GET /api/v1/trust/76cf31d6-ffe3-4a5c-8748-021114aa8066, and the live demo at armalo.ai/l4/demo renders the most recent thirty events from the ledger with their verdicts.
Atlas's seeded event stream includes one deliberate pact violation in session three: a transfer_funds call to a destination outside the allow-list with an amount above the cap. The violation is visible in the live event stream with severity critical, and the oracle reflects it as part of Atlas's pact compliance rate (24/25 valid = 96%, slightly below the production threshold of 99%). The dashboard color codes the violation event red; the oracle returns the violation as part of the agent's signed record.
SRE teams onboarding the substrate can use Atlas as a reference: subscribe to the same shape of events, build the same dashboard surfaces against their own agents, and use Atlas to validate the integration end-to-end.
What the next two years look like
If the microservice precedent holds, the next two years will see:
- Convergence on the four-event-shape primitive. OpenTelemetry took several iterations to settle on its span model. The agent-economy substrate has a head start — the four shapes are stable, the schema is locked, the SDK is open. Convergence here is mostly about adoption breadth.
- Vendor-specific extensions of the substrate. Just as OpenTelemetry attracted vendor backends (Datadog, Honeycomb, Lightstep), the L4 substrate will attract pact-evaluator vendors, trust-oracle relays, and cross-org analytics products. The substrate is open; the value layer above it is competitive.
- Standardization of pact templates. The OWASP Top 10 became the de facto reference for web app security. A similar reference will emerge for agent pact templates — treasury, support, code execution, PHI, knowledge publishing — published as community-maintained templates and forked at install time.
- Counterparty-side consumption surfaces. The most consequential adopters will be the receivers — banks, exchanges, marketplaces, procurement systems — that wire the trust oracle into their existing transaction-authorization flow. The first banks to wire this in will set the expectation for the rest of the market.
These four predictions are not speculative; they are the agent-economy mirror of patterns that played out exactly once before, in a market structurally similar enough that the mirror is informative.
What an SRE team should do this quarter
- Install @armalo/telemetry in one production agent. One agent, one tool, one pact. Confirm events land in the room ledger and the oracle reflects them within the flush interval.
- Author your first pact for that tool. Use the parameter-binding grammar (allowList, denyList, regex, valueRange, maxAmount, required) to encode the constraints your runtime wrapper already enforces in code. Move them from imperative checks to declarative pact.
- Stand up the three dashboards. Session health, pact compliance, trust oracle health. Each is a single Grafana panel against straightforward DB queries (the schema is room_events, pacts, scores; the queries are publicly documented).
- Wire one on-call pattern. Pick pact-violation paging first; it has the highest precision and the lowest false-positive rate. Tune severity thresholds against your tolerance.
- Subscribe one counterparty to the trust oracle. If your agent operates in a regulated industry, identify the regulator, the procurement officer, or the upstream banking partner who would benefit from cross-org-queryable behavioral records. The substrate is designed to make this consumption pattern trivial.
The five-step path is small enough to run inside a quarter, and large enough that an SRE team finishing it has materially upgraded the operability of their agent fleet. The microservice precedent says the teams that did this for distributed tracing between 2014 and 2017 were operating their fleets meaningfully better by 2018 than the teams that waited. The same pattern applies here, on the same timescale, for the same reasons.
Further reading
- @armalo/telemetry on npm — the SDK
- The L4 specification — the canonical paper
- Atlas live demo — substrate properties in real time
- The TOCTOU theorem — why the substrate must be continuous
- Trust oracle as cross-org consensus — the architecture paper
- Composite trust scoring under adversarial drift — the score that the substrate feeds
- Parameter binding tutorial — the contract layer
- FICO is the wrong analogy — the mental model