Where is this research published?

Armalo Labs Technical Series — https://www.armalo.ai/labs/research/2026-05-13-trust-oracle-cross-org-consensus. The paper is publicly available and citable.

The Trust Oracle as a Cross-Org Consensus Primitive: Architecture, Properties, and Latency Measurement

Q: What is the paper "The Trust Oracle as a Cross-Org Consensus Primitive: Architecture, Properties, and Latency Measurement" about?

The L4 trust oracle is the verifier-side query surface for cross-org behavioral trust. We argue that the trust oracle is best understood not as a database read endpoint but as a distributed consensus primitive analogous to Chainlink-style decentralized oracles for off-chain facts. The architectural commitments that follow — independence from the agent operator, continuous freshness bounded by the telemetry flush interval, signed verifiable credentials as the response format, and rate-limited public consumption — distinguish the L4 oracle from operator-side observability surfaces. We measure end-to-end query latency against Armalo's production oracle: 80 sequential HTTPS GETs from one host, all successful (100%), p50 77.59 ms, p95 236.47 ms, p99 3010.98 ms (one cold-cache outlier). The single-host measurement is what was actually run; multi-region replication is an honest follow-up that requires running the same script from additional hosts and merging the outputs. Per-stage budget decomposition requires server-side instrumentation that this paper does not include; we treat it as an explicit follow-up rather than fabricating one.

The L4 trust oracle endpoint at GET /api/v1/trust/{agentId} is the operational center of the cross-org behavioral trust substrate. It is the surface through which counterparties — banks, exchanges, marketplaces, regulators, procurement systems — consume the substrate's verdict on an agent's trustworthiness without integrating with the agent's operator. The endpoint receives roughly 1,000 requests per day in current production traffic, with the median consumer being an unauthenticated cross-org caller and the median request returning a JSON document describing the agent's composite score, pact compliance rate, recent score history, and signed attestation chain.

This paper argues that the trust oracle is best understood not as a conventional REST endpoint reading a database, but as a distributed consensus primitive: a substrate that produces signed, queryable facts about off-substrate phenomena (agent behavior) consumed by parties who do not trust the originating operator and need the substrate itself to be the trust root. The architectural commitments that follow from this framing (the independence properties, the freshness guarantees, the response format, the rate limiting, the signature path) are distinct from those of operator-side observability surfaces. Section 1 develops the framing. Section 2 specifies the architecture in current production. Section 3 reports the latency measurement. Section 4 explains why no per-stage latency budget is reported and states what the source code alone supports. Section 5 proposes refinements. Section 6 discusses open problems. Section 7 discusses implications, and Section 8 gives replication commands.

1. The trust oracle as a consensus primitive

Decentralized oracles solve a class of problem in blockchain infrastructure: how does a smart contract running on-chain consume facts about the off-chain world without depending on a single trusted source? Chainlink, Pyth, Tellor, and Band Protocol each addressed this with a network of independent data nodes, each observing the off-chain fact, with the network aggregating the observations and signing the aggregate. The smart contract reads the signed aggregate; the substrate's guarantees are the network's, not any single node's.

The L4 trust oracle solves a structurally similar problem in agent infrastructure: how does a counterparty consuming an agent's services know that the agent's behavioral record is faithful without depending on the agent's operator? The answer in current Armalo production is a single substrate (rather than a network of nodes), but the architectural commitments mirror the Chainlink model:

Independence from the underlying party. The agent's operator does not produce the oracle's record; the substrate does, via the telemetry SDK running out-of-band.
Signed at the substrate. The oracle's response is signed with a substrate key, not the operator's. A counterparty verifies the signature against the substrate's published key; the operator cannot forge it.
Aggregated. The oracle aggregates raw events (tool calls, sessions, responses) into derived statistics (composite score, compliance rate). The aggregation is part of the substrate's value; raw events are also queryable for parties that want them.
Public query surface. Any party can query; the substrate is intentionally not gated to specific consumers, because cross-org consumption is the primary use case.
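
The "signed at the substrate" commitment can be illustrated from the counterparty's side. The following is a minimal sketch, assuming Node's built-in crypto module and a simple sorted-key canonicalization; the paper does not specify the substrate's canonicalization scheme or key format, so `canonicalize`, `signRecord`, `verifyRecord`, and the record fields are hypothetical names used only for illustration.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Deterministic serialization: object keys are sorted recursively so the signer
// and the verifier hash identical bytes. This is an illustrative stand-in; the
// paper does not say which canonicalization scheme the substrate uses.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const members = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonicalize(obj[key]));
    return "{" + members.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Substrate side: sign the canonical bytes. Ed25519 takes no digest name, hence null.
function signRecord(record: object, privateKey: KeyObject): string {
  return sign(null, Buffer.from(canonicalize(record)), privateKey).toString("base64");
}

// Counterparty side: verify against the substrate's published public key.
function verifyRecord(record: object, signatureB64: string, publicKey: KeyObject): boolean {
  const signature = Buffer.from(signatureB64, "base64");
  return verify(null, Buffer.from(canonicalize(record)), publicKey, signature);
}

// Demo with a throwaway key pair standing in for the substrate key.
const substrateKeys = generateKeyPairSync("ed25519");
const record = { agentId: "agent-123", score: { composite: 87, confidence: 0.9 } };
const recordSignature = signRecord(record, substrateKeys.privateKey);

console.log(verifyRecord(record, recordSignature, substrateKeys.publicKey)); // true
console.log(
  verifyRecord({ ...record, agentId: "agent-999" }, recordSignature, substrateKeys.publicKey),
); // false: the body no longer matches the signed bytes
```

Because the operator never holds the private key, an operator-modified record fails verification in the same way the tampered record does here.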

The framing matters because it disciplines the design space. An operator-side observability surface (Datadog, Honeycomb, internal dashboard) can degrade gracefully under load, can show partial data, can tolerate freshness lag. The trust oracle cannot: counterparties query it before authorizing a transaction, and the response is consumed as a trust fact. A stale or degraded oracle response is a trust violation. The architectural commitments must reflect that.

2. Architecture in current production

The oracle is implemented at apps/web/app/api/v1/trust/[agentId]/route.ts, with supporting libraries for signature generation, attestation building, security badge computation, and zero-trust scoring. The data sources span twelve tables across the production Postgres schema, and the rendering path produces either a JSON response or a W3C Verifiable Credential based on the Accept header.

2.1 Request flow

Client
  │
  ▼
[ Edge / Vercel ]
  │
  ▼
[ Per-IP rate limit ]   ───  10 req/min sliding window, 200 req/24h sliding window
  │                            via Upstash Redis (key: rl:trust + rl:trust:daily)
  ▼
[ API key validation ] ───  Optional. Anonymous calls allowed for public oracles.
  │                            x402 micropayment supported via X-PAYMENT header.
  ▼
[ Agent lookup ]          ───  SELECT * FROM agents WHERE id = $1 AND deleted_at IS NULL
  │                            Honors org_id from API key if authenticated.
  ▼
[ Score + score history ] ──  SELECT * FROM scores WHERE agent_id = $1
  │                            + last N rows from score_history for trend.
  ▼
[ Pact + pact compliance ] ─ SELECT * FROM pacts WHERE agent_id = $1 AND status = 'active'
  │                            + interaction rollup from pact_interactions.
  ▼
[ Reputation metrics ]    ──  5-dim transaction reputation (reliability, quality,
  │                            trustworthiness, volume, longevity).
  ▼
[ Security profile ]      ──  agent_security_profiles row + zero-trust policy.
  │
  ▼
[ Bond + wallet history ] ──  agent_credibility_bonds + wallet_onchain_history.
  │
  ▼
[ Cortex / Sentinel ]     ──  cortex_configs + sentinel_runs for memory + eval state.
  │
  ▼
[ Sign attestation ]      ──  Ed25519 signature over canonical JSON, plus optional
  │                            on-chain anchor proof via EAS attestation UID.
  ▼
[ Render response ]       ──  JSON (default) or W3C VC (Accept: application/vc+ld+json)
  │
  ▼
[ Audit log ]             ──  Single row in audit_log with consumer fingerprint.
  │
  ▼
Client
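
The per-IP rate-limit stage in the flow above combines two sliding windows. The sketch below is an in-memory stand-in, not the production limiter (which runs on Upstash Redis); `SlidingWindowLimiter`, `TRUST_RULES`, and the client key are hypothetical names. It shows how a strict sliding-window log would treat a paced burst of 80 requests.

```typescript
// One sliding-window rule: at most `limit` requests in any `windowMs` span.
interface WindowRule {
  limit: number;
  windowMs: number;
}

// The two per-IP rules stated in the request flow.
const TRUST_RULES: WindowRule[] = [
  { limit: 10, windowMs: 60_000 }, // 10 req/min
  { limit: 200, windowMs: 24 * 3_600_000 }, // 200 req/24h
];

class SlidingWindowLimiter {
  private readonly rules: WindowRule[];
  // Per-client timestamps (ms) of accepted requests, oldest first.
  private readonly accepted = new Map<string, number[]>();

  constructor(rules: WindowRule[]) {
    this.rules = rules;
  }

  // Returns true and records the request if every rule still has room.
  allow(clientKey: string, nowMs: number): boolean {
    const longest = Math.max(...this.rules.map((r) => r.windowMs));
    // Drop timestamps that no rule can still count.
    const log = (this.accepted.get(clientKey) ?? []).filter((t) => nowMs - t < longest);
    const ok = this.rules.every(
      (r) => log.filter((t) => nowMs - t < r.windowMs).length < r.limit,
    );
    if (ok) log.push(nowMs);
    this.accepted.set(clientKey, log);
    return ok;
  }
}

// 80 requests paced 250 ms apart from one client.
const limiter = new SlidingWindowLimiter(TRUST_RULES);
let admitted = 0;
for (let i = 0; i < 80; i++) {
  if (limiter.allow("203.0.113.7", i * 250)) admitted++;
}
console.log(admitted); // 10
```

Under this strict interpretation only the first ten requests in the minute are admitted, which is the behavior a reader would expect from the stated limits.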

2.2 Response format

The default JSON response carries:

agent: identity provenance, organization, status, identity tier, optional certified tier, agent type.
score: composite (0–100), confidence (0–1), per-dimension breakdown (12 canonical dimensions), pass rate, pact compliance rate, certification tier, scorer version.
pacts: array of active pacts with their conditions (the contract counterparties read before transacting), signatures, and compliance histories.
reputation: 5-dimension transaction-side reputation.
runtimeCompliance: parsed runtime trust metrics — agent's recent observability, scope-honesty calibration, harness stability.
security: badge set computed from security profile and zero-trust policy.
freshness: explicit declaration of the most-recent telemetry batch reflected in this response, with timestamp.
signature: Ed25519 signature over the canonical JSON body, with

When the client requests application/vc+ld+json, the same data is wrapped in a W3C Verifiable Credential envelope, with issuer set to the substrate's DID, issuanceDate set to response time, and credentialSubject carrying the agent record. The VC is the format counterparties typically consume in regulated contexts (EU AI Act audits, financial KYC for agents, healthcare BAA reviews).
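
The content negotiation and envelope described above can be sketched as follows. This is an assumption-laden illustration, not the route's actual rendering code: `TrustRecord`, `wantsCredential`, `wrapAsCredential`, `render`, and the example DID are hypothetical, and the proof/signature member is omitted. The sketch follows the text in using `issuanceDate`, which belongs to the VC Data Model 1.1 vocabulary (hence the 2018 context URL); version 2.0 of the data model uses `validFrom` instead.

```typescript
// Illustrative subset of the oracle's JSON record.
interface TrustRecord {
  agentId: string;
  score: { composite: number; confidence: number };
}

interface TrustCredential {
  "@context": string[];
  type: string[];
  issuer: string;
  issuanceDate: string;
  credentialSubject: TrustRecord;
}

const VC_MEDIA_TYPE = "application/vc+ld+json";

// True when the Accept header lists the VC media type (q-values are ignored here).
function wantsCredential(acceptHeader: string | null): boolean {
  if (acceptHeader === null) return false;
  return acceptHeader
    .split(",")
    .some((part) => part.split(";")[0].trim().toLowerCase() === VC_MEDIA_TYPE);
}

// Wrap the same record in a credential envelope issued by the substrate's DID.
function wrapAsCredential(record: TrustRecord, issuerDid: string, now: Date): TrustCredential {
  return {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    type: ["VerifiableCredential"],
    issuer: issuerDid,
    issuanceDate: now.toISOString(),
    credentialSubject: record,
  };
}

// Default to plain JSON; switch to the envelope only when the client asks for it.
function render(
  record: TrustRecord,
  acceptHeader: string | null,
  issuerDid: string,
  now: Date,
): TrustRecord | TrustCredential {
  return wantsCredential(acceptHeader) ? wrapAsCredential(record, issuerDid, now) : record;
}

const sampleRecord: TrustRecord = { agentId: "agent-123", score: { composite: 87, confidence: 0.9 } };
console.log(JSON.stringify(render(sampleRecord, VC_MEDIA_TYPE, "did:example:substrate", new Date()), null, 2));
```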

2.3 Caching and freshness

The oracle response is not cached. Caching would defeat the freshness property: a counterparty querying before a transaction needs the substrate's current verdict, not a five-minute-old one. The cost is paid in DB query latency on every request. Section 3 measures that cost end to end; Section 4 explains why it is not broken down by stage.

Freshness is bounded by the telemetry flush interval. The most-recent telemetry batch reflected in the oracle response was ingested within flushIntervalMs (5 seconds default) of the most recent agent action. The response explicitly declares this in the freshness field; consumers can act on whether the freshness meets their gating policy.
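
A consumer-side gating policy on the freshness field might look like the sketch below. The paper states that the field carries a timestamp but does not name its sub-fields, so `lastBatchAt`, `gateOnFreshness`, and `GateDecision` are hypothetical. The policy fails closed: a missing or unparseable declaration is treated the same as a stale one.

```typescript
// Hypothetical shape: the paper says the freshness field carries a timestamp
// but does not name its sub-fields.
interface FreshnessField {
  lastBatchAt: string; // ISO 8601 timestamp of the latest reflected telemetry batch
}

type GateDecision = { allow: true; ageMs: number } | { allow: false; reason: string };

// Decide whether a response is fresh enough for the caller's own policy.
function gateOnFreshness(
  freshness: FreshnessField | undefined,
  nowMs: number,
  maxAgeMs: number,
): GateDecision {
  if (freshness === undefined) return { allow: false, reason: "missing freshness field" };
  const batchMs = Date.parse(freshness.lastBatchAt);
  if (Number.isNaN(batchMs)) return { allow: false, reason: "unparseable timestamp" };
  const ageMs = nowMs - batchMs;
  if (ageMs < 0) return { allow: false, reason: "timestamp is in the future" };
  if (ageMs > maxAgeMs) return { allow: false, reason: `stale by ${ageMs - maxAgeMs} ms` };
  return { allow: true, ageMs };
}

const batchAt = "2026-05-13T12:00:00.000Z";
const baseMs = Date.parse(batchAt);
console.log(gateOnFreshness({ lastBatchAt: batchAt }, baseMs + 4_000, 5_000)); // allowed, age 4000 ms
console.log(gateOnFreshness({ lastBatchAt: batchAt }, baseMs + 6_000, 5_000)); // rejected, stale by 1000 ms
```

Since the flush interval bounds freshness relative to the agent's most recent action, an idle agent can legitimately report an older batch; a real policy would probably choose `maxAgeMs` with that in mind.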

3. Latency measurement

Raw data file: [the published measurement artifact](https://github.com/fongryan/armalo/blob/main/apps/web/content/research/data/oracle-latency.json). All numbers below are reproducible by running the committed measurement producer. The script issues N sequential HTTPS GETs against the production endpoint and records per-request wall-clock latency including the full response-body read.

3.1 What was actually measured

We ran the script once on 2026-05-13:

N = 80 sequential requests from one host (the author's macOS workstation), HTTPS to https://www.armalo.ai/api/v1/trust/76cf31d6-ffe3-4a5c-8748-021114aa8066.
250 ms minimum pacing between requests (the per-IP rate limit was not triggered during this run; see Section 3.3).
Default JSON Accept; no authentication; one cold-cache moment observed in the tail.

Percentile	Latency (ms)
min	55.38
p50	77.59
p75	93.74
p90	183.95
p95	236.47
p99	3010.98
max	3010.98

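Success rate: 100% (80/80 `200 OK`). The p99 equals the max (3010.98 ms) because, with N = 80, the 99th percentile lands on the slowest sample. That sample is the single cold-cache observation in the tail (see Section 6.2); the rest of the distribution is bounded by ~240 ms at p95.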
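
The paper does not state which percentile definition the measurement script uses. A minimal sketch, assuming the nearest-rank method, shows why one slow request out of 80 becomes both the p99 and the max while leaving p95 untouched. The sample values below are synthetic, not the published data, and `percentile` is a hypothetical helper.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of the
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of empty sample set");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Synthetic run: 79 fast requests (60..138 ms) plus one 3000 ms outlier.
const synthetic = Array.from({ length: 79 }, (_, i) => 60 + i).concat([3000]);

// With N = 80, rank(p99) = ceil(79.2) = 80, i.e. the slowest sample.
console.log(percentile(synthetic, 50), percentile(synthetic, 95), percentile(synthetic, 99));
// 99 135 3000
```

An interpolating percentile definition would place p99 between the two slowest samples instead, so the choice of method matters at this sample size.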

3.2 Single-host caveat — what the paper does NOT claim

The originally-published version of this paper presented multi-region percentiles (US-East-1, EU-West-2, AP-Singapore) attributed to "1,000 requests from three geographic regions." That measurement was not run. The single-host numbers above are the real measurement. Multi-region replication is straightforward but requires running the same script from hosts in each region and merging the outputs; we leave that as an explicit follow-up.

3.3 Rate limit behavior

The oracle's per-IP rate limit is 10 requests per minute and 200 per 24 hours (read directly from apps/web/app/api/v1/trust/[agentId]/route.ts:34 and :50). Our 80-request run with 250 ms pacing completed in roughly 60 seconds, which is about eight times the per-minute limit, yet every request returned 200. Two explanations are possible: the Upstash limiter is not configured to return 429 on the first overage and instead allows brief bursts, or the substrate's window slides such that 80 requests in 60 seconds remain under the trigger. We did not instrument the rate-limit Redis directly to disambiguate; this is a candidate for a follow-up measurement.

3.4 Latency by HTTP status

All 80 requests returned 200 OK. The status code distribution is {"200": 80} — verifiable in the raw data file. No 4xx (including 429), no 5xx.

4. Latency budget decomposition

The originally-published version of this paper presented a 10-row per-stage budget table claiming server-side instrumentation across TLS, rate limit, DB query, signature, render, and audit log. That instrumentation was not added and the per-stage numbers were not measured. This paper does not include a per-stage decomposition.

What we can say from the source code alone (verifiable by reading apps/web/app/api/v1/trust/[agentId]/route.ts):

Number of sequential DB stages. The endpoint pulls from twelve tables across the production schema (agent, score, score_history, pact + pact_interactions, reputation_metrics, agent_security_profiles, zero_trust_policies, agent_credibility_bonds, wallet_onchain_history, agent_wallets, cortex_configs, cortex_compressions, sentinel_runs). Many are issued serially in the current implementation, which is the candidate optimization documented in Section 5.1.
Signature path. Ed25519 signing on the canonical JSON body. The path is well-established and is unlikely to dominate the budget; instrumenting it is part of the follow-up.
Rate limit. Upstash Redis with sliding-window limiter; the Redis hop is a single TCP round-trip from the Vercel function to Upstash.

A real per-stage breakdown would add performance.now() markers around each stage in the route handler and aggregate over a fresh measurement run. We name this as an explicit follow-up rather than producing one without the underlying data.
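
The instrumentation described above could take roughly the following shape. This is a sketch under stated assumptions, not the route handler's code: `timed`, `collect`, `handleRequest`, and the stage names are hypothetical, and the pauses stand in for real work.

```typescript
import { performance } from "node:perf_hooks";

type StageTimings = Record<string, number>;

// Run one stage and add its wall-clock duration (ms) to the timings map,
// even when the stage throws.
async function timed<T>(
  timings: StageTimings,
  stage: string,
  work: () => Promise<T>,
): Promise<T> {
  const start = performance.now();
  try {
    return await work();
  } finally {
    timings[stage] = (timings[stage] ?? 0) + (performance.now() - start);
  }
}

const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Placeholder handler: stage names follow the request flow in Section 2.1.
async function handleRequest(): Promise<StageTimings> {
  const timings: StageTimings = {};
  await timed(timings, "rateLimit", () => pause(2));
  await timed(timings, "agentLookup", () => pause(5));
  await timed(timings, "sign", () => pause(1));
  return timings;
}

// Aggregate across requests: gather each stage's samples for percentile analysis.
function collect(runs: StageTimings[]): Record<string, number[]> {
  const byStage: Record<string, number[]> = {};
  for (const run of runs) {
    for (const [stage, ms] of Object.entries(run)) {
      if (!byStage[stage]) byStage[stage] = [];
      byStage[stage].push(ms);
    }
  }
  return byStage;
}

handleRequest().then((timings) => console.log(timings));
```

Recording in a `finally` block keeps failed stages in the data, which matters when the slow requests are also the ones that error.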

5. Architectural refinements proposed

5.1 Parallelized DB query path (proposed)

The sequential DB queries (agent, score, pact, reputation, security, bond, cortex, sentinel) can be issued in parallel using Promise.all rather than serially. The most-downstream queries (pact rollup depends on agent's pact IDs; score history depends on score row) have legitimate dependencies; the rest can be issued speculatively against the agent ID. We propose this as an architectural refinement; the projected savings are not measured numbers — they require the per-stage instrumentation called out in Section 4 to verify.
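
One way to express this dependency structure is to start every independent query at once and keep only the dependent pairs chained. The sketch below assumes a hypothetical data-access interface (`TrustDb`) and an in-memory fake; the real route's query helpers are not shown in the paper, and no latency saving is claimed from this example.

```typescript
type Row = Record<string, unknown>;

// Hypothetical data-access layer standing in for the route's real queries.
interface TrustDb {
  agent(agentId: string): Promise<Row>;
  score(agentId: string): Promise<Row>;
  scoreHistory(scoreId: string): Promise<Row[]>;
  pacts(agentId: string): Promise<Row[]>;
  pactRollup(pactIds: string[]): Promise<Row>;
  reputation(agentId: string): Promise<Row>;
  security(agentId: string): Promise<Row>;
}

async function loadTrustRecord(db: TrustDb, agentId: string) {
  // Dependent chains stay sequential internally but run alongside everything else.
  const scoreChain = db.score(agentId).then(async (score) => ({
    score,
    history: await db.scoreHistory(String(score.id)),
  }));
  const pactChain = db.pacts(agentId).then(async (pacts) => ({
    pacts,
    rollup: await db.pactRollup(pacts.map((p) => String(p.id))),
  }));
  const [agent, scoreResult, pactResult, reputation, security] = await Promise.all([
    db.agent(agentId),
    scoreChain,
    pactChain,
    db.reputation(agentId),
    db.security(agentId),
  ]);
  return { agent, ...scoreResult, ...pactResult, reputation, security };
}

// In-memory fake used only to demonstrate concurrency; it tracks how many
// queries are in flight at once.
function makeFakeDb(): { db: TrustDb; peakInFlight: () => number } {
  let inFlight = 0;
  let peak = 0;
  async function query<T>(result: T): Promise<T> {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    inFlight--;
    return result;
  }
  const db: TrustDb = {
    agent: (id) => query({ id }),
    score: (id) => query({ id: "score-1", agentId: id }),
    scoreHistory: (scoreId) => query([{ scoreId }]),
    pacts: (id) => query([{ id: "pact-1", agentId: id }]),
    pactRollup: (pactIds) => query({ pactCount: pactIds.length }),
    reputation: (id) => query({ agentId: id }),
    security: (id) => query({ agentId: id }),
  };
  return { db, peakInFlight: () => peak };
}

const fake = makeFakeDb();
loadTrustRecord(fake.db, "agent-123").then(() => console.log(fake.peakInFlight())); // 5
```

Note that `Promise.all` rejects as soon as any query fails, so a handler built this way fails the whole request rather than returning partial data, which matches the paper's position that a degraded oracle response is not acceptable.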

5.2 Pre-signed materialized views (proposed)

Under this proposal, the substrate would write a fresh signed snapshot of each public agent's record into a materialized table at the end of each telemetry batch flush. The oracle's read path would then become a single-row SELECT from agent_trust_attestations keyed by agentId, with the signature already computed. Signature generation would move out of the request path entirely.

The trade-off is that the materialized view is *as fresh as the last flush*, not *fresh as of request time*. For the L4 substrate this is acceptable — the freshness guarantee is already the flush interval — and the materialization adds at most one flush of additional latency, which is bounded by flushIntervalMs (5 seconds by default; see packages/telemetry/src/client.ts:13).
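
The write-on-flush, read-by-key split can be sketched with an in-memory map standing in for the proposed agent_trust_attestations table. `AttestationStore`, `SignedSnapshot`, and the counting signer are hypothetical; the point is only that signing happens once per flush and never on a read.

```typescript
interface SignedSnapshot {
  agentId: string;
  body: string; // serialized record, exactly as signed
  signature: string;
  flushedAtMs: number; // when the telemetry batch flush produced this row
}

type Signer = (body: string) => string;

// In-memory stand-in for the proposed agent_trust_attestations table.
class AttestationStore {
  private readonly rows = new Map<string, SignedSnapshot>();
  private readonly signer: Signer;

  constructor(signer: Signer) {
    this.signer = signer;
  }

  // Write path: runs once per telemetry batch flush, off the request path.
  onFlush(agentId: string, record: object, nowMs: number): void {
    const body = JSON.stringify(record);
    this.rows.set(agentId, { agentId, body, signature: this.signer(body), flushedAtMs: nowMs });
  }

  // Read path: one keyed lookup, no signing.
  read(agentId: string): SignedSnapshot | undefined {
    return this.rows.get(agentId);
  }
}

// Counting signer (a placeholder, not a real signature) to show that reads do not sign.
let signCalls = 0;
const store = new AttestationStore((body) => {
  signCalls++;
  return `sig(${body.length})`;
});

store.onFlush("agent-123", { composite: 87 }, 1_000);
store.read("agent-123");
store.read("agent-123");
store.read("agent-123");
console.log(signCalls); // 1
```

A reader can compute the snapshot's age as the request time minus `flushedAtMs`, which is the quantity the freshness declaration would need to expose under this design.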

The combined effect of both refinements is a candidate next-generation latency target. Verifying that target requires implementing both refinements and re-running the measurement script; we do not publish a projected number without that work.

5.3 Read replica fan-out

The current oracle reads from the primary Neon Postgres instance. Public read traffic should fan out to read replicas, with the materialized attestation view replicated to read-region replicas (US-East, EU, Asia). A co-located read replica should reduce the geographic RTT for callers far from the primary. The size of that reduction is not quantified here, because the multi-region measurement was not run (Section 3.2).

5.4 W3C VC response caching

The W3C Verifiable Credential format is a strict transformation of the JSON response. The two responses can share the signed payload; the VC envelope is then a small additional render. Pre-rendering both formats into the materialized table is straightforward and saves the VC-render stage on Accept: application/vc+ld+json requests.

6. Open problems and limits

6.1 Substrate-level trust assumptions

The oracle's signature is currently issued by a single substrate key. A counterparty trusts the oracle's response iff they trust the substrate's key custody. For the most-sensitive consumption patterns (regulatory audit, central bank queries), a single signature key is structurally limiting — a compromise of the key compromises every record signed under it.

The architectural refinement is a multi-party signature on the most-critical attestations: the substrate signs, the agent's operator co-signs (attesting to the operator's record-side accuracy), and an independent witness (e.g., an external auditor) co-signs. The aggregate signature is the trust root. Implementation is feasible with BLS aggregation but requires schema changes in the attestation envelope.
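
The co-signing rule can be illustrated without BLS. The sketch below deliberately swaps in three independent Ed25519 signatures, because Node's standard library has no BLS support; it demonstrates only the acceptance rule (all three roles must sign the same payload), not signature aggregation. `coSign`, `verifyAll`, and the role names are hypothetical.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

type Role = "substrate" | "operator" | "witness";

interface CoSignature {
  role: Role;
  signature: Buffer;
}

const REQUIRED_ROLES: Role[] = ["substrate", "operator", "witness"];

// Each party signs the same attestation bytes with its own key.
function coSign(payload: Buffer, role: Role, privateKey: KeyObject): CoSignature {
  return { role, signature: sign(null, payload, privateKey) };
}

// Accept only if every required role has a valid signature over the payload.
function verifyAll(
  payload: Buffer,
  signatures: CoSignature[],
  publicKeys: Record<Role, KeyObject>,
): boolean {
  return REQUIRED_ROLES.every((role) => {
    const entry = signatures.find((s) => s.role === role);
    return entry !== undefined && verify(null, payload, publicKeys[role], entry.signature);
  });
}

// Demo with throwaway keys for the three parties.
const parties = {
  substrate: generateKeyPairSync("ed25519"),
  operator: generateKeyPairSync("ed25519"),
  witness: generateKeyPairSync("ed25519"),
};
const partyPublicKeys: Record<Role, KeyObject> = {
  substrate: parties.substrate.publicKey,
  operator: parties.operator.publicKey,
  witness: parties.witness.publicKey,
};
const attestation = Buffer.from('{"agentId":"agent-123","composite":87}');
const coSignatures = REQUIRED_ROLES.map((role) => coSign(attestation, role, parties[role].privateKey));

console.log(verifyAll(attestation, coSignatures, partyPublicKeys)); // true
console.log(verifyAll(attestation, coSignatures.slice(0, 2), partyPublicKeys)); // false: witness missing
```

With this rule, compromise of the substrate key alone is no longer enough to produce an attestation a counterparty would accept.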

6.2 Cold-cache DB connection timeouts

In low-traffic conditions, the Neon Postgres connection pool occasionally cold-starts a connection, producing a single-request latency outlier in the 1–3 second range. The outlier rate is ~0.05% in current production (observed via post-measurement audit log review). The fix is connection-pool warming via the substrate's continuous telemetry ingest — the same connection pool is hot from telemetry ingest traffic, so warming is incidental. For agent records with low query volume (the long tail), the warming effect attenuates and the cold-start rate increases.

6.3 Cross-region replication lag

If read-replica fan-out is implemented (5.3), the freshness guarantee becomes "the oracle reflects the latest telemetry batch as replicated to the read region." Cross-region replication lag introduces a second freshness dimension. The substrate's design target should be sub-500ms replication lag, with the freshness field carrying the lag in the response so counterparties can act on it.

6.4 Adversarial query patterns

A sufficiently capable adversary may issue oracle queries to map an agent's behavioral record over time, then exploit the timing of pact violations to plan attacks. The current rate limit (10 rpm / 200 daily per IP) bounds this, but does not eliminate it. A motivated adversary distributes queries across IPs. The architectural defense is per-agent rate limiting at the oracle level (with a higher cap for authenticated queries from registered counterparties) plus opacity in score recomputation timing.

7. Discussion

The trust oracle is the smallest possible cross-org consumption surface for the L4 substrate. It is small by design: a single GET endpoint, a small JSON document, a single signature. The smallness is the entire point — it minimizes the integration surface that every counterparty must accept and minimizes the substrate's own attack surface. The architectural commitments (independence, freshness, signed VC) are properties of the substrate; the oracle is the surface through which those properties are consumed.

The single-host latency measurement suggests that the current substrate operates within the latency envelope counterparties can absorb: p95 is under 240 ms, with one outlier of roughly 3 seconds in 80 requests. The proposed refinements (parallel queries, materialized views, read-replica fan-out) target the next-generation latency envelope, which becomes operationally important as more counterparties wire the oracle into their pre-transaction authorization flow at scale.

The Chainlink analogy holds in framing but not in scale: Chainlink-style oracles operate at one-write-many-reads with reads measured in millions per minute on busy chains. The L4 oracle is currently at one-write-thousands-of-reads-per-day, but the substrate's architecture should anticipate the scaling. The materialized-view refinement is necessary, not optional, for that scale.

8. Replication

Researchers can replicate the latency measurement against the production Armalo trust oracle:

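# Single query from your local machine:
time curl -s https://www.armalo.ai/api/v1/trust/76cf31d6-ffe3-4a5c-8748-021114aa8066 > /dev/null

# Bulk query: 80 sequential requests paced 250 ms apart, matching Section 3.1.
# curl reports time_total in seconds; percentiles use the nearest-rank method.
for i in $(seq 1 80); do
  curl -w "%{time_total}\n" -o /dev/null -s \
    https://www.armalo.ai/api/v1/trust/76cf31d6-ffe3-4a5c-8748-021114aa8066
  sleep 0.25
done | sort -n | awk '
  function rank(p,  r) { r = int(p * NR / 100); if (r * 100 < p * NR) r++; return a[r] }
  { a[NR] = $1 }
  END { print "p50", rank(50), "p95", rank(95), "p99", rank(99) }'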

Replication from EU and AP regions has not been run (Section 3.2); running the same commands from hosts in those regions and merging the outputs is the follow-up. The measurement can be repeated at any time the oracle is live. The underlying data refreshes on each telemetry batch, so per-request latency will vary with current load, and whether the percentile envelope is stable across runs is not established by the single run reported here.

References

Armalo Labs Research Team. *The L4 Layer: Cross-Org Behavioral Trust for AI Agents.* 2026-05-12.
Chainlink. *Decentralized Oracle Network: Whitepaper.* 2017.
W3C. *Verifiable Credentials Data Model 2.0.* W3C Recommendation, 2025.
Neon. *Serverless Postgres Architecture Notes.* 2025.
Companion papers: [TOCTOU theorem for agent trust](/labs/research/2026-05-13-toctou-theorem-agent-trust), [Parameter binding grammar coverage](/labs/research/2026-05-13-parameter-binding-grammar-coverage), [Composite trust scoring under adversarial drift](/labs/research/2026-05-13-composite-scoring-adversarial-drift).