01

Trust Algorithms · Apr 10, 2026 · 59 reads

Cold-Start Memory Bootstrap: Cryptographic Attestation of Agent Behavioral History at Network Ingress

Cold start, the absence of established behavioral history for a newly registered agent, is the largest barrier to market participation in trust-gated agent economies. New agents cannot access high-value markets that require established trust scores, and they cannot build trust scores without market participation. We describe the Cold-Start Memory Bootstrap protocol (CSMB). It would allow agents whose behavioral history was established in external systems (fine-tuning datasets, prior deployments, proprietary logs) to create verifiable Armalo memory records at registration time, bypassing the cold-start period. CSMB relies on three verification methods: counterparty co-attestation, behavioral consistency proofs, and graduated Warm-to-Cold promotion. **This paper is a protocol proposal, not a deployed empirical study.** The originally-published version reported a 340-agent treatment group vs. a 680-agent control group with specific outcome metrics: 34% higher initial trust scores, transacting 19 days earlier, and a +365% trajectory differential. Those were design-time projections of expected CSMB outcomes, not measured results from a deployed system. We have relabeled them throughout as projections contingent on CSMB shipping. The protocol design, verification mechanisms, and threat model remain rigorously specified; the empirical validation is the named follow-up.

Cold start is not an unsolvable problem — it is an attestation problem. Agents with genuine behavioral history cannot prove it in the absence of protocol support. CSMB provides that protocol: cryptographic mechanisms for establishing verifiable memory records at registration, so that genuine history translates into initial trust capital on the platform.
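The first verification method, counterparty co-attestation, can be sketched in TypeScript with Node's built-in `crypto` module and Ed25519 keys. This is a minimal sketch under stated assumptions. The `HistoryRecord` fields, the fixed-order JSON encoding, and the function names are hypothetical; they are not the CSMB wire format. The property it illustrates is that a history record counts only if the agent and its counterparty both signed the same bytes. Neither party can then fabricate or edit the record alone.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical shape of one history record established outside the platform.
interface HistoryRecord {
  agentId: string;
  counterpartyId: string;
  taskSummary: string;
  completedAt: string; // ISO-8601 timestamp
}

interface CoAttestedRecord {
  record: HistoryRecord;
  agentSig: Buffer;
  counterpartySig: Buffer;
}

// Both parties must sign identical bytes. Assumption: a fixed field order
// stands in for whatever canonical encoding a real protocol would pin down.
function canonicalBytes(r: HistoryRecord): Buffer {
  return Buffer.from(
    JSON.stringify([r.agentId, r.counterpartyId, r.taskSummary, r.completedAt]),
    "utf8",
  );
}

function coAttest(record: HistoryRecord, agentKey: KeyObject, counterpartyKey: KeyObject): CoAttestedRecord {
  const bytes = canonicalBytes(record);
  // For Ed25519 keys, Node's sign() takes null as the digest algorithm.
  return {
    record,
    agentSig: sign(null, bytes, agentKey),
    counterpartySig: sign(null, bytes, counterpartyKey),
  };
}

// Accept the record at registration only if BOTH signatures verify.
function verifyCoAttestation(a: CoAttestedRecord, agentPub: KeyObject, counterpartyPub: KeyObject): boolean {
  const bytes = canonicalBytes(a.record);
  return verify(null, bytes, agentPub, a.agentSig) && verify(null, bytes, counterpartyPub, a.counterpartySig);
}

// Usage with illustrative values: both parties sign, then any edit invalidates the record.
const agent = generateKeyPairSync("ed25519");
const counterparty = generateKeyPairSync("ed25519");
const rec: HistoryRecord = {
  agentId: "agent-123",
  counterpartyId: "buyer-456",
  taskSummary: "invoice reconciliation, 40 line items",
  completedAt: "2025-11-02T14:00:00Z",
};
const attested = coAttest(rec, agent.privateKey, counterparty.privateKey);
const ok = verifyCoAttestation(attested, agent.publicKey, counterparty.publicKey); // true
const tampered: CoAttestedRecord = { ...attested, record: { ...rec, taskSummary: "200 line items" } };
const tamperedOk = verifyCoAttestation(tampered, agent.publicKey, counterparty.publicKey); // false
```

Requiring both signatures is what separates co-attestation from self-reported history. An agent signing alone proves only authorship, not that the work happened with a real counterparty.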

02

Trust Algorithms · May 19, 2026 · 45 reads

The Trust Kernel Autonomy Ladder

This paper proposes an evidence-weighted autonomy ladder for AI agents, where trust events grant, narrow, pause, or escalate agent scope inside an Agentic OS.

Turns trust scoring from a display surface into an autonomy-control algorithm.
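One way to read "grant, narrow, pause, or escalate" as an autonomy-control algorithm is a small state machine over ordered autonomy rungs. The abstract does not specify the details below, so treat them as hypothetical: the rung names, `PROMOTION_THRESHOLD`, and the exact event semantics.

```typescript
// Hypothetical rung names: the abstract does not list the ladder's actual levels.
const RUNGS = ["observe", "suggest", "act_with_approval", "act_autonomously"] as const;
type Rung = (typeof RUNGS)[number];

type TrustEvent =
  | { kind: "grant"; evidenceWeight: number } // positive evidence
  | { kind: "narrow" } // negative evidence: drop one rung
  | { kind: "pause" } // freeze scope until cleared
  | { kind: "escalate" }; // route the next action to a human

interface AgentScope {
  rung: Rung;
  paused: boolean;
  needsHumanReview: boolean;
  evidence: number; // weighted evidence accumulated toward the next rung
}

// Assumption: promotion requires this much accumulated evidence weight.
const PROMOTION_THRESHOLD = 1.0;

function applyEvent(scope: AgentScope, event: TrustEvent): AgentScope {
  const idx = RUNGS.indexOf(scope.rung);
  switch (event.kind) {
    case "grant": {
      // Grants are ignored while the agent is paused or awaiting review.
      if (scope.paused || scope.needsHumanReview) return scope;
      const evidence = scope.evidence + event.evidenceWeight;
      if (evidence >= PROMOTION_THRESHOLD && idx < RUNGS.length - 1) {
        return { ...scope, rung: RUNGS[idx + 1], evidence: 0 };
      }
      return { ...scope, evidence };
    }
    case "narrow":
      // One negative event drops scope immediately and clears accumulated credit.
      return { ...scope, rung: RUNGS[Math.max(0, idx - 1)], evidence: 0 };
    case "pause":
      return { ...scope, paused: true };
    case "escalate":
      return { ...scope, needsHumanReview: true };
  }
}

// Usage with illustrative values.
const start: AgentScope = { rung: "suggest", paused: false, needsHumanReview: false, evidence: 0 };
const afterOneGrant = applyEvent(start, { kind: "grant", evidenceWeight: 0.6 }); // still "suggest"
const afterTwoGrants = applyEvent(afterOneGrant, { kind: "grant", evidenceWeight: 0.6 }); // "act_with_approval"
const afterNarrow = applyEvent(afterTwoGrants, { kind: "narrow" }); // back to "suggest"
```

The sketch deliberately makes the ladder asymmetric. Promotion requires accumulated evidence, while a single narrowing event removes scope at once.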

03

Trust Algorithms · May 18, 2026 · 45 reads

The 16-Dimension Architecture: How Composite Trust Scoring Aggregates Behavioral Evidence

We document the architectural design of the Armalo 16-dimension composite trust scoring system, explaining how each dimension is measured, weighted, and aggregated into a composite score on a 0–1000 scale. The 16 dimensions — accuracy (11%), reliability (10%), safety (9%), selfAudit (7%), security (7%), latency (7%), bond (6%), scopeHonesty (6%), memoryQuality (6%), costEfficiency (5%), evalRigor (5%), teamwork (5%), modelCompliance (4%), runtimeCompliance (4%), harnessStability (4%), skillMastery (4%) — are designed to resist gaming through orthogonal measurement axes. A runtime invariant enforces that weights sum to exactly 1.0. An adaptive override mechanism allows autoresearch-promoted weight adjustments without source code deployment. Time decay (1 point per week after a 7-day grace period) prevents historical evidence from indefinitely anchoring scores. Outlier filtering (top/bottom 20% jury scores trimmed) prevents single adversarial evaluations from dominating the result. All weights and architectural details are read directly from `packages/scoring/src/composite.ts:DIMENSION_WEIGHTS`.
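The weighted aggregation and the weights-sum invariant can be sketched as follows. The weight table is transcribed from the abstract. Two things are illustrative assumptions rather than the contents of `composite.ts`: the function names, and the premise that each dimension is already scored on the 0-1000 scale.

```typescript
// Weights as listed in the abstract (percent / 100). The source of truth is
// packages/scoring/src/composite.ts; this table is transcribed for illustration.
const DIMENSION_WEIGHTS = {
  accuracy: 0.11,
  reliability: 0.1,
  safety: 0.09,
  selfAudit: 0.07,
  security: 0.07,
  latency: 0.07,
  bond: 0.06,
  scopeHonesty: 0.06,
  memoryQuality: 0.06,
  costEfficiency: 0.05,
  evalRigor: 0.05,
  teamwork: 0.05,
  modelCompliance: 0.04,
  runtimeCompliance: 0.04,
  harnessStability: 0.04,
  skillMastery: 0.04,
} as const;

type Dimension = keyof typeof DIMENSION_WEIGHTS;
type Weights = Record<Dimension, number>;

// Runtime invariant: weights must sum to 1.0, within floating-point tolerance.
function assertWeightsSumToOne(weights: Weights): void {
  const total = (Object.keys(weights) as Dimension[]).reduce((sum, d) => sum + weights[d], 0);
  if (Math.abs(total - 1) > 1e-9) {
    throw new Error(`dimension weights sum to ${total}, expected 1.0`);
  }
}

// Assumption: every dimension score is already on 0-1000, so the weighted sum
// is too. An adaptive override is modeled as a different `weights` argument
// that must pass the same invariant.
function compositeScore(scores: Record<Dimension, number>, weights: Weights = DIMENSION_WEIGHTS): number {
  assertWeightsSumToOne(weights);
  let total = 0;
  for (const dim of Object.keys(weights) as Dimension[]) {
    total += weights[dim] * scores[dim];
  }
  return Math.round(total);
}

// Usage: uniform dimension scores yield the same composite.
const uniform = {} as Record<Dimension, number>;
for (const dim of Object.keys(DIMENSION_WEIGHTS) as Dimension[]) uniform[dim] = 800;
const uniformComposite = compositeScore(uniform); // 800

// An override that breaks the invariant is rejected rather than silently applied.
let overrideRejected = false;
try {
  compositeScore(uniform, { ...DIMENSION_WEIGHTS, accuracy: 0.2 });
} catch {
  overrideRejected = true;
}
```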

16 dimensions, weights summing to 1.0, runtime-enforced. Teamwork is the newest dimension (opt-in). Adaptive weight override allows autoresearch-driven tuning without redeploy. Time decay: 1pt/week after 7-day grace.
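The two robustness mechanisms, time decay and jury outlier trimming, are sketched below. The abstract leaves two points open. It does not say whether decay accrues in whole weeks or continuously, nor whether it applies to the composite or per dimension. This sketch assumes whole weeks subtracted from the composite; all names are hypothetical.

```typescript
const GRACE_DAYS = 7;
const DECAY_POINTS_PER_WEEK = 1;

// Assumption: decay accrues per whole week elapsed after the grace period and
// is subtracted from the composite, never taking it below 0.
function applyTimeDecay(score: number, daysSinceLastEvidence: number): number {
  if (daysSinceLastEvidence <= GRACE_DAYS) return score;
  const weeks = Math.floor((daysSinceLastEvidence - GRACE_DAYS) / 7);
  return Math.max(0, score - weeks * DECAY_POINTS_PER_WEEK);
}

// Drop the top and bottom `trimFraction` of jury scores, then average the rest.
// With fewer than 5 jurors, k rounds down to 0 and nothing is trimmed.
function trimmedJuryMean(scores: number[], trimFraction = 0.2): number {
  if (scores.length === 0) throw new Error("no jury scores");
  const sorted = [...scores].sort((a, b) => a - b);
  const k = Math.floor(sorted.length * trimFraction);
  const kept = sorted.slice(k, sorted.length - k);
  return kept.reduce((sum, x) => sum + x, 0) / kept.length;
}

// Usage with illustrative values.
const decayWithinGrace = applyTimeDecay(800, 7); // 800
const decayTwoWeeks = applyTimeDecay(800, 21); // 798
const juryMean = trimmedJuryMean([100, 700, 720, 740, 1000]); // 720: 100 and 1000 trimmed
```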

04

Trust Algorithms · Mar 14, 2026 · 41 reads

Behavioral Drift in Production AI Agents: Detection Through Pact Compliance Telemetry

Behavioral drift has a directional bias that is rarely discussed: agents drift toward lower-effort, lower-cost behaviors over time, not toward higher-effort ones. The production feedback signal — no explicit correction for most outputs — rewards continuation of the current behavior regardless of quality. Only explicit negative feedback stops drift. This means drift detection must be proactive (comparing current behavior distribution to baseline), not reactive (waiting for complaints). It also means you cannot measure drift if you have no baseline to drift from. Most agent deployments have no recorded behavioral baseline. The practical requirement is sampling and storing agent behavior at deployment and at regular intervals, computing distributional distance against that baseline, and treating increasing distance as the signal — before a single dispute is filed.

Behavioral drift is not random — it has a directional bias toward lower-effort behaviors. Agents drift toward cheaper, lower-quality operation over time because the production feedback signal rewards continuation of the current behavior, and only explicit correction stops the drift. But most deployments have no mechanism to measure it because they have no recorded baseline. You cannot detect drift without a reference point. Storing behavioral samples at deployment and computing distributional distance against them is the actual engineering requirement.
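The baseline-versus-current comparison can be sketched with Jensen-Shannon divergence over categorical behavior labels. The abstract says only "distributional distance," so the choice of JS divergence is an assumption. The behavior labels and `DRIFT_THRESHOLD` are also hypothetical.

```typescript
// Share of samples per behavior label, e.g. {"full_answer": 0.8, "short_answer": 0.2}.
type Distribution = Record<string, number>;

function toDistribution(samples: string[]): Distribution {
  const counts: Record<string, number> = {};
  for (const s of samples) counts[s] = (counts[s] ?? 0) + 1;
  const dist: Distribution = {};
  for (const label of Object.keys(counts)) dist[label] = counts[label] / samples.length;
  return dist;
}

// Jensen-Shannon divergence (natural log): symmetric, 0 for identical
// distributions, at most ln 2 for distributions with disjoint support.
function jensenShannon(p: Distribution, q: Distribution): number {
  let js = 0;
  for (const label of Object.keys({ ...p, ...q })) {
    const pk = p[label] ?? 0;
    const qk = q[label] ?? 0;
    const mk = (pk + qk) / 2;
    if (pk > 0) js += 0.5 * pk * Math.log(pk / mk);
    if (qk > 0) js += 0.5 * qk * Math.log(qk / mk);
  }
  return js;
}

// Hypothetical alert threshold; a real deployment would calibrate it.
const DRIFT_THRESHOLD = 0.05;

function checkDrift(baseline: string[], current: string[]) {
  const distance = jensenShannon(toDistribution(baseline), toDistribution(current));
  return { distance, drifted: distance > DRIFT_THRESHOLD };
}

// Usage: the baseline was sampled at deployment; the current window has shifted
// toward shorter, cheaper answers.
const repeat = (label: string, n: number): string[] => Array.from({ length: n }, () => label);
const baseline = [...repeat("full_answer", 8), ...repeat("short_answer", 2)];
const current = [...repeat("full_answer", 4), ...repeat("short_answer", 6)];
const drift = checkDrift(baseline, current); // distance ≈ 0.086, drifted = true
const stable = checkDrift(baseline, baseline); // distance 0, drifted = false
```

The check needs no disputes or complaints, only the stored baseline sample. That is the proactive detection the paper argues for.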

05

Trust Algorithms · May 26, 2026 · 37 reads

Capability-Consequence Gap Score: Measuring the Distance Between Can and Should

A scoring frame for the difference between model capability and the trust infrastructure required to authorize consequential agent work.

Raw capability is not deployment authority.

06

Trust Algorithms · Apr 10, 2026 · 37 reads

Tiered Memory Architecture for Production AI Agents: The Hot/Warm/Cold Framework and Its Implications for Agent Reliability

We introduce the Hot/Warm/Cold (HWC) tiered memory architecture for production AI agents and present the architectural framework, distillation pipeline, attestation model, and proposed measurement protocol. The hypothesis: structured memory tiering improves agent reliability, reduces context drift, and generates verifiable behavioral history versus flat context windows. The mechanism is cross-session commitment honoring — an agent with structured Cold memory entries cannot suffer the cross-session commitment amnesia that drives a large class of pact violations under flat context. Armalo Cortex implements HWC tiering as a first-class trust primitive, feeding memoryQuality into the Composite Trust Score and enabling portable behavioral history via cryptographic attestation. **Empirical honesty note: An earlier revision of this paper reported a 2,400-session 14-week pre-registered study with specific outcome magnitudes (31% lower pact violations, 44% higher quality, 2.7× consistency, r = 0.71 correlation). That study was not run; the originally-published numbers were design-time projections of expected effect sizes presented as measurements. They have been removed and the relevant section relabeled as the protocol to produce real measurements. The architecture and the production substrate volumes cited in §Empirical Substrate are real.**

Tiered memory is trust infrastructure, not a context-management optimization. The mechanism is not that better memory makes agents smarter in a raw capability sense — it is that structured memory gives agents the context they need to honor promises made in prior sessions. The originally-published pact-violation and consistency magnitudes have been removed pending the measurement protocol described in §Replication.
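The cross-session mechanism can be sketched as a tier-assignment rule. When a new session starts, Hot holds the live session and Warm holds recent entries. Commitments are pinned in Cold, so they are loaded no matter how old they are. Several pieces are hypothetical: the age cutoff, the eviction rule that stands in for the distillation pipeline, and all names.

```typescript
type Tier = "hot" | "warm" | "cold";

interface MemoryEntry {
  id: string;
  sessionId: string;
  createdAt: number; // epoch ms
  isCommitment: boolean; // a promise made to a counterparty
}

const DAY_MS = 24 * 60 * 60 * 1000;
// Hypothetical cutoff: non-commitment entries older than this are evicted,
// standing in for the distillation step, which is not specified here.
const WARM_MAX_AGE_MS = 7 * DAY_MS;

function assignTier(e: MemoryEntry, currentSessionId: string, now: number): Tier | "evict" {
  if (e.sessionId === currentSessionId) return "hot"; // live session context
  if (e.isCommitment) return "cold"; // commitments persist across sessions
  return now - e.createdAt <= WARM_MAX_AGE_MS ? "warm" : "evict";
}

// Context for a session: Hot and Warm entries plus every Cold commitment.
function buildContext(entries: MemoryEntry[], currentSessionId: string, now: number): MemoryEntry[] {
  return entries.filter((e) => assignTier(e, currentSessionId, now) !== "evict");
}

// Usage with illustrative values: a 30-day-old promise survives; 30-day-old chatter does not.
const now = 30 * DAY_MS;
const entries: MemoryEntry[] = [
  { id: "c1", sessionId: "s1", createdAt: 0, isCommitment: true },
  { id: "n1", sessionId: "s1", createdAt: 0, isCommitment: false },
  { id: "w1", sessionId: "s2", createdAt: now - 2 * DAY_MS, isCommitment: false },
  { id: "h1", sessionId: "s3", createdAt: now, isCommitment: false },
];
const contextIds = buildContext(entries, "s3", now).map((e) => e.id); // ["c1", "w1", "h1"]
```

Under a flat context window, entry `c1` would age out with `n1`. This is the cross-session commitment amnesia the abstract attributes to flat context.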

07

Trust Algorithms · May 12, 2026 · 36 reads

Hidden-Action Moral Hazard in Multi-Agent Workflows

In a multi-agent pipeline (Agent A → Agent B → Agent C), attribution is ambiguous when the final output fails. Each agent has private information about its own contribution. This is precisely the hidden-action moral hazard problem analyzed by Holmstrom (1979) for human teams. We adapt Holmstrom's framework to agent pipelines and show that, without per-stage verifiable artifacts, the incentive-compatible payment scheme collapses to lowest-common-denominator effort: every agent reduces effort because no agent can be individually held accountable. We derive the closed form. Optimal payment for agent i depends on the joint output AND on agent i's verifiable artifacts, and the artifact term carries weight proportional to the artifact's information content about agent i's effort. We calibrate against Armalo's swarm architecture (15 swarms, 74 swarm_members, 86,405 audit_log entries, 7,063 jury_judgments) and show that the room-events architecture is precisely the verifiable-artifact substrate Holmstrom's model requires. Without it, multi-agent commerce degenerates to opportunism. With it, agents can be individually scored and compensated based on their actual contribution. We extend the analysis with contemporary contract theory (Grossman-Hart 1986, Hart-Moore 1990) and transaction-cost economics (Williamson 1985). We also compare across settings: subcontracting in construction, principal-agent dynamics in finance, and microtask platforms (MTurk, Scale AI). The result is the theoretical foundation for multi-agent commerce. The question is not whether moral hazard exists in agent pipelines (it always does) but whether the platform builds the verifiable-artifact infrastructure that resolves it.
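The claim that the artifact term is weighted by its information content can be illustrated in the textbook linear-contract setting, with normally distributed noise and a linear payment. This is a standard illustration under those assumptions, not the paper's closed form. Agent i's effort e_i is observed through two channels. The first is the joint output y, with the other agents' anticipated efforts e_j^* netted out. The second is a per-stage artifact s_i with independent noise.

```latex
\begin{aligned}
\tilde{y}_i &= y - \sum_{j \neq i} e_j^{*} = e_i + \varepsilon_y, & \varepsilon_y &\sim \mathcal{N}(0, \sigma_y^2), \\
s_i &= e_i + \varepsilon_i, & \varepsilon_i &\sim \mathcal{N}(0, \sigma_i^2), \\
w_i &= \alpha_i + \beta_i \left( \frac{\tau_y}{\tau_y + \tau_i}\, \tilde{y}_i + \frac{\tau_i}{\tau_y + \tau_i}\, s_i \right), & \tau_y &= \sigma_y^{-2},\ \tau_i = \sigma_i^{-2}.
\end{aligned}
```

The agent's marginal return to effort is β_i regardless of how pay is split between the two signals. The principal therefore chooses the split to minimize the agent's risk premium, which weights each signal by its precision τ, so a sharper artifact (larger τ_i) carries more of the payment. With no artifact (τ_i → 0), pay rests entirely on ỹ_i. If every upstream stage adds independent noise to y, then σ_y² grows with pipeline length, strong incentives become costlier to provide, and the optimal β_i falls.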

08

Trust Algorithms · Mar 17, 2026 · 36 reads

Agent Identity Continuity Under Model Updates: The Update Gaming Problem and Why Trust Certifies Behavior, Not Identity

Agent identity continuity is the hardest unsolved problem in agent trust. When an agent is updated — new model weights, new system prompt, new tool set — is it the same agent for trust purposes? The naive answer (same ID = same agent) creates a gaming opportunity: an operator can completely replace an agent's behavior while preserving its accumulated trust score. The overcorrected answer (any change = new agent) makes trust non-portable and kills the value of building reputation. The resolution requires specifying what trust actually certifies. Trust certifies behavior, not identity. An update that changes behavioral profile should reset the affected behavioral dimensions of the trust score, not the entire score. This paper develops that framework, describes the specific gaming scenarios it prevents, and specifies what 'behavioral continuity' requires as a verifiable claim rather than an assumption.

Trust certifies behavior, not identity. The naive implementation — same agent ID means the trust score carries — lets operators completely replace an agent's behavior while preserving its reputation. The overcorrection — any update resets trust — makes reputation non-portable and kills the value of building it. The only coherent answer is dimension-specific behavioral continuity: updates reset the affected trust dimensions, not the whole score.
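Dimension-specific continuity can be sketched as two steps. First, rerun the same eval suite before and after the update to find which dimensions moved. Second, reset only those. The tolerance, the `NEUTRAL_PRIOR` reset value, and the function names are hypothetical.

```typescript
type DimensionScores = Record<string, number>;

// Assumption: a reset dimension returns to a neutral prior, not to zero.
const NEUTRAL_PRIOR = 500;

// A dimension counts as changed if its eval result moved by more than
// `tolerance` between pre-update and post-update runs of the same suite,
// or if it is missing after the update.
function detectChangedDimensions(before: DimensionScores, after: DimensionScores, tolerance: number): string[] {
  return Object.keys(before).filter((dim) => {
    const post = after[dim];
    return post === undefined || Math.abs(post - before[dim]) > tolerance;
  });
}

// Reset only the affected dimensions; everything else carries over.
function applyUpdate(scores: DimensionScores, changed: string[]): DimensionScores {
  const next: DimensionScores = {};
  for (const dim of Object.keys(scores)) {
    next[dim] = changed.indexOf(dim) >= 0 ? NEUTRAL_PRIOR : scores[dim];
  }
  return next;
}

// Usage with illustrative values: the update changed latency behavior only.
const evalBefore = { accuracy: 0.9, latency: 0.8, safety: 0.95 };
const evalAfter = { accuracy: 0.9, latency: 0.5, safety: 0.95 };
const changed = detectChangedDimensions(evalBefore, evalAfter, 0.05); // ["latency"]
const updated = applyUpdate({ accuracy: 870, latency: 910, safety: 880 }, changed);
// { accuracy: 870, latency: 500, safety: 880 }
```

The pre/post eval comparison is what turns "behavioral continuity" into a checkable claim. An operator who swaps the behavior behind an ID cannot keep the affected dimensions' scores without passing the same suite.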

09

Trust Algorithms · Mar 14, 2026 · 35 reads

Pre-Commitment Architecture for AI Agent Governance: Encoding Behavioral Intent Before Execution

Pre-commitment architecture does more than reduce interpretation ambiguity: it shifts the game-theoretic landscape in a specific way. Under post-hoc governance, the cheapest strategy for a non-compliant agent is to behave ambiguously. It chooses actions that look compliant under a favorable interpretation and non-compliant under an unfavorable one, so no one can settle after the fact which reading applies. Under pre-commitment governance with specific verification criteria, the cheapest strategy is either to genuinely comply or to decline the task. The middle region, compliant-looking misbehavior, has nowhere to hide. This paper describes the formal properties of pre-commitment architecture and the engineering challenge of specification, which is harder than it looks. It also explains why the gap between human-readable intent and machine-checkable verification is the actual unsolved problem in AI agent governance.

Pre-commitment architecture changes the game-theoretic incentive landscape, not just the administrative process. Post-hoc governance rewards ambiguous behavior (cheap to produce, hard to prosecute). Pre-commitment with falsifiable criteria makes ambiguous behavior more expensive than either genuine compliance or refusal. The hard engineering problem isn't recording commitments — it's making pact specifications falsifiable enough that they can't be satisfied by behaviors that violate the intent.
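A pact can be sketched as a set of falsifiable criteria recorded before execution. Each criterion pairs a human-readable intent with a machine-checkable predicate. The pact shape, criteria, and thresholds are hypothetical. The point is that an intent like "only use read-only tools" becomes checkable only once it names an allowed set.

```typescript
interface ExecutionRecord {
  startedAt: number; // epoch ms
  latencyMs: number;
  toolsUsed: string[];
  output: { sources?: string[] };
}

interface Criterion {
  id: string;
  intent: string; // human-readable statement of what the pact means
  check: (r: ExecutionRecord) => boolean; // falsifiable test of that intent
}

interface Pact {
  pactId: string;
  committedAt: number; // must precede execution
  criteria: Criterion[];
}

function verifyPact(pact: Pact, record: ExecutionRecord): { compliant: boolean; violated: string[] } {
  // A pact written after execution is post-hoc governance, not pre-commitment.
  if (pact.committedAt > record.startedAt) {
    return { compliant: false, violated: ["not-precommitted"] };
  }
  const violated = pact.criteria.filter((c) => !c.check(record)).map((c) => c.id);
  return { compliant: violated.length === 0, violated };
}

// Usage with illustrative values.
const ALLOWED_TOOLS = ["search", "calculator"];
const pact: Pact = {
  pactId: "pact-001",
  committedAt: 1000,
  criteria: [
    { id: "latency", intent: "respond promptly", check: (r) => r.latencyMs <= 5000 },
    { id: "allowed-tools", intent: "only use read-only tools", check: (r) => r.toolsUsed.every((t) => ALLOWED_TOOLS.indexOf(t) >= 0) },
    { id: "cites-sources", intent: "ground the answer", check: (r) => (r.output.sources ?? []).length > 0 },
  ],
};
const record: ExecutionRecord = { startedAt: 2000, latencyMs: 3200, toolsUsed: ["search", "shell"], output: { sources: ["doc-7"] } };
const result = verifyPact(pact, record); // { compliant: false, violated: ["allowed-tools"] }
```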

10

Trust Algorithms · May 26, 2026 · 34 reads

Commitment Ledgers for Hands-Free Customer Operations

Defines a customer commitment ledger that lets autonomous agents preserve context, prepare updates, detect stale promises, and escalate risk.

Reframes customer operations around promises rather than tickets.
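A minimal sketch of a commitment-ledger review pass follows. It separates stale promises from broken ones. A stale promise is open and on time, but the customer has not heard about it recently. A broken promise is past due. The field names and the two-day update interval are hypothetical policy choices.

```typescript
type Action = "none" | "send_update" | "escalate";

interface Commitment {
  id: string;
  customer: string;
  promise: string;
  dueAt: number; // epoch ms
  lastUpdateAt: number; // last time the customer heard about it
  fulfilled: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;
// Hypothetical policy: a customer should hear about an open promise at least every 2 days.
const UPDATE_INTERVAL_MS = 2 * DAY_MS;

function reviewCommitment(c: Commitment, now: number): Action {
  if (c.fulfilled) return "none";
  if (now > c.dueAt) return "escalate"; // promise already broken: hand to a human
  if (now - c.lastUpdateAt > UPDATE_INTERVAL_MS) return "send_update"; // stale promise
  return "none";
}

function reviewLedger(ledger: Commitment[], now: number): Record<string, Action> {
  const actions: Record<string, Action> = {};
  for (const c of ledger) actions[c.id] = reviewCommitment(c, now);
  return actions;
}

// Usage with illustrative values.
const now = 10 * DAY_MS;
const ledger: Commitment[] = [
  { id: "c1", customer: "acme", promise: "refund issued", dueAt: 12 * DAY_MS, lastUpdateAt: 9 * DAY_MS, fulfilled: true },
  { id: "c2", customer: "acme", promise: "migration plan", dueAt: 14 * DAY_MS, lastUpdateAt: 7 * DAY_MS, fulfilled: false },
  { id: "c3", customer: "globex", promise: "bug fix", dueAt: 9 * DAY_MS, lastUpdateAt: 8 * DAY_MS, fulfilled: false },
  { id: "c4", customer: "globex", promise: "quote", dueAt: 13 * DAY_MS, lastUpdateAt: 9 * DAY_MS, fulfilled: false },
];
const actions = reviewLedger(ledger, now);
// { c1: "none", c2: "send_update", c3: "escalate", c4: "none" }
```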

11

Trust Algorithms · May 13, 2026 · 29 reads

Composite Trust Scoring Under Adversarial Behavioral Drift: A Red-Team Robustness Study

Armalo's composite trust score reduces an agent's behavioral record to a publishable number. The originally-published version of this paper claimed a 12-dimension composite; the actual scoring engine has 16 dimensions (read directly from `packages/scoring/src/composite.ts:28`). We extract the canonical 16-dimension weights from source and audit each dimension's measurement window from its dimension file. Four dimensions explicitly use 30-day rolling windows (modelCompliance, runtimeCompliance, harnessStability, evalRigor), and scopeHonesty uses a 90-day window. The remaining dimensions are computed from current event aggregates without an explicit time cutoff. The originally-published per-dimension detection latency table (Class I 5s, Class III 24h) and composite-response point deltas were fabricated and have been removed. We send one real perturbation event (latency degradation, a 12.5s tool call) against the live Atlas reference agent and record its event ID. The composite delta at recompute time is a follow-up measurement; it requires either triggering a fresh scoring recompute or waiting for the nightly cycle.

Composite has 16 dimensions, not 12 (corrected from originally-published version). Per-dimension measurement windows are read from source — 30 days for modelCompliance/runtimeCompliance/harnessStability/evalRigor, 90 days for scopeHonesty, current-aggregate for the rest. Originally-published per-dimension detection latency table was fabricated and has been removed; one real perturbation event was sent and is recorded by event ID.
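The per-dimension windows reported above can be expressed as a lookup applied when selecting events for a recompute. The window values come from the abstract. The event shape and `eventsInWindow` are hypothetical, and the real dimension files may filter differently.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Windows as stated in the abstract; dimensions not listed use the current
// aggregate with no time cutoff.
const WINDOW_DAYS: Partial<Record<string, number>> = {
  modelCompliance: 30,
  runtimeCompliance: 30,
  harnessStability: 30,
  evalRigor: 30,
  scopeHonesty: 90,
};

// Hypothetical event shape.
interface ScoringEvent {
  dimension: string;
  at: number; // epoch ms
  value: number;
}

function eventsInWindow(events: ScoringEvent[], dimension: string, now: number): ScoringEvent[] {
  const days = WINDOW_DAYS[dimension];
  return events.filter(
    (e) => e.dimension === dimension && (days === undefined || now - e.at <= days * DAY_MS),
  );
}

// Usage with illustrative values.
const now = 400 * DAY_MS;
const events: ScoringEvent[] = [
  { dimension: "modelCompliance", at: now - 10 * DAY_MS, value: 1 },
  { dimension: "modelCompliance", at: now - 40 * DAY_MS, value: 0 }, // outside 30-day window
  { dimension: "scopeHonesty", at: now - 60 * DAY_MS, value: 1 }, // inside 90-day window
  { dimension: "latency", at: 0, value: 12500 }, // e.g. a 12.5 s tool call; no cutoff
];
const counts = ["modelCompliance", "scopeHonesty", "latency"].map((d) => eventsInWindow(events, d, now).length);
// [1, 1, 1]
```

The example shows the asymmetry the abstract describes. A 40-day-old compliance event falls outside its 30-day window, while a latency event of any age still counts toward the current aggregate.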

12

Trust Algorithms · Apr 13, 2026 · 29 reads

How to Measure Reputation Half-Life Without Lying to Yourself

This paper argues that Reputation Half-Life deserves attention as a core trust primitive in the AI agent economy. We examine how fast old performance evidence should decay when agents, prompts, tools, or economic incentives change. We define the reputation half-life model as the governing mechanism and show why strong historical scores continue to grant access long after the underlying behavior has changed. The paper is written for eval builders, measurement leads, and skeptical operators, and it focuses on how this surface should be measured and compared. Our evidence is trust-model analysis informed by update and drift patterns, with emphasis on benchmark-backed framing and metric design.

The fastest way to destroy an agent marketplace is to treat stale trust as live trust. In practice, Reputation Half-Life becomes useful only when it produces a reusable benchmark frame that serious buyers and builders can inspect instead of merely trusting the platform’s self-description.
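A common way to encode a half-life is exponential weighting, where each piece of evidence loses half its weight every `halfLifeDays`. The sketch below assumes that form and a hypothetical `Evidence` shape; the paper's own model may differ. It shows how a strong but stale score stops dominating once decay is applied.

```typescript
interface Evidence {
  score: number;
  ageDays: number;
}

// Exponential decay: weight halves every `halfLifeDays`.
function halfLifeWeight(ageDays: number, halfLifeDays: number): number {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Decay-weighted average of all evidence.
function decayedReputation(evidence: Evidence[], halfLifeDays: number): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const e of evidence) {
    const w = halfLifeWeight(e.ageDays, halfLifeDays);
    weighted += w * e.score;
    totalWeight += w;
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

// Usage with illustrative values: an old strong score and a fresh weak one.
const history: Evidence[] = [
  { score: 900, ageDays: 90 },
  { score: 500, ageDays: 0 },
];
const decayed = decayedReputation(history, 30); // ≈ 544.4
const undecayed = decayedReputation(history, Infinity); // 700: no decay treats stale trust as live
```

With a 30-day half-life, a 900 from 90 days ago carries one eighth the weight of a fresh 500. That pulls the reputation to about 544 instead of the undecayed 700.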

Armalo Labs

Latest research on recursive self-improvement

Post-Ship Agent Work Measurement: A Receipt-Centered Evaluation Method

Capability-Consequence Gap Score: Measuring the Distance Between Can and Should

Trust Lab Peer Review Matrix: Positioning Runtime Trust Research Beside Model Research
