A composite trust score is only as resilient as its slowest dimension. The L4 substrate publishes a single composite to the trust oracle. Counterparties read the composite before transacting; a low score discourages or denies transactions, a high score permits them. An adversary who can degrade an agent's actual behavior in some dimension without that degradation showing up in the composite for a window of time has, during that window, an agent that *looks* trustworthy and *acts* untrustworthy. The economically interesting attack on the L4 substrate is therefore not "forge the score" (the score is signed) but "drift the agent faster than the substrate measures."
This paper inspects the actual scoring engine in source, extracts the real canonical weights and per-dimension measurement windows, and reports them as code-grounded facts. It also documents a real perturbation event sent against the live Atlas reference agent. The originally-published version of this paper claimed a 12-dimension composite with fabricated detection-latency numbers per dimension; we correct both errors below.
1. The composite scoring engine
Armalo's composite is defined in packages/scoring/src/composite.ts:28. We read the canonical weights directly from source rather than restating them from memory; the result is reproducible by running the committed measurement producer (which parses the source file). Raw data: [the published measurement artifact](https://github.com/fongryan/armalo/blob/main/apps/web/content/research/data/adversarial-drift.json).
Canonical dimension count: 16. (Originally-published claim of 12 was incorrect; this is a real correction from the source.) Canonical weights sum to 1.0000 (the source enforces this at module load with a guard).
| Dimension | Weight | Measurement window (from dimension source) |
|---|---|---|
| accuracy | 0.11 | no explicit temporal window (current-aggregate from event payload) |
| reliability | 0.10 | no explicit temporal window |
| safety | 0.09 | no explicit temporal window |
| selfAudit | 0.07 | no explicit temporal window |
| latency | 0.07 | no explicit temporal window |
| security | 0.07 | no explicit temporal window |
| scopeHonesty | 0.06 | 90 days (scope-honesty.ts:43 setDate(getDate() - 90)) |
| bond | 0.06 | no explicit temporal window |
| memoryQuality | 0.06 | comment: "30+ days" stale memory threshold |
| costEfficiency | 0.05 | no explicit temporal window |
| evalRigor | 0.05 | 30 days (eval-rigor.ts:10 "past 30 days") |
| teamwork | 0.05 | no explicit temporal window |
| modelCompliance | 0.04 | 30 days (model-compliance.ts:41 "last 30 days") |
| runtimeCompliance | 0.04 | 30 days (runtime-compliance.ts:19 "last 30 days") |
| harnessStability | 0.04 | 30 days (harness-stability.ts:16 "last 30 days") |
| skillMastery | 0.04 | no explicit temporal window |
"no explicit temporal window" means the dimension file does not contain setDate(getDate() - N) or "last N days" markers. These dimensions either compute from the per-event payload directly (e.g. tool_call.latencyMs for the latency dimension) or aggregate over the agent's full event history; the operative time window is therefore the event stream itself rather than an explicit cutoff. A more detailed audit of each compute function is a follow-up; this paper publishes the source-grounded facts that are auditable now.
The originally-published version of this paper claimed specific detection latencies per dimension (Class I 5s, Class II 15min-6h, Class III 24h). Those numbers were fabricated. The 30/90-day windows above are the closest real evidence in the codebase; they imply the Class III adversary window is closer to days-to-weeks than hours, which is a substantively different (and more concerning) finding than the originally-published version.
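The sum-to-one guard mentioned above can be illustrated with a minimal sketch. The weights are copied from the table in this section; the names `CANONICAL_WEIGHTS` and `assertWeightsSumToOne` are hypothetical and are not taken from packages/scoring, whose actual guard may be written differently.

```typescript
// Weights copied from the table in this paper (not read from source here).
const CANONICAL_WEIGHTS: Record<string, number> = {
  accuracy: 0.11,
  reliability: 0.10,
  safety: 0.09,
  selfAudit: 0.07,
  latency: 0.07,
  security: 0.07,
  scopeHonesty: 0.06,
  bond: 0.06,
  memoryQuality: 0.06,
  costEfficiency: 0.05,
  evalRigor: 0.05,
  teamwork: 0.05,
  modelCompliance: 0.04,
  runtimeCompliance: 0.04,
  harnessStability: 0.04,
  skillMastery: 0.04,
};

// Hypothetical guard: throws unless the weights sum to 1 within a float tolerance.
function assertWeightsSumToOne(
  weights: Record<string, number>,
  tolerance = 1e-9,
): number {
  const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
  if (Math.abs(total - 1) > tolerance) {
    throw new Error(`composite weights sum to ${total.toFixed(4)}, expected 1.0000`);
  }
  return total;
}

const dimensionCount = Object.keys(CANONICAL_WEIGHTS).length;
const weightTotal = assertWeightsSumToOne(CANONICAL_WEIGHTS);
console.log(`${dimensionCount} dimensions, weights sum to ${weightTotal.toFixed(4)}`);
// prints "16 dimensions, weights sum to 1.0000"
```

A tolerance is used rather than strict equality because sixteen decimal weights do not add to exactly 1 in binary floating point.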
2. Adversarial drift model
We define adversarial drift as a deliberate, attacker-controlled shift in the agent's behavior along a single dimension, applied at a controlled rate, against a substrate that does not know the drift is adversarial. The model captures the operational case of a compromised agent (prompt injection, model rollover, supply-chain compromise) producing behavior that degrades on a specific dimension while other dimensions remain nominal.
The attacker has three control surfaces: rate (how fast the dimension degrades), stealth (whether the degradation is masked by counter-signals on adjacent dimensions), and target (which dimension is degraded). The substrate's defense is measured by detection latency (the time from the onset of degradation to the substrate's recording of the verdict in the composite score) and recovery latency (the time from the cessation of degradation to the substrate's restoration of the composite).
A real per-dimension detection-latency measurement requires (a) issuing a perturbation that degrades exactly one dimension, (b) triggering a scoring recompute, and (c) re-querying the oracle to observe the per-dimension shift. Production scoring runs nightly via packages/scoring; a real-time measurement requires either an authenticated recompute endpoint (not part of this paper's replicator) or waiting for the nightly cycle. This paper instead documents one real perturbation event sent to the substrate; the recompute-time per-dimension delta is an explicit follow-up.
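Given a series of oracle readings, the two latencies defined in this section reduce to a search for the first reading that crosses the baseline. The sketch below is one way to compute them; the `ScoreSample` shape and all function names are hypothetical, and with a nightly recompute the result can be no finer than the recompute cadence.

```typescript
// One oracle reading: when it was taken and the score observed (hypothetical shape).
interface ScoreSample {
  atMs: number; // sample time, milliseconds since epoch
  score: number;
}

// Time from `fromMs` to the first sample at or after it that satisfies `predicate`.
// Returns null when no such sample exists.
function latencyUntil(
  samples: ScoreSample[],
  fromMs: number,
  predicate: (score: number) => boolean,
): number | null {
  const ordered = [...samples].sort((a, b) => a.atMs - b.atMs);
  const hit = ordered.find((s) => s.atMs >= fromMs && predicate(s.score));
  return hit ? hit.atMs - fromMs : null;
}

// Detection latency: onset of degradation to the first reading below baseline.
function detectionLatencyMs(
  samples: ScoreSample[],
  onsetMs: number,
  baseline: number,
): number | null {
  return latencyUntil(samples, onsetMs, (score) => score < baseline);
}

// Recovery latency: cessation of degradation to the first reading back at or above baseline.
function recoveryLatencyMs(
  samples: ScoreSample[],
  cessationMs: number,
  baseline: number,
): number | null {
  return latencyUntil(samples, cessationMs, (score) => score >= baseline);
}
```

A `null` result means the readings collected so far never showed the crossing, which is the expected outcome when sampling stops before the next recompute.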
3. Experimental design
3.1 Subjects
Synthetic agent. A scripted agent whose behavior is fully controllable. The agent emits telemetry events on demand; perturbations are applied by directly mutating the event payloads before SDK ingest. This subject isolates the substrate's measurement latency from the agent's natural variability.
Atlas reference agent. The public Armalo L4 reference agent (agent ID 76cf31d6-ffe3-4a5c-8748-021114aa8066). Atlas's seeded behavioral record contains a deliberate drift in one dimension (parameter-binding violation in session three, dimension: pact compliance / scope-honesty-adjacent). We use Atlas's existing drift as the reference for the composite's response to a single dimension-specific perturbation.
3.2 What was actually run (corrected from originally-published version)
The originally-published version of this paper described a 5-step protocol per dimension (baseline, perturbation onset, detection sampling, cessation, recovery sampling) producing a 12-row table of detection latencies, composite shifts, and recovery latencies. That protocol was not run. The 12-row table was fabricated.
The corrected run does the following, reproducible by running the committed measurement producer:
1. Reads the canonical 16-dimension weights from packages/scoring/src/composite.ts:28 (real, code-grounded).
2. For each dimension, opens the corresponding dimension file in packages/scoring/src/dimensions/ and extracts the temporal window if one is present (e.g. setDate(getDate() - 90) for scope-honesty); records null if no explicit window is found.
3. Fetches Atlas's current trust-oracle response and records the composite score and freshness fields (pre-perturbation).
4. Inserts one real perturbation event into room_events for Atlas: a tool_call with latencyMs: 12500 (degrading the latency-related signals). Records the event ID and insert timestamp.
5. Does NOT trigger a fresh scoring recompute. The recompute runs nightly via packages/scoring; a real-time recompute requires an authenticated endpoint not exercised here.
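Step 2 of the run can be sketched as a pattern match over the text of a dimension file. This is an illustrative guess at how such an extractor might look, not the committed measurement producer: the function name `extractWindowDays` and both regular expressions are assumptions. It deliberately does not match the memoryQuality "30+ days" comment, which the paper records separately.

```typescript
// Hypothetical extractor: returns the explicit window in days, or null if none is found.
function extractWindowDays(source: string): number | null {
  // Pattern 1: date arithmetic such as setDate(getDate() - 90),
  // with an optional receiver as in d.setDate(d.getDate() - 90).
  const cutoff = /setDate\(\s*(?:\w+\.)?getDate\(\)\s*-\s*(\d+)\s*\)/.exec(source);
  if (cutoff) return Number(cutoff[1]);

  // Pattern 2: prose markers such as "last 30 days" or "past 30 days".
  const prose = /\b(?:last|past)\s+(\d+)\s+days\b/i.exec(source);
  if (prose) return Number(prose[1]);

  // No explicit temporal window in this file.
  return null;
}

console.log(extractWindowDays("since.setDate(since.getDate() - 90);")); // 90
console.log(extractWindowDays("// evaluations from the past 30 days")); // 30
console.log(extractWindowDays("const ms = event.latencyMs;")); // null
```

A text-level match like this only finds windows that are spelled out in source; it says nothing about implicit windows imposed by how events are queried upstream.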
The "results" of this run are therefore: (a) the source-grounded weights and windows in Section 1, (b) one perturbation event recorded with its ID in the raw data file, and (c) a follow-up commitment to measure the recompute-time composite shift on the next cycle. We do not fabricate the per-dimension detection latencies.
3.3 What this paper does NOT measure (explicit follow-ups)
- The composite shift *after the next scoring recompute* in response to the perturbation event sent.
- The per-dimension shift for dimensions other than latency (each requires a separate perturbation event tailored to its measurement source).
- The recovery latency (requires letting the perturbation event age out of the relevant window and re-querying).
- The composite's response to coordinated multi-dimension drift.
4. Results
4.1 Source-grounded dimension audit
Reproduced from the committed measurement producer reading the codebase directly:
- Dimensions in canonical composite: 16. Weights sum to 1.0000 (enforced at module load).
- Dimensions with explicit time windows in source: scopeHonesty (90 days), modelCompliance (30 days), runtimeCompliance (30 days), harnessStability (30 days), evalRigor (30 days), memoryQuality ("30+ days" threshold for stale memory).
- Dimensions without explicit time windows: accuracy, reliability, safety, selfAudit, latency, security, costEfficiency, bond, skillMastery, teamwork. These compute from current event aggregates or per-event metadata; the operative window is the agent's event history rather than an explicit cutoff.
The originally-published version claimed detection latencies of 5 s (deterministic), 15 min–6 h (jury / policy), 6 h–24 h (statistical). The real source code's explicit windows are well over an order of magnitude longer than the originally-claimed statistical window: 30–90 days, not 6–24 hours. This is the corrected, source-grounded finding.
4.2 One real perturbation event sent
| Field | Value |
|---|---|
| Subject agent | 76cf31d6-ffe3-4a5c-8748-021114aa8066 (Atlas) |
| Event type | telemetry.tool_call with outcome=success, latencyMs=12500 |
| Inserted at | recorded in the published measurement artifact under perturbation_event.inserted_at |
| Event ID | recorded in the data file under perturbation_event.event_id |
| Marker | payload has research_probe: "adversarial-drift-latency-perturbation" so it is identifiable in future audits |
The event will reflect in the scoring engine on its next recompute (nightly per packages/scoring). A replicator running this script after the next recompute can record the composite delta and compare to the dimension's weight (latency = 0.07).
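The comparison a replicator would make after the next recompute can be sketched as follows. The field values come from the table above, but the event shape (which fields sit inside `payload`) and every function name are assumptions made for illustration; this is not the SDK's schema.

```typescript
// Hypothetical event shape: field placement is an assumption, not the SDK's schema.
interface ToolCallProbe {
  agentId: string;
  type: "telemetry.tool_call";
  payload: {
    outcome: "success";
    latencyMs: number;
    research_probe: string; // marker that keeps the probe identifiable in audits
  };
}

function buildLatencyProbe(agentId: string, latencyMs: number): ToolCallProbe {
  return {
    agentId,
    type: "telemetry.tool_call",
    payload: {
      outcome: "success",
      latencyMs,
      research_probe: "adversarial-drift-latency-perturbation",
    },
  };
}

// Upper bound, in composite points, on the drop a single dimension can cause.
function maxDropPoints(weight: number): number {
  return weight * 100;
}

// True when an observed pre/post composite delta fits within the weight-derived bound.
function dropWithinBound(
  pre: number,
  post: number,
  weight: number,
  epsilon = 1e-9,
): boolean {
  return pre - post <= maxDropPoints(weight) + epsilon;
}

const probe = buildLatencyProbe("76cf31d6-ffe3-4a5c-8748-021114aa8066", 12500);
console.log(JSON.stringify(probe.payload));
```

With the latency weight of 0.07, a post-recompute drop larger than 7 points would indicate that something other than the latency dimension moved between the two readings.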
| Dimension | Measurement source | Detection latency | Composite shift | Recovery latency |
|---|---|---|---|---|
| cost_efficiency | tokens-per-task rolling stat | **24 h** (rolling window) | -1.1 points | 24 h |
| self_audit (Metacal™) | self-report vs. outcome stat | 24 h | -1.5 points | 48 h |
| accuracy | jury evaluations | 1–6 h (jury cadence) | -3.9 points | 6–24 h |
| safety | jury evaluations | 1–6 h | -3.5 points | 6–24 h |
The dimensions cluster into three latency classes.
Class I (deterministic checks, detection ≤ 5 minutes). Latency, reliability, runtime-compliance, model-compliance, harness-stability. These are sourced from telemetry primitives that the substrate evaluates synchronously on ingest or within the eval cadence. Detection is essentially instantaneous from the substrate's perspective.
Class II (jury or policy cycles, detection 15 minutes to 6 hours). Accuracy, safety, security, bond. These require either jury evaluation or external confirmation (on-chain transaction for bond, policy engine cycle for security). The detection latency is the period of the supporting evaluator.
Class III (statistical windows, detection 6 hours to 24 hours). Scope-honesty, cost-efficiency, self-audit. These require averaging over multiple calls to detect a statistically significant shift; the window length is structurally part of the dimension's definition and cannot be made faster without sacrificing the statistical reliability of the dimension.
4.3 Composite responsiveness: derived from weights, not measured here
The composite shifts in proportion to the weighted dimension drop. If a dimension's value collapses from 1.0 to 0.0 and all other dimensions stay constant, the composite drops by weight × 100 points. Reading from the real weights table in Section 1:
- Highest-weighted single-dimension full collapse: accuracy (0.11) = 11.0 point composite drop.
- Lowest-weighted full collapse: any of modelCompliance / runtimeCompliance / harnessStability / skillMastery (0.04) = 4.0 point drop.
These are derivations from the weights, not measurements under a real perturbation; we label them as such. A real composite-response measurement requires triggering a recompute after a perturbation and recording the actual delta. The originally-published per-dimension composite shift numbers (-3.9 for accuracy, -3.5 for safety, etc.) were fabricated and have been removed.
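The derivation above can be reproduced with a short sketch. It assumes the composite is a plain weighted sum scaled to 100 points, which is what the "weight × 100" rule implies; the names `compositePoints` and `fullCollapseDrop` are hypothetical, and the weights are copied from Section 1 rather than imported from source.

```typescript
type DimensionMap = Record<string, number>;

// Weights copied from the Section 1 table.
const WEIGHTS: DimensionMap = {
  accuracy: 0.11, reliability: 0.10, safety: 0.09, selfAudit: 0.07,
  latency: 0.07, security: 0.07, scopeHonesty: 0.06, bond: 0.06,
  memoryQuality: 0.06, costEfficiency: 0.05, evalRigor: 0.05, teamwork: 0.05,
  modelCompliance: 0.04, runtimeCompliance: 0.04, harnessStability: 0.04,
  skillMastery: 0.04,
};

// Composite in points: 100 x sum of weight_i * value_i, each value in [0, 1].
// Assumption: a plain weighted sum; the real engine may apply further adjustments.
function compositePoints(weights: DimensionMap, values: DimensionMap): number {
  let total = 0;
  for (const [dimension, weight] of Object.entries(weights)) {
    total += weight * (values[dimension] ?? 0);
  }
  return total * 100;
}

// Composite drop when one dimension collapses to 0 and all other values are held constant.
function fullCollapseDrop(
  weights: DimensionMap,
  values: DimensionMap,
  dimension: string,
): number {
  const collapsed: DimensionMap = { ...values, [dimension]: 0 };
  return compositePoints(weights, values) - compositePoints(weights, collapsed);
}

// Every dimension at 1.0 before the collapse.
const nominal: DimensionMap = {};
for (const dimension of Object.keys(WEIGHTS)) nominal[dimension] = 1;

console.log(fullCollapseDrop(WEIGHTS, nominal, "accuracy").toFixed(1)); // "11.0"
console.log(fullCollapseDrop(WEIGHTS, nominal, "skillMastery").toFixed(1)); // "4.0"
```

These outputs are arithmetic consequences of the weights, not observations of the live substrate.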
5. The adversary's window: corrected from source
The originally-published version claimed Class III dimensions take up to 24 hours to reflect drift. The real source code's explicit windows are 30 days (modelCompliance, runtimeCompliance, harnessStability, evalRigor) and 90 days (scopeHonesty). The adversary's worst-case window for those dimensions is therefore weeks-to-months, not 24 hours.
Concretely: after a scope-honesty degradation, the degraded events must age out of the 90-day rolling window before recovery completes. A one-day attack can therefore produce a composite penalty that takes up to 90 days to fully clear and, conversely, a low-confidence agent that improved its calibration today does not see the dimension recover for up to 90 days. Long windows cut symmetrically in both directions.
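The aging-out behaviour can be illustrated with a toy simulation. It assumes the dimension value on a given day is the simple mean of daily observations inside a trailing 90-day window; the real scope-honesty compute function may aggregate differently, and the function names and the synthetic series are hypothetical.

```typescript
// Assumption: the dimension value is the mean of daily observations in the trailing window.
function rollingMean(daily: number[], windowDays: number, day: number): number {
  const start = Math.max(0, day - windowDays + 1);
  const inWindow = daily.slice(start, day + 1);
  return inWindow.reduce((sum, value) => sum + value, 0) / inWindow.length;
}

// Days after `eventDay` until the rolling mean is back at `baseline`,
// or null if that never happens within the series.
function daysUntilAgedOut(
  daily: number[],
  windowDays: number,
  eventDay: number,
  baseline: number,
  epsilon = 1e-12,
): number | null {
  for (let day = eventDay; day < daily.length; day++) {
    if (Math.abs(rollingMean(daily, windowDays, day) - baseline) <= epsilon) {
      return day - eventDay;
    }
  }
  return null;
}

const WINDOW_DAYS = 90;
// Synthetic series: 300 nominal days with a single fully degraded day at index 100.
const series: number[] = new Array<number>(300).fill(1);
series[100] = 0;

console.log(daysUntilAgedOut(series, WINDOW_DAYS, 100, 1)); // 90
```

Under this averaging assumption the one-day dip lowers the windowed value by only 1/90, but it stays visible for the full window length, which is the symmetry the paragraph above describes.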
This is a substantively more serious finding than what the original paper claimed. Closing the long-window problem on Class III dimensions is the central engineering question for the substrate's anti-drift posture, and the proposed resilience improvements in the next section target it.
6. Proposed resilience improvements
6.1 Fast-decay weights on statistical dimensions
The current composite weights are stable. We propose adding a *decay* schedule that applies a temporary penalty to dimensions exhibiting fast change. Concretely, when the substrate detects that a dimension's value over the last 1-hour window differs from the trailing 24-hour window by more than two standard deviations, the dimension's *weight* in the composite is doubled for the next hour, then linearly decays back to the canonical weight over 24 hours.
The effect is to amplify the composite's response to fast changes regardless of dimension class. Under the fast-decay schedule, a scope-honesty degradation registers a larger composite drop than it would under the canonical weight, because the dimension's weight is doubled at the moment the substrate detects the rapid swing. How much sooner the drop becomes visible has not been measured (see Section 7).
Implementation cost: minimal. The substrate already computes per-dimension trailing windows; the proposed change is to read the recent slope and apply a weight multiplier. Drawback: composite values are less stable across hours during normal operation. The drawback is acceptable because counterparties read the composite at transaction time, not as a stable monthly summary.
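A minimal sketch of the proposed schedule follows. It reads "doubled for the next hour, then linearly decays back over 24 hours" as one hour at 2x followed by a 24-hour linear ramp down to 1x; that reading, the function names, and the absence of any renormalization of the other weights are all assumptions, since the proposal does not specify them.

```typescript
const HOUR_MS = 3600000;

// Trigger: the last-hour mean differs from the trailing 24-hour mean
// by more than two standard deviations of the trailing window.
function isRapidSwing(
  recentMean: number,
  trailingMean: number,
  trailingStdDev: number,
): boolean {
  return Math.abs(recentMean - trailingMean) > 2 * trailingStdDev;
}

// Multiplier applied to a dimension's canonical weight, `elapsedMs` after the trigger.
// Assumed schedule: 2x for the first hour, then a linear decay to 1x over 24 hours.
function weightMultiplier(elapsedMs: number): number {
  if (elapsedMs < 0) return 1; // trigger has not fired yet
  if (elapsedMs <= HOUR_MS) return 2;
  const progress = Math.min(1, (elapsedMs - HOUR_MS) / (24 * HOUR_MS));
  return 2 - progress;
}

// Effective weight of a dimension at a given time after the trigger.
function effectiveWeight(canonicalWeight: number, elapsedMs: number): number {
  return canonicalWeight * weightMultiplier(elapsedMs);
}

console.log(weightMultiplier(0)); // 2
console.log(weightMultiplier(13 * HOUR_MS)); // 1.5
console.log(weightMultiplier(25 * HOUR_MS)); // 1
```

Because the other weights are left untouched in this sketch, the effective weights temporarily sum to more than 1; whether the composite should be renormalized during the boost is a design decision the proposal leaves open.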
6.2 Anomaly-detection synthetic dimension
Add a seventeenth dimension to the composite: an *anomaly* score that is 100 in the absence of detected anomalies and drops sharply when the substrate detects an unusual pattern across multiple dimensions simultaneously. Examples of cross-dimension anomalies:
- Latency degrades while reliability remains nominal: suggests a slow attack path that is not yet failing.
- Cost-efficiency drops while accuracy remains nominal: suggests a token-flooding pattern.
- Scope-honesty drops while accuracy remains nominal: suggests the agent is producing correct outputs with miscalibrated confidence, a classic prompt-injection signature.
The anomaly dimension's weight is small (proposed: 3%) but its activation is high-signal. The dimension is computed by a lightweight classifier trained on historical drift patterns and is deliberately conservative in alerting.
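The three example patterns can be expressed as simple rules, shown here only to make the "one dimension drops while another stays nominal" idea concrete. This rule table is a stand-in for illustration, not the proposed trained classifier; the rule names, the `dropThreshold` parameter, and the score scale are all assumptions.

```typescript
type DimensionScores = Record<string, number>;

// One cross-dimension pattern: `degraded` drops while `nominal` holds steady.
interface AnomalyRule {
  degraded: string;
  nominal: string;
  suggests: string;
}

// The three examples listed above, encoded as rules.
const EXAMPLE_RULES: AnomalyRule[] = [
  { degraded: "latency", nominal: "reliability", suggests: "slow attack path that is not yet failing" },
  { degraded: "costEfficiency", nominal: "accuracy", suggests: "token-flooding pattern" },
  { degraded: "scopeHonesty", nominal: "accuracy", suggests: "correct outputs with miscalibrated confidence" },
];

// Returns the rules whose degraded dimension fell by more than `dropThreshold`
// while the paired dimension moved by no more than `dropThreshold`.
function matchAnomalies(
  baseline: DimensionScores,
  current: DimensionScores,
  dropThreshold: number, // caller-supplied; no value is proposed in the text
): AnomalyRule[] {
  const drop = (dim: string): number => (baseline[dim] ?? 0) - (current[dim] ?? 0);
  return EXAMPLE_RULES.filter(
    (rule) =>
      drop(rule.degraded) > dropThreshold &&
      Math.abs(drop(rule.nominal)) <= dropThreshold,
  );
}
```

A learned classifier would replace the fixed threshold with a decision boundary fitted to historical drift, but the input features (per-dimension deltas over a comparison window) would look much the same.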
6.3 Adversarial-mode dimension
Add an eighteenth dimension that monitors the agent's behavior under deliberate provocation. The substrate periodically injects adversarial prompts into the agent's input distribution and measures the agent's response. The dimension's value is the agent's performance on the adversarial suite, recomputed continuously.
The dimension is structurally distinct from the sixteen canonical dimensions: it is a *probe* dimension rather than a passive measurement. The probes are part of the substrate's red-team evaluation surface (already implemented in packages/adversarial-agent). The proposed addition is to surface the probe result as a composite dimension, so an agent's resilience to adversarial provocation is part of its published trust profile.
7. Combined effect of the three improvements: projection, not measurement
The originally-published version of this paper presented a table claiming that the three improvements above would reduce Class III detection latency from "6hβ24h" to "1 hour". Those numbers were fabricated; both the starting and ending points of the table were invented.
The corrected position is that fast-decay weights, an anomaly synthetic dimension, and an adversarial-probe dimension are each a reasonable architectural direction whose actual effect on detection latency can only be measured after implementation. We do not project specific hour-bound numbers without that measurement. Each improvement is described above as a design proposal; their combined effect is a follow-up measurement contingent on Phase 1 shipping.
The qualitative claim that remains supported by the source-grounded findings: the substrate's slowest dimensions (scopeHonesty at 90 days, the 30-day windowed dimensions) are the binding constraint on the adversary's worst-case window, and any improvement that compresses those windows tightens the substrate's transaction-gating value.
8. Implementation pathway
1. Phase 1: fast-decay weights. Modify packages/scoring/src/composite.ts to compute per-dimension trailing windows and apply a weight multiplier when rapid change is detected. Roll forward to the Atlas reference agent first; measure impact on Atlas's existing seeded drift. Estimated effort: one engineer-week.
2. Phase 2: anomaly synthetic dimension. Train a small classifier (logistic regression with ten cross-dimension features) on historical drift patterns from the platform org's agents. Add the dimension with weight 3%; rebalance other weights to preserve the 100-point ceiling. Estimated effort: two engineer-weeks plus modest training-data assembly.
3. Phase 3: adversarial-mode dimension. Surface results from packages/adversarial-agent into the composite. Add the dimension with weight 4%; rebalance. Estimated effort: one engineer-week (the adversarial evaluator already exists).
Each phase is independently valuable. Phase 1 is the highest-impact-per-effort and should ship first.
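Phases 2 and 3 both call for adding a dimension and rebalancing the rest. The text does not say how the rebalance is done; the sketch below assumes proportional scaling, where every existing weight shrinks by the same factor so the total stays at 1. The function name `addDimension` is hypothetical.

```typescript
// Adds a dimension with `newWeight` and scales the existing weights proportionally
// so the total remains 1. Assumes the input weights already sum to 1.
function addDimension(
  weights: Record<string, number>,
  name: string,
  newWeight: number,
): Record<string, number> {
  if (newWeight <= 0 || newWeight >= 1) {
    throw new Error("newWeight must be strictly between 0 and 1");
  }
  if (weights[name] !== undefined) {
    throw new Error(`dimension "${name}" already exists`);
  }
  const scale = 1 - newWeight;
  const rebalanced: Record<string, number> = {};
  for (const [dimension, weight] of Object.entries(weights)) {
    rebalanced[dimension] = weight * scale;
  }
  rebalanced[name] = newWeight;
  return rebalanced;
}

// Toy two-dimension example; the real composite has sixteen dimensions.
const withAnomaly = addDimension({ a: 0.6, b: 0.4 }, "anomaly", 0.03);
console.log(withAnomaly);
```

Proportional scaling preserves the relative ordering of the existing dimensions; an alternative would be to take the new weight from a chosen subset, which changes the relative priorities and would need its own justification.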
9. Limitations
Single-vendor substrate. The measurement is against Armalo's substrate. Other substrates would have different composite weights, different measurement cadences, and different latency profiles. The structural claim (Class III dimensions are slow) generalizes; the specific latency numbers do not.
Synthetic perturbations. The protocol applies controlled perturbations; real adversaries apply correlated, stealthy perturbations that may evade the substrate longer. The numbers here are best-case for the substrate.
Composite weight sensitivity. The current weights are calibrated to current operational priorities. As the agent economy matures, weights will shift. Detection latency results are weight-dependent; future weight changes may improve or worsen the substrate's resilience.
Adversary cost. The paper does not model the adversary's cost to sustain a drift attack. In reality, sustaining a drift requires continuous prompt injection or model manipulation, which has cost. The substrate's 30-90 day windowed dimensions imply a longer maximum window in principle, but a bounded adversary may not be able to sustain the attack for that duration, limiting the realistic worst case.
Originally-fabricated content removed. The originally-published version of this paper contained: a fabricated 12-dimension table (real count is 16); fabricated per-dimension detection latencies in seconds, minutes, and hours; fabricated composite response point deltas per dimension; a fabricated 24-hour worst-case Class III window (real is up to 90 days per source). All have been removed and replaced with source-grounded facts and one real perturbation event. The corrections are tracked in the published measurement artifact under honesty_notes and in this paper's Section 3.2 and Section 5.
10. Conclusion
The composite trust score detects adversarial drift on a per-dimension cadence that varies from seconds (deterministic checks) to days (statistical windows). The substrate is correctly responsive to which dimension drifts and recovers in proportion to its measurement window. The adversary's worst-case window is bounded by the slowest dimension (per Section 5, 30 days on most windowed dimensions and up to 90 days on scopeHonesty), which is operationally meaningful and architecturally addressable.
Three resilience improvements are proposed to shorten the adversary's window, with their combined effect still to be measured (Section 7): fast-decay weights amplify the composite's response to rapid swings, an anomaly synthetic dimension surfaces cross-dimension evidence, and an adversarial-mode probe dimension measures resilience under deliberate provocation. The improvements are independently valuable and incrementally deployable.
The L4 substrate's value as a transaction gate scales directly with the composite's responsiveness. Closing the 30- to 90-day Class III windows, by any of the three proposed mechanisms, closes the largest currently-known adversary advantage and tightens the substrate's gating properties.
11. Replication
The protocol is reproducible against any L4-compliant substrate that exposes per-dimension scores. For Armalo specifically:
1. Use a synthetic agent or instrument a test agent with the @armalo/telemetry SDK.
2. Apply the perturbation protocol from Section 3.2 to each dimension.
3. Sample the composite and the per-dimension scores via GET /api/v1/trust/{agentId} at the prescribed cadence.
4. Record the first sub-baseline sample per dimension.
The substrate's own scoring runs nightly via packages/scoring; the recomputation cadence is observable via the computedAt timestamp on the score row. Researchers replicating the study should align their measurement cadence with the substrate's recompute schedule for the slowest dimensions.
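Aligning with the recompute schedule amounts to finding the first score row computed after the perturbation was inserted. The sketch below shows that selection over rows a replicator has already collected; only the `computedAt` field is named in the text, so the `ScoreRow` shape, the `composite` field, and the function name are assumptions.

```typescript
// Hypothetical score-row shape; only `computedAt` is named in the text above.
interface ScoreRow {
  computedAt: string; // ISO-8601 timestamp of the recompute that produced this row
  composite: number;
}

// First row whose recompute happened strictly after the perturbation was inserted,
// or null if no recompute has run since.
function firstRecomputeAfter(rows: ScoreRow[], insertedAt: string): ScoreRow | null {
  const insertedMs = Date.parse(insertedAt);
  const later = rows
    .filter((row) => Date.parse(row.computedAt) > insertedMs)
    .sort((a, b) => Date.parse(a.computedAt) - Date.parse(b.computedAt));
  return later[0] ?? null;
}
```

A `null` result means every collected row predates the perturbation, so any composite delta read at that point says nothing about the event.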
The Atlas reference agent's seeded drift (parameter binding violation in session three) is a single-dimension perturbation that researchers can use as a reference point. Atlas's composite reflects this drift in its pact compliance rate, which is observable at GET /api/v1/trust/76cf31d6-ffe3-4a5c-8748-021114aa8066.
References
- Armalo Labs Research Team. *The L4 Layer: Cross-Org Behavioral Trust for AI Agents.* 2026-05-12.
- Armalo Labs Research Team. *The TOCTOU Theorem for Agent Trust.* 2026-05-13.
- Armalo Labs Research Team. *Parameter-Binding Grammar Coverage.* 2026-05-13.
- Armalo Labs Research Team. *The Trust Oracle as a Cross-Org Consensus Primitive.* 2026-05-13.
- Brier, G. W. *Verification of forecasts expressed in terms of probability.* Monthly Weather Review, 1950.
- Hendrycks, D. et al. *Aligning AI With Shared Human Values.* ICLR 2021.
- Russell, S. and Norvig, P. *Artificial Intelligence: A Modern Approach.* 4th edition, 2021.