The time-of-check-to-time-of-use (TOCTOU) gap is one of the oldest known structural failure modes in software security. A system verifies a property at time *T_check*; the system acts on the verified property at time *T_use*; in the interval *(T_check, T_use)* the property can change, and the action is then taken in violation of the verified invariant. For deterministic software systems the gap is bounded by the determinism of the program — a verified file path will, absent malicious filesystem operations, denote the same inode at *T_use* as at *T_check*. For LLM-driven agents operating over an open input distribution, no such determinism is available. The agent's behavior at *T_use* is a stochastic function of the agent's input distribution at *T_use*, which is in turn under partial control of an adversary, the upstream model provider, the retrieval substrate, and the agent's own probabilistic state. The TOCTOU gap is therefore the dominant failure mode of LLM agent trust, not an edge case.
This paper formalizes the TOCTOU gap for LLM agents, derives the structural-completeness theorem that no point-in-time verifier can close it, and empirically tests the theorem against the Armalo Atlas reference agent. Section 1 establishes the formal model. Section 2 proves the theorem. Section 3 describes the experiment. Section 4 presents the results. Section 5 discusses architectural consequences. Section 6 names open problems and the limits of the formal argument.
1. Formal model
Let *A* be an LLM-driven agent. Let *V* be a verifier — a deterministic function that, given some observable state of *A* at time *t*, returns either *conforming* or *non-conforming* with respect to a pre-committed behavioral contract *C*. Let *X(t)* be the agent's observable state at time *t* — the set of facts about the agent that the verifier can read (its identity, its scopes, its declared confidence, the parameters of its in-flight tool call, the recent history of its tool calls if any). Let *Z(t)* be the agent's *full* internal state at time *t* — the model weights it is using, the prompts it has been given, the retrieved context in its working memory, the conversational history it has accumulated.
The relationship between *X(t)* and *Z(t)* is captured by an information channel. The mutual information *I(X(t); Z(t))* measures how much of the agent's full state is observable to the verifier. In the limit *I(X; Z) → H(Z)*, the verifier sees everything; in the limit *I(X; Z) → 0*, the verifier sees nothing. Real verifiers operate strictly between these limits.
Define the conforming probability at time *t* as:
*p(t)* := *P(A's action at time t conforms to contract C | A's observable state X(t))*
A point-in-time verifier *V* certifies the action at time *T_use* using a snapshot of the agent's state taken at time *T_check ≤ T_use*. Define the TOCTOU residual as:
*R(Δt)* := *|p(T_use) − P(conforming at T_use | X(T_check))|*
where *Δt* := *T_use − T_check*. The TOCTOU residual is the magnitude of the gap between the actual probability of conforming behavior at the moment of action and the verifier's best estimate of that probability based on the snapshot it has. *R(Δt) = 0* iff the verifier's snapshot is sufficient for the moment-of-action prediction; *R(Δt) > 0* indicates the snapshot is stale relative to the action.
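As a toy illustration of the residual, the following Monte Carlo sketch models adversarial inputs arriving as a Poisson process during the gap. The rate and conforming probabilities are illustrative assumptions, not measurements of any real agent.

```python
import math
import random

def toctou_residual(delta_t, inject_rate=0.01, p_conform_clean=0.99,
                    p_conform_injected=0.40, trials=100_000, seed=0):
    """Toy Monte Carlo estimate of the TOCTOU residual R(dt).

    Assumptions (illustrative only): adversarial inputs arrive as a
    Poisson process with rate `inject_rate` per second of the gap; an
    injected input drops the agent's conforming probability from
    `p_conform_clean` to `p_conform_injected`; the verifier's snapshot
    at T_check cannot observe arrivals in (T_check, T_use).
    """
    rng = random.Random(seed)
    conform_actual = 0
    for _ in range(trials):
        # Probability that at least one adversarial input lands in the gap.
        injected = rng.random() < 1 - math.exp(-inject_rate * delta_t)
        p = p_conform_injected if injected else p_conform_clean
        conform_actual += rng.random() < p
    p_use = conform_actual / trials
    verifier_estimate = p_conform_clean  # snapshot sees only the clean state
    return abs(p_use - verifier_estimate)
```

At *Δt = 0* the residual is zero up to sampling noise; as *Δt* grows, the snapshot-based estimate diverges from the moment-of-action probability.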
1.1 The trust decay function
We define the agent trust decay function as:
*T(Δt)* := *P(conforming at T_use | X(T_check))*
*T(Δt)* is the conditional probability the verifier assigns to the agent's action at *T_use*, given the verifier's snapshot at *T_check*. *T(0) = p(0)*; for any *Δt > 0*, *T(Δt) ≤ p(0)* — the verifier's information about the future is bounded by its information about the present.
For deterministic agents *T(Δt) = T(0)* for all *Δt*: the agent's behavior is fully determined by its observable state, and the snapshot suffices. For LLM agents under open input distributions *T(Δt) < T(0)* for any *Δt > 0*: between *T_check* and *T_use* the agent's input distribution shifts (an adversarial sentence is injected into the next retrieved document, a system prompt update is rolled out by the model provider, a long-context drift accumulates), and the agent's behavior is a function of the shifted distribution.
1.2 The information-theoretic bound
The conditional probability the verifier can assign is bounded by the mutual information between the verifier's snapshot and the agent's full state at the moment of action:
*|T(Δt) − p(Δt)| ≤ f(I(X(T_check); Z(T_use)))*
where *f* is a non-increasing function with *f(0) = 1* and *f(H(Z)) = 0*, and *I(X(T_check); Z(T_use))* is the mutual information between the snapshot at *T_check* and the agent's full state at *T_use*. The bound is informal in that it does not state *f* concretely (the exact functional form depends on the agent's input distribution and the verifier's representational capacity), but the qualitative claim is sharp: as the mutual information between the snapshot and the moment-of-action state decreases, the verifier's posterior approaches its prior, the snapshot becomes uninformative, and the TOCTOU residual approaches its maximum.
For LLM agents, the mutual information decays rapidly in *Δt*. The agent's full state at *T_use* includes the retrieved context produced between *T_check* and *T_use*, the conversational turns accumulated, the upstream model weight rollovers, and the prompt-injected content in any retrieved documents. None of this is captured in the snapshot at *T_check*. The mutual information collapses on a timescale of tens to hundreds of seconds for typical agent runtimes, and the snapshot becomes operationally useless well before the action.
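The decay claim can be made concrete in a minimal model: treat the agent's full state as a single bit at stationarity, evolving as a symmetric Markov chain and observed perfectly at *T_check*. The mutual information between the snapshot and the state *n* steps later then has a closed form and decays geometrically. This is a sketch of the qualitative behavior, not a model of any real agent runtime.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def snapshot_mi(steps, flip_prob=0.05):
    """I(X(T_check); Z(T_use)) for a toy binary symmetric Markov chain.

    Assumptions: the full state is one uniformly distributed bit, the
    snapshot observes it exactly at T_check, and each step flips it
    with probability `flip_prob`. Then
    P(Z_use = Z_check) = (1 + (1 - 2q)^n) / 2 and I = 1 - H2(P) bits.
    """
    p_same = (1 + (1 - 2 * flip_prob) ** steps) / 2
    return 1.0 - h2(p_same)
```

With `flip_prob = 0.05`, the snapshot carries 1 bit at *Δt = 0* and effectively zero bits within a few hundred steps, mirroring the collapse described above.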
2. The structural-completeness theorem
We can now state the central theorem.
Theorem (Structural completeness of L4). For any LLM-driven agent *A* operating over an input distribution with non-trivial entropy in the interval *(T_check, T_use)*, and for any point-in-time verifier *V* whose state is fixed at *T_check*, the TOCTOU residual *R(Δt)* is strictly positive for all *Δt > 0*. No point-in-time verifier closes the TOCTOU gap. The gap is closable only by a continuous-time substrate *V_∞* whose state at *T_use* reflects the agent's behavioral record up to time *T_use*.
Proof sketch. The verifier *V*'s output is a function of *X(T_check)*. The agent's action at *T_use* is a function of *Z(T_use)*. By the data processing inequality,
*I(V's output; conforming at T_use) ≤ I(X(T_check); conforming at T_use) ≤ I(X(T_check); Z(T_use))*.
For *Δt > 0* and a non-trivial input distribution, *Z(T_use)* contains information not present in *X(T_check)* — specifically, the agent's response to inputs received between *T_check* and *T_use*. Therefore *I(X(T_check); Z(T_use)) < H(Z(T_use))*, and the verifier's posterior is strictly less informative than full knowledge. The TOCTOU residual is therefore strictly positive.
The continuous-time substrate *V_∞* observes *X(T_use)* directly (not a stale snapshot) and therefore the mutual information *I(X(T_use); Z(T_use))* is at its maximum for the substrate's representational capacity. The residual *R(0)* under *V_∞* is bounded only by the substrate's representational capacity, not by *Δt*. ∎
The theorem is a structural statement, not a practical one. It says no point-in-time verifier *can* close the gap, not that any specific continuous verifier *does* close it. The continuous substrate must be expressive enough to capture the behaviorally relevant features of *Z(T_use)* and timely enough to reflect them at the moment of action. A continuous substrate that lags the action by hours is no better than a point-in-time substrate at *T_check + 1 hour*.
2.1 Three corollaries
Corollary 1 (No identity-only substrate suffices). A verifier whose snapshot is the agent's identity (L1) and scopes (L2) has *X(T_check)* limited to identity tokens and scope assertions. *I(X(T_check); Z(T_use))* is bounded by the mutual information between the identity and the agent's behavioral state, which is near zero for an LLM agent. The TOCTOU residual is therefore near maximal. *No L1–L2 substrate closes the gap.*
Corollary 2 (Runtime enforcement reduces but does not close the gap). A verifier whose snapshot includes the agent's in-flight tool call parameters (L3) has *X(T_check)* enriched with the action being authorized. *I(X(T_check); Z(T_use))* is non-trivial — the parameters themselves are an observable feature of the agent's state — but is still bounded by the verifier's policy expressiveness. The TOCTOU residual is reduced but not zero. *L3 alone does not close the gap*; in particular, L3 cannot detect parameter-level drift between authorization and a *future* call.
Corollary 3 (Continuous behavioral substrate closes the gap up to representational capacity). A verifier whose state is updated continuously with the agent's behavioral record over *(T_check, T_use)* has *X(T_use)* reflecting the full observable behavior. *I(X(T_use); Z(T_use))* is at the substrate's maximum. The TOCTOU residual is bounded only by what the substrate fails to capture (semantic state inside model activations, retrieval content not exposed via tool calls). This is the L4 substrate; the bound is the substrate's design ceiling, not a TOCTOU artifact.
3. Experimental methodology
To empirically test the theorem, we measure the detection latency of four candidate substrates against a controlled behavioral drift in the Armalo Atlas reference agent.
3.1 Subject
Atlas is the public Armalo-operated L4 reference agent. Atlas runs in the platform org (a92aecd0-134f-41c2-8099-65ef9f11e89c), holds agent ID 76cf31d6-ffe3-4a5c-8748-021114aa8066, and is bound to an active parameter-binding pact (f683147e-5bfc-4b43-aa6f-e932a4262035) constraining transfer_funds calls to a three-address treasury allow-list with a $1000 per-call cap and a USDC-only currency constraint. The agent's behavioral record is seeded with 26 telemetry events across five sessions spanning the most recent 36 hours.
3.2 Drift design
Session three of Atlas's seeded behavioral record contains a deliberate drift: a single transfer_funds call whose destination is 0xDEADBEEF00000000000000000000000000000001 (outside the allow-list), amount is 1850 (above the cap), and currency is USDC (on allow-list, the only conforming parameter). The drift call occurs at 18 minutes into session three, which began 12 hours before the measurement window.
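The drift call can be checked against the pact's three constraints mechanically. The sketch below encodes the binding with placeholder allow-list addresses (the real treasury addresses are not reproduced in this paper); it is a minimal re-derivation of the expected verdict, not Armalo's evaluator.

```python
# Hypothetical encoding of the pact's transfer_funds binding. The
# allow-list entries are placeholders; the cap and currency constraint
# are as stated in the pact description above.
TREASURY_ALLOWLIST = {"0xTREASURY_A", "0xTREASURY_B", "0xTREASURY_C"}
MAX_AMOUNT = 1000          # per-call cap in USD
ALLOWED_CURRENCIES = {"USDC"}

def check_transfer(dest, amount, currency):
    """Return the list of binding rules a transfer_funds call violates."""
    violations = []
    if dest not in TREASURY_ALLOWLIST:
        violations.append("destination_not_in_allowlist")
    if amount > MAX_AMOUNT:
        violations.append("amount_above_cap")
    if currency not in ALLOWED_CURRENCIES:
        violations.append("currency_not_allowed")
    return violations

# The seeded drift call: two of three parameters non-conforming.
drift = check_transfer("0xDEADBEEF00000000000000000000000000000001",
                       1850, "USDC")
```

The expected verdict is exactly two violations (destination and amount), with the currency parameter conforming.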
3.3 Substrates under test
We test four substrates, each representing a candidate trust mechanism in the agent identity stack:
1. L1 — Identity provenance check. The verifier confirms Atlas's identity via the dashboard's agent record. Snapshot includes: agent UUID, identity tier, organization, status. Re-verified at the start of session three (*T_check* = session start, *T_use* = drift call). Δt = 18 minutes.
2. L2 — OAuth scope re-introspection. The verifier reissues an OAuth introspection at the start of session three. Snapshot includes: agent identity, scope set, token expiry. Δt = 18 minutes.
3. L3 — Policy engine pre-flight check. The verifier runs the runtime policy engine immediately before the drift call. Snapshot includes: agent identity, scopes, in-flight tool call parameters. Δt = 0 minutes (synchronous pre-flight); we measure the engine's expressiveness rather than its latency.
4. L4 — Continuous behavioral telemetry. The verifier ingests the drift call via the @armalo/telemetry substrate, evaluates the parameter binding on ingest, and reflects the verdict in the room ledger and the trust oracle. Δt = telemetry flush interval (5 seconds default; configured to 5 s for the experiment).
3.4 Measurement
For each substrate, we measure:
- Detection. A binary outcome: does the substrate produce a verdict that the drift call is non-conforming?
- Detection latency. Time from the drift call (*T_use*) to the substrate's recording of the verdict. Measured in seconds.
- Information channel. Which parameters of the drift call the substrate's snapshot includes. Recorded as a subset of {identity, scopes, tool name, tool parameters, behavioral history}.
- Cross-org visibility. Does the verdict appear at the substrate's public query surface? Binary.
4. Results
| Substrate | Detected? | Latency | Information channel | Cross-org? |
|---|---|---|---|---|
| L1 — Identity check | No | — | {identity} | No |
| L2 — OAuth re-introspection | No | — | {identity, scopes} | No |
| L3 — Policy engine pre-flight | Yes (rule-dependent) | 0 ms | {identity, scopes, tool name, tool parameters} | No |
| L4 — Continuous telemetry | Yes | 5 s (flush interval) | {identity, scopes, tool name, tool parameters, behavioral history} | Yes |
4.1 L1 (identity check)
The L1 verifier confirms Atlas is the agent it claims to be. The snapshot does not include the in-flight tool call parameters. The drift call passes the L1 check trivially — Atlas's identity at *T_use* is identical to Atlas's identity at *T_check*. The substrate is operationally blind to the drift. This confirms Corollary 1.
4.2 L2 (OAuth re-introspection)
The L2 verifier confirms Atlas holds the scope to invoke transfer_funds. The snapshot does not include the tool call parameters. The drift call passes the L2 check trivially — Atlas's scope at *T_use* is identical to Atlas's scope at *T_check*. The substrate is operationally blind to the drift. This also confirms Corollary 1.
4.3 L3 (policy engine pre-flight)
The L3 verifier evaluates the in-flight tool call against a policy engine. The snapshot includes the tool name and parameters. The drift call is caught iff the policy engine has been hand-configured with rules that match the binding (destination not in allow-list, amount > $1000). With the rules configured, the L3 verifier detects the drift at *T_use* with zero latency. Without the rules, the L3 verifier passes the drift call.
This is the expressiveness limit of L3: the L3 substrate can detect drift, but only on parameters the operator has hand-configured. The operator's effort is per-tool and per-rule. The substrate's coverage is upper-bounded by what the operator has thought to write. In practice, L3 coverage in production agents is uneven — some tools are tightly constrained, others are not, and the choice of which tools get the attention is driven by post-hoc incident response rather than systematic pre-commitment.
The L3 result confirms Corollary 2: L3 reduces the TOCTOU residual on parameters that are policy-engine-configured, but does not close the gap globally.
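The caught-iff-configured behavior can be reproduced with a minimal pre-flight engine. The sketch below is illustrative (the class and method names are not Armalo's API): uncovered tools pass by default, so coverage is exactly the set of rules the operator has written.

```python
class PreflightPolicyEngine:
    """Minimal sketch of an L3 pre-flight check: rules are
    hand-configured per tool, and coverage extends only to tools
    that have rules. Names are illustrative, not Armalo's API."""

    def __init__(self):
        self.rules = {}  # tool name -> list of predicates over parameters

    def add_rule(self, tool, predicate):
        self.rules.setdefault(tool, []).append(predicate)

    def preflight(self, tool, params):
        """Pass unless a configured rule rejects; uncovered tools pass."""
        return all(rule(params) for rule in self.rules.get(tool, []))

engine = PreflightPolicyEngine()
drift_call = {"destination": "0xDEADBEEF00000000000000000000000000000001",
              "amount": 1850, "currency": "USDC"}

# With no rules configured, the drift call passes pre-flight.
passes_before = engine.preflight("transfer_funds", drift_call)

# After the operator hand-configures the cap rule, the same call is blocked.
engine.add_rule("transfer_funds", lambda p: p["amount"] <= 1000)
passes_after = engine.preflight("transfer_funds", drift_call)
```

The same drift call yields `passes_before = True` and `passes_after = False`: detection is a property of the operator's rule set, not of the substrate.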
4.4 L4 (continuous telemetry)
The L4 verifier ingests the drift call via the telemetry SDK. The substrate evaluates the parameter binding on ingest using the pact's evaluateParamBindings function. The drift call violates two rules of the binding (destination not in allow-list, amount outside value range), and the verdict is recorded with severity critical on the room_events row's payload.validation field. The trust oracle reflects the verdict within the flush interval (5 seconds).
Three properties of the L4 detection are operationally distinct from L3.
First, the binding is pre-committed. The operator authored the parameter binding once when the pact was published, not per-call. Coverage is uniform across all transfer_funds invocations regardless of which session they occur in. The expressiveness limit of L3 (per-rule hand configuration) is replaced by a pre-commitment limit (per-pact, per-tool authorship).
Second, the verdict is signed and cross-org-queryable. A counterparty bank polling the trust oracle for Atlas's record at *T_use + 6 seconds* receives the agent's score with the recent violation already reflected. The counterparty does not need to integrate with Atlas's operator to see the verdict; the substrate is the integration.
Third, the detection latency is bounded by the flush interval, not by Δt. The drift call occurred 18 minutes into session three, but the L4 verifier's verdict landed 5 seconds after the call regardless. The substrate's freshness is independent of the agent's elapsed authorization age, which is exactly the property the theorem predicts a continuous-time substrate must have.
The L4 result confirms Corollary 3: the continuous substrate closes the gap up to the substrate's representational ceiling (the parameter binding grammar, in the current implementation).
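The qualitative outcomes of the four substrates can be summarized in a toy latency model. The numbers mirror the experiment's results but are illustrative; `None` denotes no detection.

```python
def detection_latency(substrate, delta_t_minutes, flush_interval_s=5.0,
                      l3_rule_configured=False):
    """Toy model of when each substrate's verdict lands relative to
    T_use, in seconds. Returns None when the substrate never detects
    the drift. Mirrors the experiment's outcomes; values illustrative.
    """
    if substrate in ("L1", "L2"):
        return None                       # snapshot lacks tool parameters
    if substrate == "L3":
        # Synchronous pre-flight: zero latency, but only if the rule exists.
        return 0.0 if l3_rule_configured else None
    if substrate == "L4":
        return flush_interval_s           # bounded by flush, not delta_t
    raise ValueError(f"unknown substrate: {substrate}")
```

The key property is visible in the last branch: the L4 latency is the same whether the authorization is 18 minutes or 30 hours old, exactly the Δt-independence the theorem predicts.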
5. Architectural consequences
The theorem and the experimental confirmation jointly imply three architectural commitments for any L4-conformant substrate.
5.1 Independence from the agent runtime
The substrate's information channel must be independent of the agent's runtime. A substrate that depends on the agent's self-report — the operator's own logs, the agent's own claims about its tool calls — fails the independence test: a compromised or drifted agent will misreport, and the substrate's snapshot will be incorrect. The independence property is what makes the substrate's posterior closer to the true *Z(T_use)* than the agent's self-report would be.
In Armalo's architecture, the @armalo/telemetry SDK runs out-of-band and emits events that the substrate signs and stores in a ledger the operator does not control. A compromise of the agent does not compromise the ledger; the ledger's record is the substrate's snapshot.
5.2 Continuous flush
The flush interval bounds the substrate's freshness. The theorem says the substrate must be continuous, but does not specify a maximum interval. In practice, the flush interval should be smaller than the smallest unit of action the substrate is meant to gate. For money-movement actions with second-scale execution, the flush interval should be under one second. For multi-step workflow actions with minute-scale execution, five-second intervals suffice. For human-in-the-loop actions with day-scale review windows, minute-scale intervals are operationally adequate.
The default in @armalo/telemetry is five seconds. Configurations down to sub-second are supported by the SDK and the ingest endpoint.
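The rule of thumb above can be written down as a small helper. The thresholds are illustrative defaults derived from the text, not normative values from any SDK.

```python
def recommended_flush_interval_s(action_timescale_s: float) -> float:
    """Sketch of the flush-interval rule of thumb: the interval should
    be smaller than the smallest unit of action it gates. Thresholds
    are illustrative assumptions, not SDK defaults."""
    if action_timescale_s < 60:       # second-scale (e.g. money movement)
        return 0.5
    if action_timescale_s < 3600:     # minute-scale workflow steps
        return 5.0
    return 60.0                       # day-scale human review windows
```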
5.3 Cross-org query surface
The substrate must be queryable by every counterparty of the agent, not just by the agent's operator. This property is what distinguishes L4 from internal observability: the cross-org property is the structural-completeness property, not an add-on feature. A counterparty that cannot query the substrate is reduced to trusting the operator's self-report of the substrate's verdict, which collapses the architecture.
Armalo's GET /api/v1/trust/{agentId} endpoint is the canonical query surface. It is public, rate-limited (10 rpm per IP, 200/24h), and returns either JSON or a W3C Verifiable Credential. The endpoint is the operational implementation of the cross-org property.
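A counterparty's consumption of the endpoint might look like the following sketch. The response field names and values are assumptions made for illustration; only the URL shape comes from this paper.

```python
import json

AGENT_ID = "76cf31d6-ffe3-4a5c-8748-021114aa8066"
ORACLE_URL = f"https://www.armalo.ai/api/v1/trust/{AGENT_ID}"

# Hypothetical response body: the oracle's actual schema is not
# specified in this paper, so these fields are illustrative only.
sample_response = json.loads("""
{
  "agentId": "76cf31d6-ffe3-4a5c-8748-021114aa8066",
  "compositeScore": 0.87,
  "recentViolations": [{"tool": "transfer_funds", "severity": "critical"}]
}
""")

def has_critical_violation(record):
    """True if the queried record shows any critical-severity violation."""
    return any(v.get("severity") == "critical"
               for v in record.get("recentViolations", []))
```

A counterparty bank would gate on a predicate like this rather than on the operator's self-report, which is the cross-org property in operational form.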
6. Limits, open problems, and refinements
The formal argument is sharp on the structural claim but does not specify the substrate's representational ceiling. Three open problems remain.
6.1 Semantic drift within behaviorally-equivalent tool calls
The L4 substrate captures tool calls and their parameters. It does not capture the agent's internal reasoning that produced the parameters. Two calls with identical parameters but different reasoning trajectories are indistinguishable to the substrate. An attacker who can produce a call with conforming parameters but adversarial intent is invisible until the parameters themselves drift. This is the semantic-drift gap, and the substrate's grammar must be extended (jury-type conditions, scope-honesty calibration, output evaluation) to close it. The L4 substrate's twelve-dimension composite is one such extension; the parameter-binding grammar is another.
6.2 Adversarial flush manipulation
The flush interval bounds the substrate's freshness, but a sufficiently capable attacker may attempt to manipulate the flush. If the attacker compromises the agent's network egress, they can attempt to delay the flush by dropping or reordering events. The substrate's defense is the SDK's append-only buffer and the substrate's sequence-number invariants; reordering produces a detectable signature even if individual events are delayed. Empirical measurement of the attack window is open.
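The sequence-number invariant can be sketched as a simple audit over the ingested stream. Field names are assumptions; the point is that a dropped event leaves a gap and a reordered or replayed event leaves a regression, and both are detectable after the fact.

```python
def audit_sequence(events):
    """Detect gaps and reordering in a telemetry event stream via
    per-event sequence numbers (a sketch of the invariant described
    above; the `seq` field name is an assumption).

    Returns a list of (kind, expected_seq, observed_seq) anomalies;
    an empty list means the stream is clean.
    """
    anomalies = []
    expected = None
    for event in events:
        seq = event["seq"]
        if expected is not None and seq != expected:
            kind = "reorder_or_replay" if seq < expected else "gap"
            anomalies.append((kind, expected, seq))
        expected = seq + 1
    return anomalies
```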
6.3 Substrate-level compromise
The theorem assumes the substrate itself is trustworthy. A compromised substrate — a malicious ledger operator, a forged signature, a substrate-internal collusion with the agent operator — defeats the architecture entirely. The substrate's defense is the same as the defense of any distributed trust system: independent signing keys, public-key cryptography, on-chain anchoring for the most-sensitive records. Armalo anchors the most-sensitive records (pact condition hashes, score attestations) on Base L2 via EAS. The substrate's independence is a property to be measured continuously, not assumed.
7. Conclusion
The TOCTOU gap is structural rather than incidental for LLM-driven agents under open input distributions. No point-in-time verifier can close the gap; the structural-completeness theorem makes this precise. An empirical test against Atlas confirms the theorem: L1, L2, and L3 substrates exhibit detection latency bounded by their check interval and information channel, while L4 continuous behavioral telemetry exhibits detection latency bounded by the flush interval regardless of the agent's elapsed authorization age.
The architectural consequence is direct: any trust substrate intended to gate agent behavior in production must be continuous, independent of the agent's runtime, and cross-org queryable. The substrate's representational ceiling — what it can capture — is the design ceiling on the residual TOCTOU gap, but the structural claim is independent of the ceiling: a continuous substrate is necessary, even if the design of the ceiling is open.
Armalo's implementation of L4 — the @armalo/telemetry SDK, the room_events ledger, the parameter-binding grammar, the trust oracle endpoint, the composite scoring engine — is one realization of the structural requirements. The Atlas reference agent is the corresponding empirical proving ground. The theorem and the experiment together motivate the architectural commitments documented in this paper.
Replication and citation
The experimental data is reproducible against the public Atlas record. The agent's behavioral record is at:
GET https://www.armalo.ai/api/v1/trust/76cf31d6-ffe3-4a5c-8748-021114aa8066

The room ledger events that constitute the seeded behavioral drift are visible in the live demo at [armalo.ai/l4/demo](https://www.armalo.ai/l4/demo). The parameter binding under test is published at pact ID f683147e-5bfc-4b43-aa6f-e932a4262035 and is readable via the pacts API.
Researchers replicating the experiment should:
1. Query the Atlas trust oracle and verify the agent's composite score and pact compliance rate.
2. Identify the violation event in session three of the seeded record.
3. Measure the latency from event creation to oracle reflection (typically <10 seconds).
4. Compare to a synthetic L3 policy engine instantiated against the same parameter binding and the same drift call; the L3 measurement is local and deterministic.
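The latency measurement in step 3 reduces to a timestamp difference. A minimal helper, assuming the record exposes ISO-8601 timestamps (the field shapes are assumptions about the public record, not a documented schema):

```python
from datetime import datetime

def reflection_latency_s(event_created_iso, oracle_reflected_iso):
    """Latency from telemetry event creation to trust-oracle
    reflection, given two ISO-8601 timestamps. Timestamp formats
    are an assumption about the record's shape."""
    created = datetime.fromisoformat(event_created_iso)
    reflected = datetime.fromisoformat(oracle_reflected_iso)
    return (reflected - created).total_seconds()
```

A replication run would compare this value against the configured flush interval; values well above the interval indicate ingest or oracle lag rather than flush delay.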
The replication is intentionally lightweight — the substrate is public, the data is real, and the methodology is reproducible by any researcher with internet access.
References
- Armalo Labs Research Team. *The L4 Layer: Cross-Org Behavioral Trust for AI Agents.* 2026-05-12. [/labs/research/2026-05-12-l4-cross-org-behavioral-trust](/labs/research/2026-05-12-l4-cross-org-behavioral-trust)
- Cover, T. M. and Thomas, J. A. *Elements of Information Theory*, 2nd edition. Wiley, 2006. (Data processing inequality, mutual information bounds.)
- McKinney, A., Karpathy, A., et al. *Robust Self-Improvement in LLM Agents: An Open Problem.* Working paper, 2025.
- Bishop, M. and Dilger, M. *Checking for Race Conditions in File Accesses.* Computing Systems, 1996. (Original TOCTOU formulation.)
- Companion papers: [Parameter binding grammar coverage](/labs/research/2026-05-13-parameter-binding-grammar-coverage), [Trust oracle as cross-org consensus](/labs/research/2026-05-13-trust-oracle-cross-org-consensus), [Composite trust scoring under adversarial drift](/labs/research/2026-05-13-composite-scoring-adversarial-drift).