Pact Drift Detection: Catching Behavioral Slippage Before The Buyer Does
Agents drift. Models update, fine-tunes land, prompts get edited, skills change, and the pact in force last month is silently violated this month. An engineering essay on the drift telemetry that catches it before the counterparty files a dispute.
TL;DR
A pact in force on June 1 is not necessarily a pact in force on July 1, even if neither party has touched the contract. Agents drift. The model gets updated. A fine-tune lands. A skill is added or revised. A prompt gets edited. Token-budget pressure shifts the agent toward shorter responses. Each of these can subtly change behavior under a fixed pact and produce silent violations that nobody catches until the counterparty files a dispute. This essay is the engineering treatment of drift detection: what to measure, how to measure it, what thresholds to alert on, and how to wire the telemetry into the pact's enforcement layer. The artifact is a Drift Alert Threshold Calculator you can adapt to your own agent.
The drift incident that nobody saw coming
The most expensive failure mode in pact enforcement is not a flagrant violation. It is a slow, silent drift that violates the pact a little more every week, accumulates over a quarter, and only surfaces when a counterparty notices the cumulative effect and files a dispute about behavior that the operator did not realize was happening.
The canonical incident sketch goes like this. A customer-support agent is operating under a pact that includes a Predicate of the form "the agent will not refuse legitimate billing inquiries." In April, the underlying model is upgraded to a newer version with stronger refusal heuristics. The operator updates the system prompt to compensate, but the compensation is incomplete. The refusal rate on billing inquiries climbs from 1.8% to 3.2% over six weeks. No single refusal looks like a violation; each one falls within the model's plausible behavior. The aggregate pattern is a clear pact violation, but no alert fires because the operator's monitoring is on individual interactions rather than on the distribution of behavior across interactions.
In week seven, a high-value enterprise counterparty notices that their customer-support throughput has degraded. They pull a sample of 200 interactions and find that 38 of them (19%) were refusals of inquiries the agent should have handled. They file a dispute citing pact violation across the 200-interaction sample. The operator is now in a position where they have to either accept the dispute and forfeit substantial penalties, or contest it on the basis that no individual refusal looked egregious. The contest fails because the dispute is not about individual refusals; it is about the distribution of refusals, and the distribution clearly violates the pact.
This incident pattern is the single most common reason teams discover that their pact infrastructure is incomplete. The pact is well-drafted. The runtime guardrails are in place. The post-hoc jury runs on every interaction. None of these catch drift, because none of them look at the distribution of behavior over time. They look at individual events. Drift hides between events.
The fix is a separate telemetry layer whose entire job is to monitor the agent's behavior distribution and alert when it shifts in ways that indicate the pact is being silently violated. That layer is what this essay is about.
The five distributions that matter
Drift detection is not one measurement; it is a set of measurements over different distributions of agent behavior. Five distributions cover most pact-relevant drift in 2026 production agents.
The first is the response-length distribution. This is the simplest distribution to measure: for every interaction, how many tokens (or characters, or words) does the agent's response contain? The distribution should be roughly stable for an agent operating in a stable domain. When it shifts (the median response gets shorter by 25%, or the long tail of detailed responses thins out), that is signal. Shorter responses often indicate the agent is hedging, refusing more often, or running into token-budget pressure. Longer responses can indicate the model is becoming more verbose under a new prompt, which can violate latency Predicates or scope-honesty Predicates by drifting into adjacent topics.
The second is the refusal-rate distribution. For every interaction class, what fraction does the agent refuse to handle? A customer-support agent might refuse 2% of inquiries at baseline (those legitimately outside scope). If that climbs to 5%, the agent is refusing two and a half times as often as it should, likely violating Predicates that require it to handle in-scope work. If it drops to 0.5%, the agent may have lost its scope discipline and started accepting work outside the pact's authorized scope.
The third is the latency distribution. For every interaction, how long does the agent take to produce its response? The pact may have an explicit latency Predicate ("response within 90 seconds"), in which case the distribution must stay within the bound. But even when latency is not explicitly committed, drift in latency is signal: the agent slowing down indicates either model degradation, runtime contention, or new behavior patterns that consume more compute. The agent speeding up significantly indicates either model upgrade or behavior shortcut, both of which deserve investigation.
The fourth is the tool-use distribution. For every interaction class, which tools does the agent invoke and at what rate? An agent that suddenly starts invoking a previously-unused tool, or stops invoking a tool it used to rely on, is exhibiting behavior change that may or may not respect the pact. The tool-use distribution is one of the highest-signal distributions for drift detection because tool calls are discrete, structured events that are easy to count and easy to attribute.
The fifth is the topic distribution of inputs and outputs. For every interaction, what topics does the conversation cover? An agent operating under a customer-support pact should be having customer-support conversations. If the topic distribution shifts toward technical support, sales objections, or product roadmap, the agent has drifted into adjacent territory and is potentially violating scope-honesty Predicates. Topic distribution requires more sophisticated measurement (topic models, embedding clustering) but it is feasible at production scale.
Other distributions matter for specific pact classes (refusal-rationale distribution for safety-heavy agents, citation-density distribution for research agents, code-quality distribution for code-generation agents), but the five above cover the universal cases. A pact monitoring system that tracks these five for every active pact catches most drift before it accumulates into a violation worth disputing.
Wasserstein distance and KL divergence: what to compute
Monitoring distributions is more than tracking averages. The mean of a distribution can be stable while the shape changes substantially, and the shape change is what drift detection has to catch. Two statistical tools handle most of the work.
The Wasserstein distance (also known as earth mover's distance) measures how far one distribution is from another by computing the minimum cost to transform one into the other. For continuous distributions like response length or latency, it is the right tool because it captures both location shifts (the median moved) and shape shifts (the tails got fatter, the distribution became bimodal) in a single number. The number has a natural interpretation: how much "mass" of the distribution had to move and how far. This makes Wasserstein distances comparable across time, which is what alerting needs.
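To make the "mass moved, and how far" reading concrete, here is a minimal sketch of the one-dimensional Wasserstein-1 distance between two empirical samples, computed as the area between their empirical CDFs. The function name `wasserstein_1d` and the sample values are illustrative assumptions, not part of any specific pact tooling.

```python
from bisect import bisect_right


def wasserstein_1d(baseline, recent):
    """Wasserstein-1 distance between two 1-D samples (area between empirical CDFs)."""
    if not baseline or not recent:
        raise ValueError("both samples must be non-empty")
    base_sorted, recent_sorted = sorted(baseline), sorted(recent)
    # Every point where either empirical CDF can jump.
    points = sorted(set(base_sorted) | set(recent_sorted))
    distance = 0.0
    for left, right in zip(points, points[1:]):
        cdf_base = bisect_right(base_sorted, left) / len(base_sorted)
        cdf_recent = bisect_right(recent_sorted, left) / len(recent_sorted)
        # Both CDFs are constant on [left, right), so the area is a rectangle.
        distance += abs(cdf_base - cdf_recent) * (right - left)
    return distance


# Hypothetical response lengths in tokens (made-up values).
baseline_lengths = [100, 200, 300, 400]
shifted_lengths = [50, 150, 250, 350]  # every response 50 tokens shorter
drift = wasserstein_1d(baseline_lengths, shifted_lengths)
```

In this toy case `drift` comes out at 50, in the same unit as the measurement (tokens), which is what makes the number easy to reason about when setting a threshold.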
The KL divergence (Kullback-Leibler divergence) is the right tool for discrete distributions like refusal rate, tool-use rate, or topic distribution. It measures how much information is lost by approximating the new distribution with the baseline distribution. KL is asymmetric (it matters which distribution you treat as baseline), and it can blow up when the new distribution puts probability mass where the baseline put zero, which is itself useful signal because it surfaces "the agent is doing something it has never done before" patterns.
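A small sketch of KL(recent || baseline) over category counts may help here. It assumes additive smoothing (the `smoothing` parameter is a choice made for this example) so that a category unseen in the baseline yields a large but finite value instead of infinity. The tool names and counts are made up.

```python
import math


def kl_divergence(recent_counts, baseline_counts, smoothing=0.5):
    """KL(recent || baseline) over category counts, with additive smoothing.

    Smoothing keeps the result finite when the recent window contains a
    category the baseline never saw; such categories still dominate the sum.
    """
    categories = set(recent_counts) | set(baseline_counts)
    recent_total = sum(recent_counts.values()) + smoothing * len(categories)
    baseline_total = sum(baseline_counts.values()) + smoothing * len(categories)
    divergence = 0.0
    for category in categories:
        p = (recent_counts.get(category, 0) + smoothing) / recent_total
        q = (baseline_counts.get(category, 0) + smoothing) / baseline_total
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical tool-call counts (illustrative only).
baseline_tools = {"search": 700, "validate": 250, "act": 50}
same_mix = {"search": 70, "validate": 25, "act": 5}
new_tool = {"search": 60, "validate": 20, "act": 5, "export": 15}

kl_same = kl_divergence(same_mix, baseline_tools)     # close to zero
kl_new = kl_divergence(new_tool, baseline_tools)      # dominated by "export"
kl_reverse = kl_divergence(baseline_tools, new_tool)  # differs: KL is asymmetric
```

Treating the recent window as the first argument is what makes never-before-seen behavior expensive; swapping the arguments mutes exactly the signal the paragraph above describes.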
These two are not the only tools. Kolmogorov-Smirnov tests, chi-squared tests, and PSI (Population Stability Index, the standard in credit-risk modeling) all have uses. The point is not that any single statistic is correct; the point is that distribution-level monitoring requires distribution-level statistics, not the average and standard deviation that most teams default to.
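For reference, PSI is usually written as the sum over bins of (recent share minus baseline share) times the log of their ratio. The sketch below assumes the data has already been bucketed into shares that sum to one; the `floor` parameter and the latency buckets are assumptions made for the example.

```python
import math


def population_stability_index(baseline_share, recent_share, floor=1e-4):
    """PSI = sum over bins of (recent - baseline) * ln(recent / baseline)."""
    bins = set(baseline_share) | set(recent_share)
    psi = 0.0
    for name in bins:
        # Floor empty bins so the logarithm stays defined.
        base = max(baseline_share.get(name, 0.0), floor)
        recent = max(recent_share.get(name, 0.0), floor)
        psi += (recent - base) * math.log(recent / base)
    return psi


# Hypothetical latency buckets (shares of interactions, made-up values).
baseline = {"<1s": 0.6, "1-5s": 0.3, ">5s": 0.1}
recent = {"<1s": 0.4, "1-5s": 0.4, ">5s": 0.2}
psi = population_stability_index(baseline, recent)
```

Every term in the sum is non-negative, so PSI only grows as bins diverge. A commonly cited rule of thumb from credit-risk practice reads values under 0.1 as stable and values above 0.25 as a significant shift, but those cutoffs are conventions, not calibrated thresholds for any particular agent.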
For a production drift-detection system in 2026, a serviceable starting set is: Wasserstein distance for the continuous distributions (response length, latency), KL divergence for the discrete distributions (refusal rate, tool use, topic distribution), with PSI as a secondary check across all of them. Each statistic is computed against a rolling baseline (e.g., the trailing 30 days of behavior) and a recent window (e.g., the trailing 7 days), and the alert fires when the divergence between the two exceeds a threshold.
The threshold itself is the hard part, which is what the calculator artifact at the end of this essay addresses.
Refusal-rate monitoring: the highest-signal individual metric
If you can only afford to monitor one distribution, monitor the refusal rate. It is the highest-signal individual metric for pact-relevant drift in 2026, and it is comparatively cheap to measure.
Refusal rate has three properties that make it valuable. It is binary at the interaction level (the agent either refused or it did not), which makes counting easy and attribution clean. It correlates with multiple pact dimensions: refusal-rate drift typically reflects changes in safety behavior, scope-honesty behavior, and reliability behavior simultaneously. And it has a natural baseline that does not change dramatically: most agents in stable operation refuse a stable fraction of inquiries, and a 2x or 3x change in refusal rate is rarely a healthy signal.
The failure mode of refusal-rate monitoring is being too coarse. A single aggregate refusal rate across all interaction classes will mask drift in subclasses. An agent's refusal rate across all customer-support inquiries might stay stable while the refusal rate on billing inquiries doubles and the refusal rate on technical inquiries halves. The aggregate looks fine; the disaggregated picture shows the pact is being silently violated.
The right granularity is per-interaction-class. Group interactions by their topic, intent, or counterparty class, and compute refusal rate per group. Alert when any group's refusal rate diverges from its baseline by more than the calibrated threshold. This catches the subclass drifts that aggregate monitoring misses.
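The masking effect is easy to reproduce. The sketch below builds a made-up log where the billing refusal rate doubles, the technical refusal rate halves, and the aggregate does not move. The simple ratio test (`max_ratio`) is a stand-in assumption for the calibrated threshold; it is not the calibration method itself.

```python
from collections import defaultdict


def refusal_rates(interactions):
    """Refusal rate per interaction class from (class_name, refused) pairs."""
    totals, refusals = defaultdict(int), defaultdict(int)
    for class_name, refused in interactions:
        totals[class_name] += 1
        refusals[class_name] += int(refused)
    return {name: refusals[name] / totals[name] for name in totals}


def drifted_classes(baseline, recent, max_ratio=1.5):
    """Classes whose rate changed by more than max_ratio in either direction."""
    flagged = []
    for name, base_rate in baseline.items():
        recent_rate = recent.get(name)
        if recent_rate is None or base_rate == 0:
            continue  # not enough information to compare in this sketch
        ratio = recent_rate / base_rate
        if ratio >= max_ratio or ratio <= 1 / max_ratio:
            flagged.append(name)
    return sorted(flagged)


def make_log(class_name, total, refused):
    """Synthetic log: the first `refused` interactions are refusals."""
    return [(class_name, i < refused) for i in range(total)]


def aggregate_rate(log):
    return sum(refused for _, refused in log) / len(log)


baseline_log = make_log("billing", 1000, 20) + make_log("technical", 1000, 40)
recent_log = make_log("billing", 1000, 40) + make_log("technical", 1000, 20)

flagged = drifted_classes(refusal_rates(baseline_log), refusal_rates(recent_log))
```

Both logs have an aggregate refusal rate of 3%, yet `flagged` contains both classes, which is the disaggregated picture the aggregate hides.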
A further refinement is monitoring refusal rationale, not just refusal rate. When the agent refuses, what reason does it give? If the agent's refusals were 80% "out of scope" and 20% "insufficient information" in the baseline, and now they are 50/50, the agent's refusal logic has changed even if the overall refusal rate has not. Rationale shifts are subtle but meaningful, and they often precede rate shifts by a few weeks.
The wiring for refusal-rate monitoring is straightforward. The runtime instrumentation tags every refusal with its rationale. The metrics layer aggregates per interaction class per day. The drift detector compares the recent window against the rolling baseline using KL divergence. The alert fires when divergence crosses the threshold and is routed to both the operator and the counterparty (or surfaces in a shared dashboard, depending on the pact's evidence specification).
Latency profile drift: when the model is silently degrading
Latency drift is sometimes the first observable symptom of a deeper change: a model swap, a runtime configuration change, a new tool that takes longer to invoke, an upstream dependency that has slowed down. Even when the pact does not include an explicit latency Predicate, latency drift is worth monitoring because it correlates with other drift types and provides early warning.
The right latency monitoring is at the percentile level, not just the mean. Production latency distributions are heavy-tailed; the mean can be stable while the 95th and 99th percentiles climb dramatically, which is what hurts users. Track P50, P95, P99 separately, with separate alert thresholds for each. A P50 increase of 20% is significant; a P99 increase of 100% is catastrophic; mean-only monitoring would catch neither cleanly.
Latency monitoring also benefits from disaggregation by interaction class. The latency profile of trivial interactions (one-shot questions, simple lookups) is different from the profile of complex interactions (multi-turn workflows, tool-heavy flows). Drift in one class does not necessarily indicate drift in the other; aggregate monitoring will average them and miss both.
A further consideration: latency drift is often correlated with cost drift. An agent that is taking longer is often consuming more tokens, more tool calls, or more compute. The cost-efficiency dimension of the composite score (7%) tracks this directly. When latency drifts, check cost in the same window; if both are rising, the agent is degrading on multiple dimensions and the drift is more serious than either signal alone would suggest.
The wiring for latency monitoring is the most straightforward of the five distributions. Every interaction emits a duration field; the metrics layer aggregates per percentile per interaction class per window; the drift detector compares percentile values against baseline. Wasserstein distance is overkill for latency in most cases; straight percentile comparison with a calibrated threshold is enough.
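A percentile comparison along these lines can be sketched in a few lines. The P50 and P99 limits below echo the 20% and 100% figures mentioned earlier; the P95 limit is an assumption added for the example, and all latency values are made up.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0-100) of a non-empty sample."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Assumed relative-increase limits per percentile; calibrate per agent.
LIMITS = {50: 0.20, 95: 0.50, 99: 1.00}


def latency_alerts(baseline, recent, limits=LIMITS):
    """Return {percentile: relative_change} for percentiles that breach their limit."""
    alerts = {}
    for pct, limit in limits.items():
        base = percentile(baseline, pct)
        change = (percentile(recent, pct) - base) / base
        if change >= limit:
            alerts[pct] = change
    return alerts


# Hypothetical latencies in seconds: only the slowest 2% of requests got slower.
baseline_latency = [1.0] * 90 + [2.0] * 8 + [4.0] * 2
recent_latency = [1.0] * 90 + [2.0] * 8 + [10.0] * 2

alerts = latency_alerts(baseline_latency, recent_latency)
mean_change = (sum(recent_latency) / sum(baseline_latency)) - 1
```

Here the mean moves by roughly 10% while P99 rises by 150%, so only the P99 check fires. In practice the same comparison would run per interaction class.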
Scope-honesty drift: the topic distribution shift
Scope-honesty is the dimension that most directly tracks whether the agent is operating inside its pact's authorized scope, and it is the dimension most often violated by drift. The agent does not start handling out-of-scope work all at once; it gradually starts handling work that was just outside scope, then a bit more, then a bit more, until the pact is being routinely violated.
Monitoring scope-honesty drift requires monitoring the topic distribution of inputs and outputs. This is more sophisticated than the previous distributions because topics are not natural numerical or binary signals; they require either a topic model trained on the agent's domain or an embedding-based clustering approach.
A practical approach is to embed every input and output, cluster the embeddings into a small number of topic clusters, and track the distribution over clusters as a discrete distribution. KL divergence between the recent window's distribution and the baseline catches when the agent has drifted into adjacent topics. The clustering does not need to be perfect; it needs to be stable enough that the same kind of input lands in the same cluster across time.
Scope-honesty drift often shows up as a new cluster emerging: a kind of conversation that did not exist in the baseline. When the drift detector sees a cluster with no baseline mass, it is the agent doing something it has never done before, which deserves investigation regardless of whether the pact explicitly forbids it.
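Assuming an upstream step has already embedded each interaction and assigned it a cluster label, the "no baseline mass" check reduces to a set comparison. The sketch below is illustrative; the cluster names and the `min_share` cutoff are assumptions.

```python
from collections import Counter


def cluster_shares(cluster_ids):
    """Share of interactions per cluster id."""
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


def novel_clusters(baseline_ids, recent_ids, min_share=0.02):
    """Clusters absent from the baseline that hold at least min_share of recent traffic."""
    baseline = cluster_shares(baseline_ids)
    recent = cluster_shares(recent_ids)
    return {
        cluster: share
        for cluster, share in recent.items()
        if cluster not in baseline and share >= min_share
    }


# Hypothetical cluster labels produced by an upstream clustering step.
baseline_ids = ["billing"] * 60 + ["account"] * 40
recent_ids = ["billing"] * 55 + ["account"] * 35 + ["roadmap"] * 10

emerging = novel_clusters(baseline_ids, recent_ids)
```

The `min_share` floor keeps a handful of stray interactions from raising an alert; where to set it is a calibration question like any other threshold.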
The failure mode of scope-honesty monitoring is false positives in healthy growth. An agent that legitimately expands its capability set will produce drift that looks like a violation but is actually intended. The fix is not to weaken the alert; it is to have a drift-acknowledgment workflow where the operator (or counterparty, or both) can mark a detected drift as authorized and update the baseline accordingly. Without this workflow, drift alerts will get ignored within a quarter.
Tool-use drift: the discrete distribution that catches everything
Tool-use distribution is one of the most underappreciated drift signals. Every tool call is a discrete, structured event; the distribution of tool calls per interaction class is easy to track; and changes in the distribution often indicate behavioral changes that no other signal catches.
Monitoring tool-use drift involves three sub-distributions. First, which tools are being invoked: the per-tool invocation rate per interaction class. Second, in what order: the sequence of tools within a multi-tool interaction. Third, with what arguments: the distribution of argument values for each tool, especially for tools that take parameters with bounded ranges (transaction amounts, query scopes, target identities).
The per-tool invocation rate is the most useful of the three for drift detection. KL divergence between recent and baseline distributions catches when a tool that used to be rare is now common (the agent has discovered a new pattern), when a tool that used to be common is now rare (the agent has lost a capability), and when a tool that did not exist in baseline is now in use (a new capability has been deployed and is being used at the rate the new pattern dictates).
Sequence drift is harder to measure but valuable. An agent that used to invoke tools in a specific order (search, then validate, then act) and now invokes them in a different order (act, then validate, then search) has changed its decision logic in a way that may or may not be safe. Sequence-aware monitoring is heavier (it requires storing and comparing sequence patterns rather than just count distributions), but it catches drift that count monitoring misses.
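One lightweight way to approximate sequence-aware monitoring is to count consecutive tool pairs (bigrams) instead of whole sequences. This is a simplification chosen for the sketch, not the only option, and the traces are made up.

```python
from collections import Counter


def transition_shares(traces):
    """Share of each consecutive tool pair across all interaction traces."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def unseen_transitions(baseline_traces, recent_traces):
    """Tool-to-tool transitions that appear recently but never in the baseline."""
    baseline = transition_shares(baseline_traces)
    recent = transition_shares(recent_traces)
    return sorted(pair for pair in recent if pair not in baseline)


# Hypothetical traces: per-tool call counts are identical, only the order differs.
baseline_traces = [["search", "validate", "act"]] * 50
recent_traces = [["search", "validate", "act"]] * 40 + [["act", "validate", "search"]] * 10

new_pairs = unseen_transitions(baseline_traces, recent_traces)
```

Every trace calls each tool exactly once, so a per-tool count distribution sees no change, while the bigram view surfaces the reversed ordering. The shares returned by `transition_shares` can also be fed to a divergence statistic as a discrete distribution.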
Argument distribution drift is the rarest signal but the highest-stakes. An agent that historically called the transaction tool with amounts under $500 and is now calling it with amounts averaging $2,000 has changed something material about its behavior. Most pacts have explicit Predicates around argument bounds, and argument distribution monitoring is the runtime path that catches drift before any individual call exceeds the bound.
The Drift Alert Threshold Calculator
Setting drift alert thresholds is a calibration problem. Set them too tight and the team drowns in false positives, alert fatigue sets in, and real drift gets ignored. Set them too loose and the alerts fire only after the drift has accumulated into a violation worth disputing, which defeats the purpose. The calculator is a structured way to think through the calibration for each distribution.
The calculator has six steps.
- Establish baseline. For each distribution you are monitoring, compute the rolling baseline over a stable window, typically 30 days of recent operation, excluding any windows where you know drift was occurring. The baseline is the distribution you expect to see when nothing is changing.
- Compute baseline noise. Drift detection has to distinguish real drift from baseline noise. Compute the divergence statistic (Wasserstein for continuous, KL for discrete) between successive 7-day windows within the baseline period. The distribution of these divergences gives you the noise floor: how much divergence you observe even when nothing is changing.
- Calibrate the alert threshold per distribution. Set the alert threshold to the 99th percentile of the noise floor, so that alerts fire when the recent window's divergence from baseline is in the top 1% of what you would expect from random variation. This produces roughly one false positive per 100 monitoring intervals, which is tight but not crippling.
- Adjust for pact severity. A safety-dimension Predicate deserves a tighter threshold (95th percentile of noise floor) because false negatives are more costly than false positives. A reliability-dimension Predicate can use a looser threshold (99.5th percentile) because the cost of false positives (operator investigates a phantom drift) is more than the cost of false negatives (operator misses a small reliability slip).
- Set the cumulative alert. Single-window alerts catch sudden drift; cumulative alerts catch slow drift. Track the divergence statistic over a rolling N-week window and alert when the cumulative drift exceeds a multiple of the single-window threshold. A cumulative threshold of 4x single-window over 4 weeks catches the silent-drift incidents that single-window monitoring misses.
- Wire the alert routing. Drift alerts have to route to both the operator (who can investigate and remediate) and the pact's evidence channel (which feeds the post-hoc jury and the dispute path). The drift event itself is part of the pact's compliance history: it is evidence that the operator was warned about the drift, which matters for any subsequent dispute.
The calibration is per-distribution and per-pact. A customer-support agent's refusal-rate threshold is calibrated against that agent's baseline, not a universal value. Reusing thresholds across pacts is a calibration mistake that produces the false-positive flood that kills drift programs.
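The noise-floor, threshold, and severity-adjustment steps can be sketched as follows. This is a minimal illustration: the divergence function is passed in as a parameter, the dimension names in `SEVERITY_PERCENTILE` are hypothetical keys, and the noise values are synthetic.

```python
import math


def noise_floor(windows, divergence):
    """Divergence between each pair of successive baseline windows."""
    return [divergence(prev, cur) for prev, cur in zip(windows, windows[1:])]


def percentile(values, pct):
    """Nearest-rank percentile; pct may be fractional, e.g. 99.5."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Percentiles follow the calibration and severity steps; key names are illustrative.
SEVERITY_PERCENTILE = {"safety": 95.0, "default": 99.0, "reliability": 99.5}


def alert_threshold(noise, dimension="default"):
    """Threshold as a severity-adjusted percentile of the noise floor."""
    return percentile(noise, SEVERITY_PERCENTILE[dimension])


def should_alert(recent_divergence, threshold):
    return recent_divergence > threshold


# Toy divergence for the demo: absolute difference of window means.
def mean_shift(a, b):
    return abs(sum(a) / len(a) - sum(b) / len(b))


demo_noise = noise_floor([[1, 2, 3], [2, 3, 4], [2, 3, 4]], mean_shift)

# Synthetic noise floor of 200 window-to-window divergences.
noise = [i / 1000 for i in range(1, 201)]
safety = alert_threshold(noise, "safety")
default = alert_threshold(noise)
reliability = alert_threshold(noise, "reliability")
```

One caveat worth stating: a 30-day baseline split into 7-day windows yields only a few divergence values, and a high percentile of a handful of values is just their maximum. A meaningful 99th percentile likely needs a longer history or overlapping windows.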
How drift telemetry feeds the pact enforcement layer
Drift detection is not a separate system from pact enforcement; it is wired into the same enforcement layer. Detected drift events are first-class artifacts that flow through the same pipeline as runtime guardrail logs and post-hoc jury verdicts.
When a drift alert fires, three things happen in the pact infrastructure. First, the alert is logged as part of the pact's compliance history: a structured record with the distribution that drifted, the divergence statistic, the threshold that was crossed, and the time window. This record is queryable through the Trust Oracle and visible to both the operator and the counterparty.
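As a rough picture of what such a structured record could contain, here is a hypothetical schema. The class name, field names, and identifier format are all assumptions made for illustration; this is not Armalo's actual record format or API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DriftEvent:
    """Hypothetical drift record; every field name here is illustrative."""
    pact_id: str        # assumed identifier format
    distribution: str   # which distribution drifted, e.g. "refusal_rate"
    statistic: str      # which divergence statistic was used
    value: float        # observed divergence
    threshold: float    # threshold that was crossed
    window_start: str   # ISO 8601 date
    window_end: str     # ISO 8601 date
    operator_response: str = "pending"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# Made-up values for demonstration.
event = DriftEvent(
    pact_id="pact-123",
    distribution="refusal_rate",
    statistic="kl_divergence",
    value=0.31,
    threshold=0.12,
    window_start="2026-06-01",
    window_end="2026-06-07",
)
record = json.loads(event.to_json())
```

Making the record immutable (`frozen=True`) mirrors the idea that a compliance history is appended to, not edited; an operator response would then be logged as a new record instead of a mutation of the old one.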
Second, the alert triggers a remediation workflow at the operator's discretion. Some drift is innocent (the operator legitimately changed the agent's capabilities and the drift reflects intended behavior); some is unintentional (a model upgrade silently changed behavior the operator did not anticipate); some is a violation of an existing pact. The operator's response (acknowledged as expected, investigated and resolved, or treated as a violation) is itself logged.
Third, the alert influences subsequent post-hoc jury verdicts in the affected window. The jury has access to the drift event when grading individual interactions during the drift period. If the drift indicates the agent was systematically violating the pact, the jury verdicts in that window can reflect the pattern, not just the individual interactions, which is what produces the cumulative-violation finding that disputes can rely on.
The net effect is that drift telemetry becomes part of the agent's behavioral record alongside the agent's actual interactions. Future readers of the agent's history (counterparties evaluating whether to engage, regulators investigating an incident, marketplaces deciding tier eligibility) can see not just what the agent did, but what behavior shifts the operator was warned about and how they responded.
When drift indicates the pact itself is wrong
Not all detected drift indicates an agent that is misbehaving under a correct pact. Sometimes the drift indicates a pact that no longer reflects the reality of the operator-counterparty relationship and needs to change. Distinguishing the two failure modes is part of mature drift management.
The diagnostic question is whether the drift, if accepted, would produce a relationship that the parties would still endorse. Some drift represents the agent moving away from a commitment both parties still want to honor; that is genuine pact violation. Some drift represents the agent moving toward behavior that better serves the relationship even though it deviates from the pact's literal terms; that is a signal that the pact's terms are stale and should be renegotiated rather than that the agent is failing.
A worked example: a customer-support pact specifies that the agent will refuse all requests outside a narrowly defined scope. Over time, the agent starts handling some adjacent requests (technical questions about the product that border on technical-support territory) and the topic-distribution monitoring flags drift. Investigation reveals that handling these adjacent requests improves customer satisfaction and reduces the load on the human technical-support team; the counterparty actually wants the agent to handle them. The drift is real, but the pact's scope specification is what is wrong, not the agent's behavior. The right response is pact renegotiation (expand the authorized scope) rather than agent correction (force the agent back inside the original scope).
The diagnostic framework has three questions. First, who benefits from the drifted behavior? If the operator benefits at the counterparty's expense (the agent is taking shortcuts that reduce operator cost while degrading counterparty value), the drift is violation. If both parties benefit (the agent is finding behaviors that better serve both), the drift is signal that the pact is conservative. If neither benefits (the drift is noise from model variation), the drift is noise. Second, would the counterparty agree to the drifted behavior if asked explicitly? If yes, the pact is stale; if no, the pact is being violated. Third, would the operator have proposed this behavior change at the original pact negotiation? If yes, the drift represents an underspecified pact; if no, the drift represents agent misbehavior.
This diagnostic discipline matters because the wrong response to drift is almost as costly as no response. Treating beneficial drift as violation produces an agent that is being constrained to suboptimal behavior; the operator-counterparty relationship gets worse over time even though both parties technically want the same outcomes. Treating violating drift as a pact issue produces an agent that is being given license to misbehave by changing the rules every time it slips; the pact loses its enforcement value.
The right response is calibration. Drift that the diagnostic identifies as violating produces enforcement action (the agent is corrected, possibly with penalties for the drift period). Drift that the diagnostic identifies as pact-staleness produces renegotiation (the pact is updated through the standard migration pattern to incorporate the beneficial new behavior). Drift that the diagnostic identifies as noise produces investigation (why is the agent drifting in noise patterns, and is there an underlying instability that should be addressed at the model or runtime level).
Mature drift management treats the drift detection layer as a signal source for both kinds of problems, agent problems and pact problems, and routes the signals to the appropriate response path. Operators who only have one response path (always treat drift as violation) miss the pact-staleness signals; operators who only have the other path (always treat drift as pact issue) lose enforcement.
Counter-argument: "Drift telemetry is overengineering for most agents"
The strongest objection to a full drift telemetry stack is that it is heavyweight infrastructure for what should be straightforward monitoring. Many production agents in 2026 operate on small enough scale that a developer can spot-check behavior weekly and catch most drift through manual review. The full distribution-monitoring stack with Wasserstein distances and KL divergences feels like academic overkill for a customer-support bot handling 200 interactions a day.
This is right for the early-stage agent and wrong for the scaled one. Manual review works while the volume is small enough for one human to sample meaningfully; it fails the moment the agent is handling more interactions per hour than a human can review per week. The transition happens fast: most agents that are economically interesting cross the manual-review threshold within months of deployment. Building the telemetry after the threshold is crossed is more expensive than building it from the start, because by then there is no clean baseline to calibrate against.
The pragmatic compromise is to start with the cheapest, highest-signal distributions (refusal rate and latency) and add the others as the agent's scale or risk profile demands. Both are essentially free to instrument; both catch the most common drift incidents; both provide a baseline that the more sophisticated monitoring can build on later. The full five-distribution stack with proper statistical thresholds is for agents whose scale or stakes demand it, which is most agents that survive their first six months.
The deeper point is that drift telemetry is not optional for any agent operating under a real pact. The telemetry might be cheap or expensive, simple or sophisticated, but it has to exist. A pact without drift telemetry is a pact whose enforcement only catches sudden violations, which is the easy case. The interesting case, the slow drift that produces the costly dispute, requires telemetry that operates at the distribution level.
The drift remediation playbook
Detecting drift is half the work; responding to it correctly is the other half. The teams that build excellent drift detection but lack a remediation playbook end up in the worst of both worlds: they know about the drift in real time but cannot do anything productive about it before the counterparty notices. The playbook below is the remediation flow that production-grade pact infrastructure runs on every drift alert.
The first step is classification. Drift alerts come in three flavors: expected drift (the operator made a change and the drift reflects the intended new behavior), unexpected drift from a known cause (a model upgrade, a prompt edit, a tool change that the operator made but did not anticipate would produce drift), and unexpected drift from an unknown cause (the agent's behavior shifted and the operator does not yet know why). The classification determines the next step.
For expected drift, the response is acknowledgment. The operator marks the drift as authorized, the baseline updates to incorporate the new behavior, and the alert clears. The acknowledgment is itself a signed event in the pact's compliance history; future readers see that the drift was reviewed and authorized rather than ignored. Counterparties may need to be notified depending on whether the drift affects pact-relevant Predicates; the operator's communication about the change is part of the audit trail.
For unexpected drift from a known cause, the response is investigation. The operator traces the drift to the underlying change (the model upgrade that landed last Tuesday, the prompt edit pushed three days ago, the tool addition from the previous sprint). The investigation produces an understanding of why the change caused the drift and whether the drift is acceptable, undesirable but tolerable, or actively pact-violating. Depending on the conclusion, the operator either accepts the new behavior (acknowledge and update baseline), rolls back the underlying change (revert and re-establish baseline), or implements a compensating change (additional prompt instruction, additional guardrail, additional tool restriction).
For unexpected drift from an unknown cause, the response is forensic analysis. The operator walks the timeline of the drift to identify the inflection point where the behavior changed, then walks the change log of every system that affects the agent (model versions, prompts, tools, runtime configuration, upstream dependencies) for changes that landed near the inflection point. Most unexpected drift has a knowable cause; the discipline is to find it before deciding on remediation. The cases where no cause can be identified deserve special attention because they may indicate the agent's underlying model has shifted in ways the operator cannot directly control.
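The classification step and its three response paths are simple enough to encode directly. The sketch below is one hypothetical encoding: the two inputs to `classify` are operator-supplied facts assumed for this example, and the response strings are shorthand for the paragraphs above.

```python
from enum import Enum
from typing import Optional


class DriftClass(Enum):
    EXPECTED = "expected"
    UNEXPECTED_KNOWN_CAUSE = "unexpected_known_cause"
    UNEXPECTED_UNKNOWN_CAUSE = "unexpected_unknown_cause"


# First response for each class, as described in the playbook.
FIRST_RESPONSE = {
    DriftClass.EXPECTED: "acknowledge and update baseline",
    DriftClass.UNEXPECTED_KNOWN_CAUSE: "investigate the linked change",
    DriftClass.UNEXPECTED_UNKNOWN_CAUSE: "forensic analysis of timeline and change logs",
}


def classify(operator_intended: bool, linked_change: Optional[str]) -> DriftClass:
    """Classify an alert from two operator-supplied facts (hypothetical inputs)."""
    if operator_intended:
        return DriftClass.EXPECTED
    if linked_change:
        return DriftClass.UNEXPECTED_KNOWN_CAUSE
    return DriftClass.UNEXPECTED_UNKNOWN_CAUSE


alert_class = classify(operator_intended=False, linked_change="model upgrade")
next_step = FIRST_RESPONSE[alert_class]
```

Keeping the mapping explicit means every alert leaves the classification step with a named next action, which is also a natural thing to write into the compliance history.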
The playbook also has to address counterparty communication. Drift alerts that affect counterparty-visible behavior require notification. The operator communicates the drift, its likely cause, the planned remediation, and the timeline. Counterparties who learn about drift from the operator react very differently from counterparties who discover it themselves through their own monitoring or, worse, through user complaints. Proactive communication maintains the trust relationship; reactive scrambling damages it.
The playbook closes with a learning capture. Every drift incident produces an entry in the operator's drift register: what the drift was, what caused it, what the remediation was, and what would prevent it next time. Over time, this register becomes the institutional knowledge that makes subsequent drift incidents faster to diagnose and resolve. Operators without a drift register relive the same drift incidents repeatedly because the lessons from each one are not captured.
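One lightweight way to keep such a register is an append-only JSON Lines file with one record per incident. The sketch below is an assumption about shape, not a prescribed format: the `DriftRegisterEntry` fields mirror the four questions above plus a hypothetical `signal` field used for lookup, and the helper names are illustrative.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DriftRegisterEntry:
    signal: str       # which distribution drifted (hypothetical lookup key)
    what: str         # what the drift was
    cause: str        # what caused it
    remediation: str  # what the remediation was
    prevention: str   # what would prevent it next time


def append_entry(path, entry):
    """Append one entry as a JSON line so the register stays append-only."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")


def prior_incidents(path, signal):
    """Return earlier entries for the same signal, oldest first."""
    with open(path, encoding="utf-8") as f:
        entries = [DriftRegisterEntry(**json.loads(line))
                   for line in f if line.strip()]
    return [e for e in entries if e.signal == signal]
```

Looking up prior incidents by signal at the start of an investigation is what turns the register into a diagnostic shortcut instead of an archive.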
What Armalo does
Armalo's pact infrastructure includes drift telemetry as a first-class subsystem. Every active pact ships with default drift monitoring on the five distributions described above (response length, refusal rate, latency, tool use, topic distribution), with thresholds calibrated against the agent's own baseline rather than a universal value. Drift events are logged as part of the pact's compliance history, queryable through the Trust Oracle, and visible to both the operator and the counterparty. The post-hoc jury has access to drift events when grading individual interactions, which lets cumulative-violation patterns surface in jury verdicts. Operators can configure custom distributions for pact-specific signals (citation density, code quality, refusal rationale) on top of the defaults.
FAQ
How long does it take to build a stable baseline? Roughly 30 days of stable operation produces a usable baseline for most distributions. Less than two weeks is too noisy; more than 90 days starts to be stale. A 30-day rolling baseline that excludes known-anomalous windows is the practical sweet spot.
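A minimal sketch of that rolling window, assuming the metric is stored as one value per day and that anomalous windows are recorded as inclusive date ranges. Both are assumptions made for illustration, as are the function and parameter names.

```python
from datetime import date, timedelta


def baseline_window(daily, today, days=30, excluded=()):
    """Values from the `days` days before `today`, skipping excluded ranges.

    `daily` maps date -> metric value. `excluded` is an iterable of
    (start, end) date pairs, inclusive, marking known-anomalous windows.
    """
    start = today - timedelta(days=days)
    return [
        value
        for day, value in sorted(daily.items())
        if start <= day < today
        and not any(lo <= day <= hi for lo, hi in excluded)
    ]


def is_usable(values, min_points=14):
    """Under two weeks of data is treated as too noisy to serve as a baseline."""
    return len(values) >= min_points
```

Excluding anomalous days shrinks the window, so the usability check matters most right after an incident, when several days may have been excluded.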
What if the agent changes legitimately and drift alerts fire? This is expected and the workflow handles it. The operator marks the drift as acknowledged, the baseline updates to incorporate the new behavior, and subsequent alerts are calibrated against the new baseline. The acknowledgment itself is logged so future readers understand the change was intentional.
Can drift telemetry replace the post-hoc jury? No. The jury grades individual interactions; drift telemetry monitors distributions. They are complementary. The jury catches per-interaction violations; drift telemetry catches cumulative-pattern violations. Both are needed.
What happens when the underlying model is upgraded? The model upgrade is itself a drift trigger. The operator should establish a new baseline window after the upgrade and re-calibrate thresholds. Pacts with explicit Predicates about model versions should be reviewed and possibly renewed.
Is the topic-distribution monitoring expensive? It is the most expensive of the five distributions because it requires embedding every input and output and clustering them. For most agents, sampling rather than full coverage is sufficient: embedding a stratified sample of, say, 10% of interactions produces a stable enough distribution to monitor at a manageable cost.
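The sampling step might look like the sketch below. The essay does not say what to stratify on, so the `key` function is an assumption: the caller supplies it, for example to group by interaction class. Keeping at least one interaction per stratum is also a design choice made here, so that rare classes do not drop out of the monitored distribution.

```python
import random
from collections import defaultdict


def stratified_sample(interactions, key, fraction=0.10, seed=0):
    """Sample roughly `fraction` of interactions from each stratum.

    `key` maps an interaction to its stratum label. Every non-empty stratum
    contributes at least one interaction.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata = defaultdict(list)
    for item in interactions:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample
```

Only the sampled interactions are then embedded and clustered, which is where the cost saving comes from.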
What is the simplest drift signal I can ship in a day? Refusal rate, monitored per interaction class, with a KL divergence threshold calibrated against the agent's last 30 days of operation. This catches the highest-frequency drift patterns and is quick to wire up.
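A minimal sketch of that signal, treating each interaction class's refusal rate as a Bernoulli distribution and computing KL(current || baseline) in nats. The direction of the divergence, the `(refusals, total)` input shape, and the function names are assumptions. The threshold is left as a parameter because it has to be calibrated against the agent's own history.

```python
import math


def bernoulli_kl(p, q, eps=1e-6):
    """KL(p || q) in nats between two Bernoulli refusal rates."""
    # Clamp away from 0 and 1 so the logarithms stay finite.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def refusal_drift(baseline, current, threshold):
    """Return {class: kl} for classes whose divergence exceeds `threshold`.

    `baseline` and `current` map interaction class -> (refusals, total).
    """
    alerts = {}
    for cls, (b_ref, b_tot) in baseline.items():
        c_ref, c_tot = current.get(cls, (0, 0))
        if b_tot == 0 or c_tot == 0:
            continue  # no traffic in one window; nothing to compare
        kl = bernoulli_kl(c_ref / c_tot, b_ref / b_tot)
        if kl > threshold:
            alerts[cls] = kl
    return alerts
```

For the incident sketched earlier, a billing refusal rate moving from 1.8% to 3.2% gives a divergence of roughly 0.0045 nats. Whether that crosses the alert line depends entirely on the calibrated threshold, and small per-class sample sizes will make the estimate noisy.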
How do drift events affect the agent's score? They do not directly move the score. They are part of the evidence the post-hoc jury reads, and they can influence verdicts on individual interactions during the drift window. The score moves through verdicts, not through drift events themselves.
Can the counterparty configure additional drift monitoring? Yes, in a properly structured pact. The counterparty can specify additional distributions and thresholds in the Evidence section, and the operator's runtime emits the corresponding signals. This is more common in enterprise pacts than in standard ones.
Bottom line
Agents drift. The model gets updated, the prompt gets edited, the tools change, and the pact in force last quarter is being silently violated this quarter. Catching the drift before the counterparty does is the difference between a pact that holds and a pact that produces a dispute. The telemetry is not optional, the statistics are not optional, and the calibration is not generic. Build the drift detection layer alongside the pact, calibrate it against the agent's own baseline, and wire the alerts into the same evidence channel the jury reads. The slow-drift incident then becomes one you catch in week two instead of one you discover in week seven.