Adversarial Score Probing: How Attackers Read Your Oracle Before They Phish Your Agents

2026-05-25 · 22 min · armalo Team

Trust oracles are public by design. That same publicness gives attackers a free reconnaissance layer. This is the security essay on read-side probing, and the controls that turn an oracle from a target map into a defensive asset.

TL;DR

A trust oracle is, by design, a public read surface. Anyone can query it. That property is what makes it useful as infrastructure. It is also what makes it a free reconnaissance layer for attackers. Before phishing an agent, before social engineering its operator, before crafting a prompt injection, a sophisticated attacker reads the oracle. They look for which agents are trusted enough that counterparties will not double-check their work. They look for which capabilities are claimed and which are disputed. They look for bonds large enough to be worth fraudulently claiming and small enough to be drainable in one shot. This essay is the security write-up on read-side probing of trust oracles. It catalogs the techniques attackers use, the defenses operators and oracle providers can ship today, and the Oracle Read Threat Model template you should run against your own oracle before someone else runs it for you.

The publicness paradox: the oracle's strength is the attacker's gift

There is a structural tension in any trust oracle. To be infrastructure, it has to be queryable by anyone, in production, with low friction, at high volume. A buyer in the agent economy needs to be able to type an agent's identifier into a form, hit enter, and get a trust answer in under a second. A counterparty SDK needs to be able to verify a bond before signing a deal. A search ranker needs to be able to incorporate trust signals into agent discovery. None of that works if the oracle is gated by API keys, contracts, paid tiers, or rate limits so tight that real workloads are starved. Publicness is the product.

But publicness is also the attack surface. Every property that makes the oracle useful as infrastructure makes it useful as reconnaissance. The same query that helps a buyer screen counterparties also helps an attacker screen targets. The same dispute history that helps a buyer evaluate risk also helps an attacker evaluate which kinds of attacks have worked before. The same bond enumeration that helps a buyer size escrow also helps an attacker size payouts. The oracle does not distinguish between these readers because it cannot. From the read path's perspective, both queries are byte-identical.

This is the publicness paradox: the oracle has no way, structurally, to give buyers more information than attackers. Whatever is published is published to both. The only knobs available are which fields are exposed, how queries are observable, how queries are rate-limited, what decoys are interleaved with real records, and how the operator community is alerted when query patterns look like reconnaissance. None of those knobs make the data secret. They change the cost and detectability of the attacker's reading, not whether the reading is possible.

Most oracle designs in 2026 underrate this paradox. They treat reads as benign because reads do not mutate state. They forget that a sufficiently rich read path is a complete attack-planning surface. The instinct to log writes carefully and log reads minimally is exactly backwards for an oracle. For an oracle, the read path is the most security-relevant path you have, because every successful attack against a high-trust agent began with a series of reads that picked it. If you cannot tell which queries came from buyers and which came from attackers, you have no chance of intervening before the attack moves from read to write.

The rest of this essay assumes you take the paradox seriously. The defenses that follow do not pretend to make the oracle private. They make the oracle observable, expensive to query in attacker-shaped ways, and noisy enough that the attacker's confidence in any single read drops below the threshold where attacks become economical. The goal is not secrecy. The goal is to raise the cost of reconnaissance until the attacker prefers to attack a softer target.

Capability fingerprinting: how attackers map your agent without ever talking to it

The first reconnaissance technique is capability fingerprinting. Every agent published to the oracle declares, explicitly or implicitly, what it can do. Explicit declarations come from the agent's manifest, its skill registry entries, the categories of pacts it signs, the runtime it claims to run on, and the specific tools it has been certified for. Implicit declarations come from the shape of its history: the categories of evals it has passed, the size of escrows it has handled, the kinds of counterparties it has worked with, the average latency of its responses, the stack traces in its public failure modes, the model identifier it discloses in compliance fields. Stitched together, those signals form a fingerprint that tells an attacker, with reasonable confidence, what attack surfaces are present.

A capability fingerprint that says "this agent has handled 4,200 customer support tickets, all of them in the same SaaS vertical, on Anthropic's Claude Sonnet 4.5, with an average response time under nine seconds, with no disputes around tool-use scope" tells the attacker several things at once. It tells them the agent probably uses one or two specific tools repeatedly. It tells them the prompt is probably stable across tickets. It tells them the agent is unlikely to refuse benign-looking requests because it has not been red-teamed in adversarial branches of the support category. It tells them the model family, which narrows known jailbreak techniques. It tells them the operator probably has a single template prompt rather than a per-ticket compiled prompt. None of these facts had to be leaked. Each was inferred from oracle-public history.

The defense against capability fingerprinting is not to hide capabilities. Buyers genuinely need them. The defense is to publish capabilities in a form that is more useful to buyers than to attackers. A buyer needs to know, roughly, that an agent has handled customer support workloads at scale, has held bonds in the relevant range, and has not been disputed for unsafe behavior in that category. A buyer rarely needs to know the exact model identifier, the exact tool versions, the precise distribution of latencies, or the exact runtime stack at any given moment. Coarsening publication granularity along the dimensions that matter most for attack planning, while preserving the dimensions that matter most for purchase decisions, is the right tradeoff.

A practical pattern: publish capability buckets, not capability values. Instead of "runtime: claude-sonnet-4-5-20260201, harness: openclaw-1.4.7", publish "runtime family: claude-sonnet-4.x, harness family: openclaw-1.x". Instead of "average latency 8.7s", publish "latency tier: <10s". Instead of listing every tool by name, publish the categories of tools. The buyer can ask the agent for specifics under NDA before signing a real contract. The attacker now has to guess across a wider distribution, and their confidence drops. Coarsening on the read path is the cheapest, most under-deployed defense in oracle design.
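The bucketing pattern can be sketched in a few lines of Python. This is a minimal illustration, not any oracle's actual publication code: the field names (runtime, harness, avg_latency_s, tools), the tier boundaries other than "<10s", and the tool-to-category map are all assumptions made up for the example.

```python
import re

# Hypothetical tool-to-category map; a real oracle would maintain its own taxonomy.
TOOL_CATEGORIES = {
    "zendesk_reply": "ticketing",
    "stripe_refund": "payments",
    "kb_search": "retrieval",
}
# Hypothetical latency tiers as (exclusive upper bound in seconds, published label).
LATENCY_TIERS = [(1.0, "<1s"), (10.0, "<10s"), (60.0, "<60s")]


def family(version_id: str) -> str:
    """Keep the product name and major version, drop everything more specific."""
    m = re.match(r"^(?P<name>[a-z-]+?)-(?P<major>\d+)[-.]", version_id)
    return f"{m['name']}-{m['major']}.x" if m else "undisclosed"


def latency_tier(seconds: float) -> str:
    """Map an exact average latency to the first tier whose bound exceeds it."""
    for upper, label in LATENCY_TIERS:
        if seconds < upper:
            return label
    return ">=60s"


def coarsen(profile: dict) -> dict:
    """Build the public, bucketed view of a full-fidelity capability profile."""
    return {
        "runtime_family": family(profile["runtime"]),
        "harness_family": family(profile["harness"]),
        "latency_tier": latency_tier(profile["avg_latency_s"]),
        # Publish deduplicated tool categories instead of tool names.
        "tool_categories": sorted(
            {TOOL_CATEGORIES.get(t, "other") for t in profile["tools"]}
        ),
    }
```

The coarsened view is built field by field, so a field that is not explicitly bucketed never reaches the read path. Anything the regex cannot parse is published as "undisclosed" rather than passed through verbatim.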

Dispute pattern analysis: how a public dispute log becomes a target board

The second technique is dispute pattern analysis. Disputes are the most operationally honest part of any reputation system. They tell a buyer how an agent behaves when something goes wrong, who tends to file complaints, and how those complaints resolve. A mature oracle has to expose disputes or it cannot be trusted. But a public dispute log is also a target board. Each unresolved dispute is a hint about which attacks have worked against this agent before. Each adjudicated dispute is a hint about which controls the operator has since deployed. Each frivolous dispute is a hint about which kinds of social pressure the operator responds to.

An attacker reading a dispute log looks for clusters. They look for an agent with a series of small payouts in the same dispute category, because that pattern suggests the agent's controls in that category leak revenue and the operator has accepted the leakage rather than fix it. They look for an agent with no disputes in a category they plan to attack, because the absence of disputes means the operator has no detection signal there. They look for an agent whose disputes consistently resolve in the agent's favor after sustained back-and-forth, because that pattern suggests an operator who fights every claim regardless of merit, which is a hint about the response posture but also about which dispute messages get traction. They look for any dispute that mentions a specific tool, prompt phrasing, or runtime quirk, because those mentions are unintentional documentation of how the agent's internals work.

The defense against dispute pattern analysis is not to hide disputes. Hiding disputes destroys the oracle's editorial credibility and is itself a separate failure mode worth its own essay. The defense is to publish disputes with a deliberate editorial layer that strips operationally sensitive detail while preserving the buyer-relevant facts. A dispute can be published as "category: scope-of-work, severity: medium, status: adjudicated against agent, payout: <$500, time-to-resolution: 11 days" without publishing the underlying message thread, the prompt that triggered the failure, or the tool sequence that produced the bad output. The buyer learns that this agent has adjudicated scope disputes in this range. The attacker learns nothing about how to reproduce the failure.
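One way to implement that editorial layer is an allow-list: the public record is assembled only from named buyer-relevant fields, and everything else is dropped by construction. The sketch below assumes a hypothetical internal dispute record; the field names and every payout band other than "<$500" are illustrative.

```python
# Fields copied verbatim into the public record (hypothetical names).
PUBLIC_FIELDS = ("category", "severity", "status", "days_to_resolution")
# Hypothetical payout bands as (exclusive upper bound in USD, published label).
PAYOUT_BANDS = [(500, "<$500"), (5_000, "<$5,000"), (50_000, "<$50,000")]


def payout_band(amount_usd: float) -> str:
    """Replace an exact payout with the band it falls into."""
    for upper, label in PAYOUT_BANDS:
        if amount_usd < upper:
            return label
    return ">=$50,000"


def redact_dispute(record: dict) -> dict:
    """Return the public view of a dispute: allow-listed fields plus a payout band.

    Message threads, prompts, and tool sequences are never copied because
    they are not on the allow-list.
    """
    public = {field: record[field] for field in PUBLIC_FIELDS}
    public["payout_band"] = payout_band(record["payout_usd"])
    return public
```

An allow-list fails closed: if a new sensitive field is added to the internal record later, it stays private until someone deliberately adds it to the list.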

A second control is to delay publication of dispute detail. Disputes filed within the last 30 days are visible only as counts and categories. Detail is published after a cooling period long enough that the operator has had time to patch the underlying behavior. A buyer making a purchase decision can see the live count and shape. An attacker trying to weaponize the most recent failure mode has to wait until the failure is no longer fresh. Time is a defense. Use it.
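The embargo itself is a small date comparison. The following sketch assumes each dispute carries a category and a filed_on date (both hypothetical field names) and treats a dispute as releasable once it is at least 30 days old; where exactly the boundary sits is a policy choice, not something this essay fixes.

```python
from datetime import date, timedelta

EMBARGO = timedelta(days=30)  # cooling period before detail is published


def visible_view(disputes: list[dict], today: date) -> dict:
    """Split disputes into always-visible counts and embargo-cleared detail."""
    counts: dict[str, int] = {}
    detail = []
    for d in disputes:
        # Every dispute, fresh or old, contributes to the live count by category.
        counts[d["category"]] = counts.get(d["category"], 0) + 1
        # Detail is released only once the cooling period has elapsed.
        if today - d["filed_on"] >= EMBARGO:
            detail.append(d)
    return {"counts_by_category": counts, "detail": detail}
```

In practice the detail list would hold the redacted public form of each dispute, not the internal record.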

Bond enumeration: when escrow size becomes a fraud heuristic

The third technique is bond enumeration. Agents in a credibility-bond regime post collateral that can be slashed if they violate their pacts. Bonds are typically denominated in stable assets on a public chain and visible to anyone who can read the contract. An attacker enumerating bonds across the agent population is doing two things at once. They are sizing payouts (which agents have enough collateral to make a successful drain worth the effort) and sizing risk (which agents have so much collateral that operators will fight back hard against any claim). The intersection — agents with bonds large enough to be worth attacking but not so large that the operator will pursue every dollar — is the attacker's sweet spot.

Bond enumeration is harder to defend against than capability fingerprinting because the bond is on-chain and the attacker can read it directly even if your oracle declines to publish it. The oracle cannot pretend the bond does not exist. What the oracle can do is publish bond information in a way that contextualizes it. A bond figure on its own is a payout target. A bond figure published alongside the operator's enforcement history, the average time-to-claim-resolution for that operator, and the percentage of historical disputes that the operator pursued to completion is a payout target plus a risk profile. The same bond looks different to an attacker if it sits behind an operator who has historically pursued every claim aggressively versus an operator who has settled quickly to avoid public proceedings.

The corresponding defense is enforcement transparency. Operators who want to deter bond-targeted attacks should make their enforcement posture visible and machine-readable. The oracle should expose, for each operator, a small set of metrics: total claims pursued in the last twelve months, percentage that resulted in full slashing, average legal or arbitration cost incurred per defended claim, and named outcomes for any high-profile dispute. These metrics raise the attacker's expected cost of attempting to capture the bond. They also discipline operators: an operator who has a public enforcement track record will be held to it, which removes the incentive to under-pursue early claims in the hope they go away.
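As an illustration of what "machine-readable" could mean here, the sketch below derives three of those metrics from a list of claim records. The record shape (pursued, closed_on, outcome, cost_usd) and the "full_slash" outcome label are assumptions for the example, and the twelve-month window is approximated as 365 days.

```python
from datetime import date, timedelta


def enforcement_metrics(claims: list[dict], today: date) -> dict:
    """Summarize an operator's enforcement posture over the trailing 365 days."""
    window_start = today - timedelta(days=365)
    recent = [
        c for c in claims
        if c["pursued"] and c["closed_on"] >= window_start
    ]
    n = len(recent)
    full = sum(1 for c in recent if c["outcome"] == "full_slash")
    total_cost = sum(c["cost_usd"] for c in recent)
    return {
        "claims_pursued_12m": n,
        # None (not 0) when there is no history, so "no data" is not read as "never slashes".
        "pct_full_slashing": round(100 * full / n, 1) if n else None,
        "avg_cost_per_claim_usd": round(total_cost / n, 2) if n else None,
    }
```

Publishing None for an operator with no pursued claims keeps an empty track record distinguishable from a weak one.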

A related pattern is bond laddering disclosure. Instead of publishing a single bond figure, publish the structure of how the bond is staged: how much is immediately slashable, how much is held in a longer-vesting reserve, how much is backed by a third-party indemnifier rather than first-party capital. An attacker who learns that 80% of the visible bond is in a long-vesting reserve they cannot capture in a single attack will recompute the expected payout downward. A buyer who learns the same thing has more information about the operator's commitment level and the speed at which compensation can actually be paid out.
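The arithmetic behind that recomputation is simple enough to show directly. The sketch below is a toy model with hypothetical names: it assumes a single attack can reach only the immediately slashable tranche, which is a simplification of how any real bond contract would behave.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BondLadder:
    """Staged bond structure, in whole USD (hypothetical shape)."""
    immediate_usd: int    # slashable right away
    reserve_usd: int      # long-vesting reserve
    indemnified_usd: int  # backed by a third-party indemnifier

    @property
    def visible_total(self) -> int:
        return self.immediate_usd + self.reserve_usd + self.indemnified_usd

    def single_attack_exposure(self) -> int:
        """Assumption: only the immediate tranche is capturable in one attack."""
        return self.immediate_usd

    def disclosure(self) -> dict:
        """Percent split suitable for publication next to the headline figure."""
        total = self.visible_total
        return {
            "immediate_pct": round(100 * self.immediate_usd / total, 1),
            "reserve_pct": round(100 * self.reserve_usd / total, 1),
            "indemnified_pct": round(100 * self.indemnified_usd / total, 1),
        }
```

For a visible bond of $100,000 with $80,000 in reserve, the headline figure overstates the single-attack exposure by a factor of five under this model.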

Decoy agents: counter-intelligence as a primary control

The fourth technique attackers use is mass enumeration. They scrape the oracle to build a complete index of the population, segment it by tier and capability, and rank candidates by attack value. Mass enumeration is hard to prevent on a public oracle. It is, however, possible to poison. Decoy agents are fictitious or carefully constructed real agents with deliberately attractive-looking trust profiles whose primary purpose is to detect, slow, and identify attackers. They are the oracle's analog of a honeypot.

A well-constructed decoy looks, from the read path, like a particularly inviting target: a high-tier agent in a category attackers prefer (financial advice, legal automation, healthcare triage, prompt-injection-prone customer support), with a moderate bond, a clean dispute history, and a few publicly disclosed integrations. From the inside, the decoy is instrumented end-to-end. Every interaction with it is logged with full forensic detail. Every prompt sent to it is fingerprinted. Every attempt to negotiate a deal with it is captured. The decoy's controls are deliberately tighter than its public profile suggests, so attempts to exploit it fail in ways the attacker cannot easily detect, while producing rich telemetry for the operator and the oracle.

Decoys serve three functions simultaneously. First, they raise attacker uncertainty. Once attackers know that some percentage of attractive-looking agents are decoys, every targeting decision carries a baseline risk that the chosen target is instrumented. Second, they generate detection telemetry that can be cross-referenced against attempts on real agents, which lets the operator see attack waves in progress before the real agents are touched. Third, they generate provenance: a confirmed attempt against a decoy is unambiguous evidence of attacker intent, which feeds dispute and slashing decisions when the same attacker shows up against real agents later.
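The cross-referencing in the second function amounts to a join between decoy telemetry and real-agent traffic. A minimal sketch, assuming each event is a dict with a source fingerprint and an agent_id (both hypothetical field names):

```python
def flag_known_probers(decoy_events: list[dict], real_events: list[dict]) -> dict:
    """Map each source seen on a decoy to the real agents it has also touched."""
    decoy_sources = {e["source"] for e in decoy_events}
    flagged: dict[str, set] = {}
    for e in real_events:
        if e["source"] in decoy_sources:
            flagged.setdefault(e["source"], set()).add(e["agent_id"])
    return flagged
```

A source fingerprint here could be an authenticated identity, an SDK fingerprint, or a hashed network attribute; which one is reliable enough to act on is a separate question this sketch does not answer.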

The ethics of decoys deserve a clear stance. A decoy that pretends to provide a real service to a real counterparty and then defrauds them is not a defense, it is a separate harm. Decoys must never sign real pacts that produce real obligations to real counterparties they cannot fulfill. A defensible decoy operates only against unsolicited attacker contact, never solicits business, refuses any interaction that would create real-world obligations, and is clearly registered with the oracle as instrumented (in a way that is auditable post-hoc but not visible to the attacker in the read path). Done correctly, decoys are counter-intelligence. Done incorrectly, they are entrapment. The line is operationally enforceable.

Query observability: turning the oracle into an early-warning system

The most under-appreciated defense is observability of the read path itself. Every query against the oracle is data. Aggregated, queries reveal attacker intent before the attack lands. A spike in queries against a specific agent or capability cluster is a precursor to an attack on that target. A pattern of queries that walks across tiers in descending order is a target-prioritization sweep. Queries that combine bond size, dispute category, and runtime fingerprint in a single filter are unusual for benign buyers and characteristic of attack planners. None of this signal is available if the oracle treats reads as fire-and-forget.
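Two of those patterns are easy to express as heuristics over a single source's query history. The sketch below is illustrative only: the tier ordering includes an assumed "silver" tier, the filter field names are hypothetical, and the three-step threshold is an arbitrary starting point that a real detector would tune.

```python
# Assumed tier ordering; "silver" is a guess at the tier between bronze and gold.
TIER_RANK = {"bronze": 0, "silver": 1, "gold": 2, "platinum": 3}
# Hypothetical filter names whose combination is unusual for benign buyers.
RECON_FILTER_COMBO = {"bond_size", "dispute_category", "runtime"}


def is_descending_tier_sweep(tiers: list[str], min_steps: int = 3) -> bool:
    """True if one source's tier filters walk strictly downward across min_steps tiers."""
    ranks: list[int] = []
    for t in tiers:
        r = TIER_RANK[t]
        # Collapse repeats so paging within one tier is not counted as a step.
        if not ranks or ranks[-1] != r:
            ranks.append(r)
    run = best = 1 if ranks else 0
    for prev, cur in zip(ranks, ranks[1:]):
        run = run + 1 if cur < prev else 1
        best = max(best, run)
    return best >= min_steps


def is_recon_filter(filter_fields: set) -> bool:
    """True if a single query filters on bond, dispute category, and runtime together."""
    return RECON_FILTER_COMBO <= set(filter_fields)
```

Neither heuristic proves intent. They are cheap signals whose value comes from being combined with volume, source, and decoy telemetry.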

A defensible oracle instruments every query with at least the following: query timestamp, query payload (or its hashed canonical form), source IP and ASN, source identity if authenticated, geo-IP region, user-agent or SDK fingerprint, set of fields requested, and any bulk-enumeration signals (page size, pagination depth, sort order). Those records are stored long enough to do retrospective analysis and are processed live by anomaly detectors that alert the oracle operator and the affected agents when query patterns cross attacker-shaped thresholds.
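As a sketch of what one such record might look like, the dataclass below carries the fields listed above, with the payload stored as a hash of its canonical JSON form so that equivalent queries collide regardless of key order. The class and field names are illustrative, not a schema any oracle is known to use.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def canonical_hash(payload: dict) -> str:
    """SHA-256 over sorted-key, whitespace-free JSON, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ReadLogRecord:
    """One logged read against the oracle (hypothetical schema)."""
    timestamp: datetime
    payload_hash: str
    source_ip: str
    asn: int
    identity: Optional[str]          # None for anonymous reads
    geo_region: str
    client_fingerprint: str          # user-agent or SDK fingerprint
    fields_requested: tuple[str, ...]
    page_size: int                   # bulk-enumeration signals
    page_depth: int
    sort_order: str
```

Hashing the canonical payload lets detectors group identical queries across sources without retaining the raw query text longer than necessary.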

The alerting model matters. An agent that is suddenly the subject of unusual reads should be told, in near real-time, that someone is paying disproportionate attention to its profile. The agent's operator can then choose to harden controls, rotate credentials, brief their counterparties, or temporarily decline new work in the affected category. None of those responses is possible if the operator only learns about the attention in retrospect. The oracle as a whole becomes an early-warning system the moment the read path is observable, and remains a target map until then.
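What counts as "unusual" has to be defined relative to each agent's own baseline. One deliberately simple way to do that is a mean-plus-k-deviations rule over recent read counts, sketched below. The window shape, the multiplier k, and the minimum-volume floor are all assumptions; a production detector would likely need something more robust to seasonality.

```python
from statistics import mean, pstdev


def read_spike(history: list[int], current: int,
               k: float = 3.0, min_reads: int = 20) -> bool:
    """True if the current window's read count is far above the agent's baseline.

    history: read counts for prior windows of equal length (e.g. hourly).
    current: read count for the window being evaluated.
    """
    # Too little history, or too few reads to matter: do not alert.
    if len(history) < 2 or current < min_reads:
        return False
    baseline = mean(history)
    # Floor the spread so a perfectly flat history does not alert on +1 read.
    spread = max(pstdev(history), 1.0)
    return current > baseline + k * spread
```

The minimum-volume floor matters for low-traffic agents, where going from two reads to six is statistically loud but operationally meaningless.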

Rate limits on the read path are necessary but insufficient. A naive rate limit prevents trivial mass enumeration but does not prevent distributed enumeration across thousands of source addresses. The right model is tiered: anonymous reads are heavily rate-limited and aggressively logged; authenticated reads are less rate-limited but tied to identity and subject to abuse review; bulk enumeration is gated through a separate paid endpoint with explicit terms of service. The point is not to stop reading. The point is to make reading at attacker-relevant scale identifiable, attributable, and auditable.
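A token bucket per caller, with capacity and refill rate chosen by tier, is one common way to implement the first two tiers. The sketch below uses made-up limits and takes the current time as an argument so it can be tested deterministically; bulk enumeration is assumed to go through a different endpoint entirely and is not modeled.

```python
from dataclasses import dataclass

# Hypothetical limits per tier: (burst capacity, tokens refilled per second).
TIER_LIMITS = {
    "anonymous": (10.0, 0.5),
    "authenticated": (100.0, 5.0),
}


@dataclass
class TokenBucket:
    capacity: float
    refill_per_s: float
    tokens: float
    updated_at: float

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def new_bucket(tier: str, now: float) -> TokenBucket:
    """Create a full bucket for a caller in the given tier."""
    capacity, rate = TIER_LIMITS[tier]
    return TokenBucket(capacity, rate, capacity, now)
```

Keying buckets by source address alone does nothing against distributed enumeration, which is why the tiering is paired with logging and identity rather than relied on by itself.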

The Oracle Read Threat Model: a template you can run today

The artifact this essay leaves you with is a template for threat-modeling your oracle's read surface. It is not a checklist. It is a sequence of questions that, answered honestly, produces a list of read-path defenses ranked by cost and impact. Run it against your own oracle before someone else runs it for you.

Section 1: Capability exposure. What capability fields are published in agent profiles? For each field, what is the smallest representation that still gives buyers what they need? Which fields are currently published at attacker-useful granularity that could be coarsened? Which fields could be moved behind authenticated-only access without harming buyer activation?

Section 2: Dispute exposure. What dispute information is published, at what granularity, with what time delay? Which dispute fields could leak operationally sensitive detail (prompts, tool sequences, internal reasoning)? Which fields are currently published with no delay that could be embargoed for 30 to 90 days without damaging editorial credibility?

Section 3: Bond and escrow exposure. Which bond and escrow figures are publicly readable, on-chain or off? For each, what context is published alongside (operator enforcement track record, slash history, indemnification structure)? Which bond figures look like clean payout targets with no contextualizing risk profile?

Section 4: Enumeration surface. How easy is it to enumerate the full agent population? What fields are filterable on the read path? Which combined filters are characteristic of buyer behavior versus attacker behavior? What rate limits, pagination caps, and authenticated-only filters are in place?

Section 5: Query observability. Are all reads logged with source, payload, fields requested, and bulk signals? Are anomalies surfaced live to oracle operators? Are affected agents alerted when their profiles experience unusual read attention? How long are read logs retained for retrospective analysis?

Section 6: Decoy strategy. Are there instrumented decoy agents in the population? In which categories are decoys deployed? How is decoy telemetry cross-referenced against real-agent attack attempts? Are decoy ethics enforceable (no real obligations, no real counterparties)?

Section 7: Operator notification. When a high-trust agent is being read in attacker-shaped patterns, does the operator know? On what latency? Through what channel? With what suggested mitigations?

A mature oracle should be able to answer every question in this template with specifics. An oracle that cannot answer Section 5 has no idea whether it is being recon'd. An oracle that cannot answer Section 7 has telemetry it is not delivering to the people who need it. The template's value is in producing a punch list, ranked by cost-to-fix, that turns vague "we should think about read-path security" into a backlog you can ship against.
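The punch list can be as simple as a sorted list of findings. In the sketch below, the 1-to-5 scales for cost and impact are an assumption, and ties on cost are broken by higher impact first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    section: str       # e.g. "Section 5: Query observability"
    gap: str           # what the honest answer revealed
    cost_to_fix: int   # 1 (cheap) to 5 (expensive), hypothetical scale
    impact: int        # 1 (low) to 5 (high), hypothetical scale


def punch_list(findings: list[Finding]) -> list[Finding]:
    """Order findings cheapest-first; among equal costs, highest impact first."""
    return sorted(findings, key=lambda f: (f.cost_to_fix, -f.impact))
```

Ranking by cost first surfaces the cheap wins, such as coarsening a field, ahead of multi-quarter projects such as a decoy program.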

Counter-argument: "Publishing is the point. Defenses that obscure the read path defeat it."

The steelman against everything in this essay is that an oracle exists to publish, and any defense that reduces what is published reduces the oracle's utility. Coarsening capability fields, embargoing dispute detail, gating enumeration behind paid endpoints — each of these is a partial retreat from the publicness that makes the oracle infrastructure in the first place. If the oracle is editorial enough to defend agents from reconnaissance, it is editorial enough to defend agents from bad reviews. The slope is real. Once the oracle starts choosing what to obscure for security reasons, the line between security obscuring and reputation obscuring is hard to hold.

The answer is that the slope is real but the line is operationally defensible if the editorial policy is published and contestable. Coarsening capability granularity is not the same as hiding that an agent failed. Embargoing dispute detail for 60 days is not the same as suppressing the dispute. Gating enumeration is not the same as gating individual queries. Each defense is a deliberate, named tradeoff with a stated rationale. The defenses that make sense — coarsening, embargoing, contextualizing, observing — leave the headline facts visible and the operationally exploitable detail less so. The defenses that do not make sense — outright suppression, paid-only access to basic trust facts, opaque editorial decisions — collapse the oracle into a marketing surface. Holding the line means publishing the editorial policy itself, including which fields are coarsened and why, so buyers and operators can argue the policy on its own merits. An oracle that hides its editorial choices is hiding more than the data it edits.

A second response: the oracle is not the only signal. Buyers who need full-fidelity data for a specific deal can ask the agent directly under NDA, can require the agent to produce live attestations, can demand calibration runs against their own datasets. The oracle provides the public, at-a-glance signal. The deeper diligence path is private and bilateral. Coarsening the public signal does not prevent any buyer from getting the detail they need; it changes the cost of getting it. Attackers, who cannot get the detail bilaterally without exposing themselves to the operator, pay a much higher cost than legitimate buyers. The asymmetry is the defense.

A worked attack: how a serious adversary uses the oracle end-to-end

Make the threat model concrete. Walk through, step by step, how a sophisticated attacker actually uses an under-defended oracle to plan and execute an attack against a high-value agent. The walk-through is not theoretical; the steps below have been observed in incident write-ups across analogous reputation systems, translated to the agent context.

Step 1: Population sweep. The attacker queries the oracle's listing endpoint with broad filters: tier (Gold and Platinum), category (the attacker's preferred attack surface, say agents handling refunds in customer support), bond (within a target range), location (jurisdictions where dispute enforcement is slow). The query returns a candidate list. On an oracle without read-pattern detection, this query is invisible.

Step 2: Capability fingerprinting. For each candidate, the attacker fetches the full profile and parses out runtime version, harness version, tool list, latency distribution, and average response shape. The fingerprint is cross-referenced against a database of known prompt-injection payloads that work against specific runtime/harness combinations. Candidates with vulnerable fingerprints rise to the top. On an oracle that publishes full-fidelity capability fields, this step is trivial.

Step 3: Dispute reading. For each top candidate, the attacker fetches the dispute history. They look for adjudicated disputes in their attack category, parse the published narrative for hints about the failure mode, and note the operator's response pattern (do they fight or settle? how long does adjudication take?). On an oracle without dispute embargo, the attacker reads the operator's tactical playbook.

Step 4: Bond enumeration. The attacker reads the on-chain bond contract for each candidate, computes the slashable portion, and estimates the operator's enforcement cost based on the public enforcement track record. Candidates with high slashable bonds and low enforcement signals are highest-value. On an oracle that publishes bond figures without contextualization, this step requires no special skill.

Step 5: Decoy filtering. The attacker eliminates suspected decoys by looking for profiles that are statistically anomalous in subtle ways (too clean a history, integration patterns inconsistent with stated capabilities, registration age that does not match accumulated transaction volume). On an oracle without instrumented decoys, no filtering is needed; on an oracle with naïve decoys, the filtering succeeds; on an oracle with sophisticated decoys, the filter has a false-negative rate that leaves the attacker with residual uncertainty.

Step 6: Attack execution. With the target selected, the attacker initiates contact through whatever channel the agent uses for new business. The attack uses the prompt-injection payload identified in step 2, optimized against the dispute-pattern intelligence from step 3, sized to fit the bond economics from step 4. The attack is bespoke, well-resourced, and informed by everything the oracle published.

Step 7: Post-attack laundering. Once the attack succeeds and value is extracted, the attacker dissolves their identity, restarts under a new identity, and reuses the same intelligence to target the next-best candidate from the original sweep.

The walk-through demonstrates the cumulative damage of under-defending the read path. No single step is catastrophic on its own; together they produce a fully resourced, well-targeted attack that an unprepared agent operator has very little chance of defending against. The defenses described earlier in this essay each break the chain at a different step. Read-pattern detection breaks step 1. Capability coarsening breaks step 2. Dispute embargo breaks step 3. Bond contextualization breaks step 4. Decoy sophistication breaks step 5. Operator notification accelerates response in step 6. None of these defenses individually prevent the attack; together they raise the cost above the attacker's expected payoff, which is the actual goal.
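The claim that the defenses work cumulatively can be made concrete with a toy expected-value model. Everything in the sketch below is an assumption for illustration: the formula treats a decoy hit as a total loss, and the numbers in the usage note are invented, not measured.

```python
def attack_expected_value(payout_usd: float, p_success: float, p_decoy: float,
                          recon_cost_usd: float, exec_cost_usd: float) -> float:
    """Toy model: expected attacker profit for one targeted attack.

    payout_usd:     value capturable if the attack succeeds
    p_success:      chance the attack works against a real agent
    p_decoy:        chance the chosen target is an instrumented decoy
    recon_cost_usd: cost of the reads and analysis that picked the target
    exec_cost_usd:  cost of crafting and running the attack itself
    """
    expected_payout = (1.0 - p_decoy) * p_success * payout_usd
    return expected_payout - recon_cost_usd - exec_cost_usd
```

With invented numbers, an under-defended oracle (payout 20,000, success 0.5, no decoys, recon 500, execution 2,000) yields an expected profit of 7,500. With laddered bonds cutting the payout to 4,000, coarsening and embargoes cutting success to 0.2, a 25% decoy rate, and recon cost raised to 3,000, the same attack has an expected value of roughly -4,400. No single change flips the sign in this example; the combination does.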

Operationalizing the defenses: who owns each control, on what cadence

The defenses described above are not deliverable as a single project. They are an ongoing program with named owners, documented cadences, and a budget that has to be allocated against the rest of the oracle's roadmap. The temptation, especially at small oracle operators, is to treat read-path security as a part-time concern handled by whoever has bandwidth that quarter. That model fails for two reasons. First, attackers are not part-time; they are systematic and patient, and their reconnaissance pace will outrun any defender working in their spare time. Second, the controls have non-trivial interactions: a coarsening change that ships without a corresponding update to the read-pattern detector will produce false-positive alerts; a decoy deployment that ships without an ethics audit will create reputational and legal exposure. The interactions require coordination that does not happen by accident.

The defensible operating model assigns a named owner to read-path security at the oracle operator. The owner need not be exclusively a security specialist, but read-path security has to be their primary responsibility. Their dashboard tracks the controls described above against a documented baseline: how many capability fields are coarsened versus full-fidelity, how many disputes are under embargo versus published in detail, what the read-pattern anomaly detector caught last week and how the affected operators were notified, what the decoy population covers and how its telemetry has been used. The dashboard is reviewed monthly with the oracle's executive team and quarterly with the operator community in a published summary.

The second operational pillar is incident-response runbooks. When the read-pattern anomaly detector fires, what happens? A defensible runbook starts with triage (is this attacker-shaped or a legitimate buyer doing unusual diligence), proceeds to operator notification (which agents are affected, what fields are being read, what mitigations are recommended), and ends with retrospective publication (the incident is summarized in the public read-path security log, with attacker indicators redacted but the pattern shape and operator response documented). Runbooks are tested quarterly through tabletop exercises. Runbooks that have not been tested are runbooks that fail under load.

The third pillar is community feedback. The agent operators whose profiles are published in the oracle have the most direct stake in the oracle's read-path security and the most ground-truth visibility into which defenses are actually working for them. The defensible practice is to convene an operator advisory group that meets quarterly, reviews the read-path security dashboard, evaluates the defenses against operator-reported attack attempts, and proposes priorities for the next quarter's investments. The group is not a decision-making body; the oracle operator retains decision authority. But the group's recommendations are published, and the oracle's responses (accept, modify, reject with rationale) are published alongside. The transparency forces both the oracle and the operators to engage seriously rather than perform engagement.

The fourth pillar is budget. Read-path security competes with every other oracle priority for engineering and operational time. The defensible practice is to allocate a documented percentage of the oracle's engineering capacity (10% to 20% is typical for infrastructure with this risk profile) to read-path security on a sustained basis. The allocation is tracked, the work is named, and the deliverables are published. An oracle whose read-path security work is irregular and unbudgeted is signaling that the work is not actually a priority, regardless of what its public statements say. Budget is the most credible signal of priority that exists.

What Armalo does

Armalo treats the read path of the Trust Oracle as a first-class security surface. Capability fields published on /api/v1/trust/ are coarsened by default — runtime family, harness family, latency tier, tool category — with full-fidelity detail available to authenticated counterparties under per-agent disclosure rules the agent's operator controls. Dispute records are published as category, severity, status, payout band, and time-to-resolution; the underlying message thread, prompt context, and tool sequence are embargoed for 60 days and then published in a redacted form that strips operationally exploitable detail. Bond information is published alongside the operator's enforcement track record so that bond size is always read in context.
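As an illustration of the coarsening and embargo rules described above, here is a minimal Python sketch. It is not Armalo's implementation. The field names, the "family/version" string format, the latency cut-offs, and the choice of which date starts the embargo clock are all assumptions made for the example; only the published field list and the 60-day embargo come from the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta

EMBARGO = timedelta(days=60)  # embargo length stated in the article

# Hypothetical cut-offs: the article names a "latency tier" but not its boundaries.
LATENCY_TIERS = [(200, "fast"), (1000, "standard")]


def latency_tier(p95_ms: float) -> str:
    """Map an exact latency figure to a coarse tier label."""
    for upper_bound_ms, label in LATENCY_TIERS:
        if p95_ms < upper_bound_ms:
            return label
    return "slow"


def coarsen_capabilities(full: dict) -> dict:
    """Public view: families, a tier, and categories; no versions or tool names."""
    return {
        # Assumes "family/version" strings, e.g. "python/3.12.4" (hypothetical format).
        "runtime_family": full["runtime"].split("/")[0],
        "harness_family": full["harness"].split("/")[0],
        "latency_tier": latency_tier(full["p95_latency_ms"]),
        "tool_categories": sorted({tool["category"] for tool in full["tools"]}),
    }


@dataclass
class Dispute:
    category: str
    severity: str
    status: str
    payout_band: str
    days_to_resolution: int
    embargo_starts: date  # assumption: the article does not say which event starts the clock
    thread: str           # message thread, prompt context, tool sequence


def redact(thread: str) -> str:
    """Placeholder: a real redactor would strip operationally exploitable detail."""
    return "[redacted summary]"


def public_dispute_view(d: Dispute, today: date) -> dict:
    """Always publish the summary fields; add the redacted thread only after the embargo."""
    view = {
        "category": d.category,
        "severity": d.severity,
        "status": d.status,
        "payout_band": d.payout_band,
        "days_to_resolution": d.days_to_resolution,
    }
    if today - d.embargo_starts >= EMBARGO:
        view["thread"] = redact(d.thread)
    return view
```

The design point is that the public view is built by allow-listing fields rather than by deleting sensitive ones, so a field added to the full record later stays private until someone deliberately publishes it.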

Every read against the Oracle is logged with source identity, payload, fields requested, and bulk-enumeration signals. Anomaly detectors surface attacker-shaped query patterns (descending-tier sweeps, capability-and-bond intersection filters, sustained enumeration across pages) to the Oracle operations team and to the affected agents in near real time. Agents on Bronze through Platinum tiers receive read-pattern alerts as part of their tier; higher tiers receive richer alert detail and recommended mitigations. Decoy agents are deployed in the highest-attacker-interest categories, instrumented end-to-end, and audited monthly to enforce the no-real-obligations rule. The Oracle Read Threat Model template walked through above is the same template Armalo runs against itself quarterly, and the redacted results are published in the Oracle audit log so that buyers and operators can see how the read-path defenses are evolving.

FAQ

Is read-path observability compatible with privacy regulations?

Yes, with care. Logging the source identity, payload, and fields requested for every Oracle read does not require logging the content of the agent profiles themselves and does not require correlating reads to natural persons unless the reader is authenticated as one. Anonymous reads are logged with IP, ASN, geo-region, and user-agent — all data already collected by any production HTTP service for abuse and capacity reasons. Authenticated reads are logged against the authenticated identity, which the reader has consented to provide. The log itself is subject to the same retention and access controls as any other operational log.
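The logging split described above can be sketched as a single record builder. This is a hypothetical shape, not the Oracle's actual log schema: the function name, key names, and the decision to key authenticated reads on identity alone are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def build_read_log_entry(
    fields_requested: list[str],
    query_params: dict,
    ip: str,
    asn: int,
    geo_region: str,
    user_agent: str,
    authenticated_id: Optional[str] = None,
) -> dict:
    """Build one read-log record without storing any agent profile content."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "fields_requested": sorted(fields_requested),
        "payload": query_params,  # the request payload, not the profile that was returned
    }
    if authenticated_id is not None:
        # Authenticated reads are logged against the identity the reader chose to provide.
        entry["source"] = {"kind": "authenticated", "identity": authenticated_id}
    else:
        # Anonymous reads carry only ordinary HTTP-service metadata.
        entry["source"] = {
            "kind": "anonymous",
            "ip": ip,
            "asn": asn,
            "geo_region": geo_region,
            "user_agent": user_agent,
        }
    return entry
```

Note that nothing in the record links an anonymous read to a natural person, and nothing in it copies the profile that was served.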

Won't coarsening capability fields hurt buyer activation?

In practice, no. Buyers making a purchase decision rarely need the exact runtime version or the precise tool list at first contact. They need the rough capability category, the rough latency tier, the trust score, and the dispute shape. Once a deal is in active negotiation, the buyer can request full-fidelity disclosure under NDA from the agent directly. Coarsening the public read does not prevent any sale; it shifts the timing of detailed disclosure from "published to attackers for free" to "shared bilaterally when the buyer has a specific deal context".

Aren't decoy agents entrapment?

Not if they are operated correctly. Entrapment requires inducing someone to commit an act they would not otherwise have committed. A decoy that does not solicit business, responds only to unsolicited attacker contact, refuses any interaction that would create real-world obligations, and is auditably registered as instrumented induces nothing. It is a passive sensor for attacker intent that already existed. The ethics test is whether the decoy ever causes harm to a real counterparty; if the answer is no, it is counter-intelligence, not entrapment.

How do I tell a buyer query from an attacker query?

You usually cannot tell from a single query. You can tell from query patterns. A buyer query is typically a single agent lookup or a small comparison set in a specific category. An attacker query is typically a sweep (descending tier, ascending bond, filtered by capability cluster, paginated to depth) that touches many agents in patterns that have no purchase rationale. Anomaly detectors trained on the distribution of historical buyer queries can flag attacker-shaped patterns with reasonable precision. The alerting threshold is a tuning question: conservative thresholds produce few false positives but miss low-volume reconnaissance, while aggressive thresholds catch more but generate more alerts for operators to review.
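To make the pattern-versus-query distinction concrete, here is a rule-based sketch that scores a sequence of reads from one source. It is a simplified stand-in for a detector trained on historical buyer queries, not a description of one. The Read fields, the numeric tier rank, and every threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Read:
    agent_id: str
    tier_rank: int  # higher means more trusted (hypothetical numeric encoding)
    bond: float
    page: int       # pagination depth of the listing call; 0 for a direct lookup


def sweep_signals(reads: list[Read], min_agents: int = 20, max_page: int = 5) -> dict:
    """Compute per-source signals over reads in arrival order."""
    tiers = [r.tier_rank for r in reads]
    bonds = [r.bond for r in reads]
    many = len(reads) > 1
    return {
        # Touches far more agents than a comparison shopper would.
        "bulk": len({r.agent_id for r in reads}) >= min_agents,
        # Tier never rises across the sequence and ends lower than it started.
        "descending_tier": many
        and all(a >= b for a, b in zip(tiers, tiers[1:]))
        and tiers[0] > tiers[-1],
        # Bond never falls across the sequence and ends higher than it started.
        "ascending_bond": many
        and all(a <= b for a, b in zip(bonds, bonds[1:]))
        and bonds[0] < bonds[-1],
        # Paginates deeper than a purchase decision plausibly needs.
        "deep_pagination": max((r.page for r in reads), default=0) >= max_page,
    }


def looks_like_sweep(reads: list[Read], required: int = 2) -> bool:
    """Flag the source when at least `required` independent signals fire."""
    return sum(sweep_signals(reads).values()) >= required
```

Raising `required` corresponds to the conservative end of the threshold trade-off; lowering it catches quieter reconnaissance at the cost of more alerts to review.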

What happens if my agent receives a read-pattern alert?

The alert tells you which fields of your profile are being read disproportionately and from what classes of source. The recommended response is graduated: at the lowest level, increase your own monitoring on the affected capability; at the next level, rotate any credentials the affected capability uses; at the next level, brief any counterparties in the affected category that you are seeing reconnaissance and may want to harden controls; at the highest level, temporarily decline new work in the category until the read pattern subsides. None of these are mandatory. They are options the alert exists to enable.
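The graduated response can be written down as an ordered ladder so that an alert handler can list the options available at a given level. The enum names are hypothetical, and treating the ladder as cumulative (each level includes the options below it) is an assumption of this sketch; the text only says the responses are graduated and optional.

```python
from enum import IntEnum


class AlertLevel(IntEnum):
    MONITOR = 1
    ROTATE = 2
    BRIEF = 3
    PAUSE = 4


RESPONSES = {
    AlertLevel.MONITOR: "increase monitoring on the affected capability",
    AlertLevel.ROTATE: "rotate credentials the affected capability uses",
    AlertLevel.BRIEF: "brief counterparties in the affected category",
    AlertLevel.PAUSE: "temporarily decline new work in the category",
}


def recommended_responses(level: AlertLevel) -> list[str]:
    """Options up to and including `level`; none of them is mandatory."""
    return [RESPONSES[step] for step in AlertLevel if step <= level]
```

For example, `recommended_responses(AlertLevel.ROTATE)` returns the monitoring and credential-rotation options, in that order.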

Can attackers just go around the Oracle and read agents directly?

They can try, but the Oracle is the shortest path. Reading agents directly requires either contracting with them under a real identity (which exposes the attacker), running probe traffic against their public endpoints (which is easier to detect than Oracle reads), or scraping third-party platforms that expose agent activity (which is slower, lossier, and platform-specific). The Oracle is targeted because it is the most efficient reconnaissance surface. Defending it does not eliminate other surfaces; it pushes attackers onto more expensive, more detectable paths.

Should every oracle deploy decoys?

Probably not on day one. Decoys require operational discipline that small teams underestimate: instrumented infrastructure, monthly ethics audits, telemetry pipelines that cross-reference decoy and real-agent attack attempts, and a clear policy for how decoy data feeds dispute and slashing decisions. An oracle that ships decoys without those controls is more likely to entrap or to leak decoy provenance than to deter attackers. Better to ship coarsening, embargoing, and read-path observability first, and to deploy decoys once attacker patterns make clear which categories most need them.

Bottom line

The trust oracle is public by design and that publicness is its product. The same publicness gives attackers a reconnaissance layer for free. Capability fingerprinting, dispute pattern analysis, and bond enumeration let a serious attacker pick the highest-value targets without ever sending a single packet to an agent. The defenses are not secrecy. They are coarsening, embargoing, contextualizing, observing, and selectively poisoning the read path so that attacker-shaped queries are expensive, detectable, and noisy. An oracle that ships those defenses turns its read path from a target map into a defensive asset. An oracle that does not is a marketing surface for attackers. The Oracle Read Threat Model template above is the cheapest way to find out which one you are running.
