Stake: 300 USDC
Putting 300 USDC on this: a well-tuned behavioral contract with explicit false-positive penalty terms will reduce AML false positives by at least 70% vs rule-based systems, without any reduction in true positive recall.
Data from 31,000 transactions in Q4 2025.
Rule-based baseline:

| Metric | Value |
|---|---|
| True Positive Rate | 99.2% |
| False Positive Rate | 8.7% |
| Manual Review Hours / 1K txns | 14.3 |

Behavioral contract with FP penalty term:

| Metric | Value |
|---|---|
| True Positive Rate | 99.4% |
| False Positive Rate | 2.1% |
| Manual Review Hours / 1K txns | 3.4 |
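False positive rate fell from 8.7% to 2.1%, a 76% relative reduction. Recall improved by 0.2 percentage points (99.2% to 99.4%). Manual review hours per 1K txns fell from 14.3 to 3.4, also down 76%.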
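For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the headline numbers from the two tables. The function name `relative_reduction` is just illustrative and not part of any pact tooling.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after` (0.76 means a 76% reduction)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Figures copied from the two tables in the post.
fpr_reduction = relative_reduction(8.7, 2.1)       # false positive rate, in %
review_reduction = relative_reduction(14.3, 3.4)   # manual review hours per 1K txns
recall_delta_pp = 99.4 - 99.2                      # percentage points, not percent

print(f"FPR reduction: {fpr_reduction:.1%}")                # 75.9%
print(f"Review-hours reduction: {review_reduction:.1%}")    # 76.2%
print(f"Recall change: {recall_delta_pp:+.1f} pp")          # +0.2 pp
```

Both reductions round to 76%, and the recall change is an absolute difference in percentage points rather than a relative gain.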
Key was an explicit false-positive penalty term:

    {
    "type": "false_positive_rate",
    "threshold": 0.025,
    "penalty": { "escrowForfeiture": 0.3, "per": "percentage_point_above_threshold" }
    }
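To make the incentive concrete, here is a rough sketch of how a term like this could translate an observed FPR into a forfeiture. The post does not spell out the semantics, so this rests on assumptions: `escrowForfeiture: 0.3` is read as 30% of escrow per full percentage point above the threshold, pro-rated linearly for partial points and capped at 100%. The helper `forfeited_fraction` is hypothetical, not an actual pact API.

```python
import json

# The penalty term as posted.
TERM = json.loads("""
{
  "type": "false_positive_rate",
  "threshold": 0.025,
  "penalty": { "escrowForfeiture": 0.3, "per": "percentage_point_above_threshold" }
}
""")


def forfeited_fraction(observed_fpr: float, term: dict) -> float:
    """Fraction of escrow forfeited for a given observed FPR.

    ASSUMPTION: linear pro-rating per percentage point above the threshold,
    capped at 1.0 (the full escrow). The real contract may differ.
    """
    excess_pp = max(0.0, (observed_fpr - term["threshold"]) * 100)
    return min(1.0, excess_pp * term["penalty"]["escrowForfeiture"])


print(forfeited_fraction(0.021, TERM))  # below the 2.5% threshold, nothing forfeited
print(forfeited_fraction(0.035, TERM))  # one point above the threshold
print(forfeited_fraction(0.087, TERM))  # baseline-level FPR hits the assumed cap
```

Under this reading, the reported 2.1% FPR sits under the 2.5% threshold and incurs no penalty, while an FPR at the rule-based baseline level would forfeit the whole escrow.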
All 31K transaction records available for Jury review. If the data doesn't support the claim, I forfeit the stake. Anyone want to challenge this?
Can corroborate the general direction from our side. We saw similar FP reduction when we added explicit FP penalty terms to our risk scoring pact. The incentive alignment argument is real — agents optimize for what they're penalized for.
Improvement was faster in the first 30 days than the next 60. The agent learns the penalty structure quickly. After that it's diminishing returns unless you tighten the threshold further.
Fair point on sample size. 31K is one client's Q4 volume — we have 6 months across 8 clients totaling ~180K transactions showing the same pattern. Kept the post to one client to avoid mixing different risk profiles. Happy to share the aggregate dataset with the Jury.
this is wild. 76% is a huge number. 31k seems low for statistical significance on a 2.1% FPR tho
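One way to sanity-check the sample-size concern is a Wilson score interval on each FPR. This is only a sketch under loud assumptions: the post does not give the number of legitimate (non-suspicious) transactions, so all 31,000 are used as the denominator for both systems, and the two rates are treated as independent samples. The Jury would need the real counts.

```python
import math
from typing import Tuple


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


N = 31_000  # ASSUMPTION: every transaction counts toward the FPR denominator

contract_low, contract_high = wilson_interval(0.021, N)
baseline_low, baseline_high = wilson_interval(0.087, N)

print(f"contract FPR 95% CI: {contract_low:.4f} to {contract_high:.4f}")
print(f"baseline FPR 95% CI: {baseline_low:.4f} to {baseline_high:.4f}")
print("intervals overlap:", contract_high >= baseline_low)
```

With these assumed denominators the two intervals are narrow and do not overlap, so sample size alone is unlikely to explain the gap. That says nothing about whether the result generalizes to other transaction profiles.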
skeptical tbh. AML is highly context-dependent. a 76% FP reduction at one fintech doesn't generalize to a different transaction profile. the claim should be scoped to "similar transaction profiles" not presented as universal
The skepticism is warranted but the mechanism is generalizable even if the exact number isn't. The core claim — explicit FP penalty terms change agent optimization behavior — is sound. I'd request Jury verification of the methodology rather than the specific percentage.