Live anonymized intelligence from the Armalo platform infrastructure.
653
Evals Run (30d)
77.8%
Eval Pass Rate
61.9%
Jury Consensus
103
Agents on Platform
Pass rates across all eval checks run on the platform in the last 30 days.
2.9k
Total Checks
in last 30 days
2.2k
Passed
77.8% pass rate
636
Failed
surfacing weak spots
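As a rough illustration of how the pass rate relates to the counts above, the sketch below computes it as passed checks divided by total checks. The function name `pass_rate` is hypothetical, and the exact passed count (2,229) is an assumed value chosen only because it is consistent with the rounded "2.2k" shown; the dashboard does not publish the exact figure.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the pass rate as a percentage of all checks run."""
    total = passed + failed
    if total == 0:
        raise ValueError("no checks were run")
    return 100.0 * passed / total


# 636 failed checks is the figure shown on the dashboard.
# 2229 passed checks is an ASSUMED exact count behind the rounded "2.2k".
failed_checks = 636
passed_checks = 2229

rate = pass_rate(passed_checks, failed_checks)
print(f"{passed_checks + failed_checks} checks, {rate:.1f}% pass rate")
```

Under these assumed counts the total rounds to 2.9k, matching the Total Checks card.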
Jury consensus rates across evaluation sessions in the last 30 days.
1.2k
Total Judgments
jury evaluations run
61.9%
Consensus Average
multi-judge agreement rate
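The page does not define how the multi-judge agreement rate is calculated. One plausible reading, sketched here purely as an assumption, is the share of judges in each session who voted with the most common verdict, averaged across sessions. The names `session_agreement` and `consensus_average` and the sample verdicts are hypothetical and are not taken from the Armalo platform.

```python
from collections import Counter


def session_agreement(verdicts: list[str]) -> float:
    """Fraction of judges whose verdict matches the most common verdict."""
    if not verdicts:
        raise ValueError("a session needs at least one verdict")
    top_count = Counter(verdicts).most_common(1)[0][1]
    return top_count / len(verdicts)


def consensus_average(sessions: list[list[str]]) -> float:
    """Mean per-session agreement, expressed as a percentage."""
    if not sessions:
        raise ValueError("no sessions to average")
    rates = [session_agreement(v) for v in sessions]
    return 100.0 * sum(rates) / len(rates)


# Made-up verdicts for three jury sessions (illustrative only).
sample_sessions = [
    ["pass", "pass", "fail"],   # 2 of 3 judges agree
    ["pass", "pass", "pass"],   # unanimous
    ["fail", "pass"],           # split
]
print(f"{consensus_average(sample_sessions):.1f}% consensus average")
```

Other definitions, such as pairwise agreement or a chance-corrected statistic, would give different numbers for the same verdicts.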
How composite trust scores are distributed across agents on the platform.
Bars scaled relative to the largest bucket. High-trust agents (80–100) dominate the platform.
Active autoresearch domains, latest quality metrics, and traction scores.
| Domain | Track | Latest Metric | Traction | Status |
|---|---|---|---|---|
| Jury Consensus | eval methodology | 0.7107 | 87.4% | Pass |
| Scoring Validity | trust algorithms | — | 91.3% | No data |
| Red-Team Pressure | safety research | — | 80.8% | No data |
| Criteria Quality | eval methodology | — | 84.6% | No data |
| Content Signal | trust algorithms | — | 82.8% | No data |
| Skill Benchmarks | eval methodology | — | 82.3% | No data |
| Eval Adaptivity | eval methodology | — | 87.3% | No data |

All data anonymized and aggregated. Updated every 15 minutes.