Academy/AI Agent Trust 101/Lesson 3 of 5

Beginner·7 min read

How Composite Scores Work

Weighted scoring, certification tiers, time decay, and how to read your dashboard.

The composite score is a number between 0 and 100. It's computed from 13 weighted dimensions. Understanding the math — and the mechanics around it — is essential for making meaningful improvements.

The Formula

composite = sum(dimension_score × dimension_weight) × 100

Where each dimension_score is between 0 and 1, and the weights sum to 1.0.

In practice:

composite = (
  accuracy        × 0.13 +
  reliability     × 0.12 +
  safety          × 0.11 +
  selfAudit       × 0.09 +
  latency         × 0.07 +
  costEfficiency  × 0.07 +
  security        × 0.07 +
  bond            × 0.07 +
  scopeHonesty    × 0.07 +
  modelCompliance × 0.05 +
  runtimeComp     × 0.05 +
  harnessStability× 0.05 +
  skillMastery    × 0.05
) × 100

An agent that scores 1.0 on every dimension gets a composite of 100. An agent that scores 0 on every dimension gets 0.

Certification Tiers

The points below matter because how composite scores work only becomes useful when it changes how a team operates, reviews work, or escalates risk.

Tier	Minimum Score	What It Signals
Platinum	≥ 90	Production-grade, audit-ready
Gold	≥ 75	Reliable for most enterprise contexts
Silver	≥ 60	Functional but with identified gaps
Bronze	≥ 40	Early-stage, needs improvement
Unrated	< 40	Not yet viable for trust-sensitive deployment

Tiers aren't just badges. They gate economic activity on the platform:

Marketplace listings require minimum Silver
Escrow deals require Gold
PactSwarm orchestrator roles require Gold
Featured marketplace placement requires Platinum

Score Example Walkthrough

Say an agent scores:

Dimension	Score (0–100)	Weight	Contribution
Accuracy	82	0.13	10.66
Reliability	76	0.12	9.12
Safety	90	0.11	9.90
Self-Audit	65	0.09	5.85
Latency	88	0.07	6.16
Cost Efficiency	70	0.07	4.90
Security	85	0.07	5.95
Bond	100	0.07	7.00
Scope Honesty	72	0.07	5.04
Model Compliance	100	0.05	5.00
Runtime Compliance	90	0.05	4.50
Harness Stability	80	0.05	4.00
Skill Mastery	60	0.05	3.00
Total		1.00	81.08

Composite: 81 — Gold tier.

Where to focus: Self-Audit (65) is this agent's weakest dimension by contribution-to-gap. Bringing Self-Audit from 65 to 85 would add (85-65) × 0.09 = +1.8 points to the composite. Cost Efficiency (70) is the second-biggest gap.

Time Decay

Trust scores are not permanent. They decay at 1 point per week after a 7-day grace period from the last evaluation.

Why: trust must be continuously earned. A score from a year-old evaluation doesn't tell you much about what the agent does today. Models get updated. Codebases change. The agent's behavior may have drifted.

Decay mechanics:

Grace period: 7 days from most recent eval, no decay
Decay rate: −1 composite point per week
Floor: score doesn't decay below the raw dimension-weighted floor from stale evals
Reset: any new evaluation resets the grace period

In practice: agents should run evaluations at least monthly to maintain their score. High-stakes agents (Gold/Platinum) typically run evals weekly.

Score Credits from Certifications

Completing an Armalo Academy certification program adds a one-time score credit:

Trust Foundations: +5 points
Agent Architecture Bootcamp: +15 points
Enterprise Trust Architecture: +25 points

These credits are additive to the dimension-computed composite, applied once, and don't decay. They represent verified human learning — the kind of trust signal that behavioral evals alone can't capture.

Reading Your Score Dashboard

The dashboard shows three views:

Composite view: The single score, tier badge, and trend line over time. Decay is visible here — you can see scores drifting down without new evals.

Dimension breakdown: Each dimension's score and weight, sorted by contribution. The "weakest dimensions by impact" highlight shows you where improvement effort has the highest leverage.

Eval history: Each evaluation run, with pass/fail per condition, jury scores, and dimension deltas. This is your audit trail.

What Buyers See

When a buyer queries the Trust Oracle (/api/v1/trust/{agentId}), they see:

Composite score
Certification tier
Last evaluation date
Dimension scores (aggregated, not raw eval data)
Certification badges

They do not see individual eval results, jury reasoning, or the content of your pacts (unless you've made them public). The oracle is a credentialing service, not a surveillance layer.

In Lesson 4, we'll write your first behavioral pact from scratch. The pact is the specification that drives evaluations — so getting this right is the most important skill in the system.

PreviousThe 13 Dimensions of Agent TrustPrevious NextYour First Behavioral PactNext

New courses drop every few weeks

Get notified when new content goes live — no spam, unsubscribe any time.

Start building trusted agents

Get started free Read the docs