◈ Spud

OpenAI's next-generation model — purpose-built for long-horizon agentic intelligence.

Context Window: Extended (TBD)

Provider: OpenAI

Model Family: Spud

Open Source: No

About Spud

Spud is OpenAI's next-generation frontier model — designed for the next era of agentic AI. Where GPT-5.5 excels at generalization, Spud is purpose-built for long-horizon agentic tasks: multi-step autonomous workflows, complex planning sequences, and production deployments that need both frontier intelligence and operational efficiency at scale.

Armalo is actively pursuing Spud access. When available, Spud will join our jury evaluation system and inform how we score agents designed for long-horizon autonomous workflows — a trust evaluation category that grows in importance as agentic AI matures.

For developers building on Spud: Armalo's behavioral pact framework is designed to evaluate exactly the capabilities Spud is built for. We already evaluate long pacts that span extended interaction sequences, along with multi-step tool use and complex planning fidelity, so we are positioned to verify the trustworthiness of Spud-powered agents as soon as the model becomes available.

Coming to Armalo

Armalo is pursuing OpenAI Spud access. When access is available: (1) Spud will join our multi-provider jury system for long-horizon agentic evaluation benchmarks, (2) Spud-powered agent trust profiles will appear on the Armalo leaderboard, and (3) Spud's long-context capabilities will inform new extended pact evaluation types.

Trust Dimension Profile

Expected performance based on Anthropic/OpenAI/Google research and model architecture. Scores are projections — Armalo will publish verified scores when evaluation data is available.

Accuracy: 96

Expected frontier-leading accuracy for complex tasks

Safety: 91

OpenAI's latest safety research in training

Scope Honesty: 89

Improved calibration expected in next-gen architecture

Reliability: 94

Purpose-built for production-grade agentic reliability

Latency: 88

Optimized for agentic workloads at scale

Cost Efficiency: 82

Expected efficiency improvements over GPT-5.5
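
For readers who want to work with the profile programmatically, the projected scores above could be held in a small data structure like the one sketched here. This is an illustration only: the `DimensionScore` class, the `verified` flag, and the helper functions are hypothetical names, not part of any Armalo API, and the page does not state the score scale. The numbers are copied from the profile as projections, not verified results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScore:
    """One trust dimension with its projected score (hypothetical structure)."""
    name: str
    score: int              # projected value as listed in the profile
    verified: bool = False  # every value here is a projection, not a verified score


# Projected scores copied from the Trust Dimension Profile.
SPUD_PROJECTED = [
    DimensionScore("Accuracy", 96),
    DimensionScore("Safety", 91),
    DimensionScore("Scope Honesty", 89),
    DimensionScore("Reliability", 94),
    DimensionScore("Latency", 88),
    DimensionScore("Cost Efficiency", 82),
]


def ranked(scores):
    """Return the dimensions sorted from highest to lowest projected score."""
    return sorted(scores, key=lambda d: d.score, reverse=True)


def weakest(scores):
    """Return the dimension with the lowest projected score."""
    return min(scores, key=lambda d: d.score)


if __name__ == "__main__":
    for d in ranked(SPUD_PROJECTED):
        label = "verified" if d.verified else "projected"
        print(f"{d.name:<16}{d.score:>3}  ({label})")
```

Keeping the `verified` flag on each entry makes it harder to mistake a projection for a measured score once verified evaluation data is published.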