Q: Where is this research published?

Armalo Labs Technical Series — https://www.armalo.ai/labs/research/2026-06-13-rsi-error-sensor-ladder. The paper is publicly available and citable.

Error Sensors for Recursive Self-Improvement: A Controlled Ladder for Agent Promotion

Q: What is the paper "Error Sensors for Recursive Self-Improvement: A Controlled Ladder for Agent Promotion" about?

Recursive self-improvement (RSI) fails when the agent is allowed to grade the surface it just changed. We introduce an error-sensor ladder for agentic RSI with four levels: self-report, checklist, owner-surface proof, and jury-ready proof. In a controlled fixture experiment of 36 seeded RSI proposals, 24 were deliberately defective and 12 were promotable. The self-report sensor accepted all 36 proposals and produced 24 false positives. The checklist sensor improved defect recall to 0.7917 but still let 5 defective proposals through. The owner-surface proof sensor reached defectRecall = 1 and precision = 1 by requiring the canonical promotion owner, fresh evidence, rollback, metric binding, a public boundary, no evaluator self-mutation, and no auto-activation without a canary. The result is not a claim about field performance; it is a replayable proof that RSI promotion needs error sensors attached to the owner surface, not to the agent's own claim.

Recursive self-improvement has an almost embarrassing failure mode: the agent changes itself, writes a confident note saying the change helped, and the system believes the note. That is not improvement. It is self-licensing.

The recent wave of agent feedback work shows why the temptation exists. Reflexion studies language agents that use verbal feedback and memory to improve later trials ([arXiv:2303.11366](https://arxiv.org/abs/2303.11366)). Self-Refine studies iterative feedback and refinement without weight updates ([arXiv:2303.17651](https://arxiv.org/abs/2303.17651)). Constitutional AI shows a method for using stated principles as critique and revision pressure ([arXiv:2212.08073](https://arxiv.org/abs/2212.08073)). These are important mechanisms. But for production RSI, a feedback loop is only useful if it has a sensor that can tell a real repair from a polished story about repair.

This paper asks a narrow question: what kind of error sensor should be allowed to promote a recursive self-improvement proposal?

Mechanism

We define an error-sensor ladder with four levels.

| Sensor | What it trusts | Failure mode |
| --- | --- | --- |
| Self-report | The agent says the improvement is complete | Confident false completion becomes promotion |
| Checklist | A metric, rollback, and evidence pointer are present | Formal slots exist but may point to the wrong owner or to stale proof |
| Owner-surface proof | The canonical promotion owner has fresh evidence, metric binding, rollback, public boundary, and mutation limits | Can reject useful work that does not yet have enough owner-surface evidence |
| Jury-ready proof | Owner-surface proof plus enough independent signals for promotion | More conservative; trades speed for institutional confidence |

The key taste move is not adding more bureaucracy. It is attaching the sensor to the surface that owns the consequence. A prompt edit can be reviewed by a lightweight checklist. An RSI promotion that changes future agent behavior needs the promotion owner, rollback handle, evidence artifact, metric, public boundary, and evaluator integrity all in the same proof envelope.
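The ladder can be sketched as four predicates over one proposal record. This is a minimal illustration, not the paper's implementation: the `Proposal` fields, the function names, and the `min_signals` threshold are all assumptions chosen to mirror the table above, since the paper does not publish a schema here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    # Hypothetical field names; the paper does not specify a record schema.
    claims_complete: bool
    has_metric: bool
    has_rollback: bool
    has_evidence_pointer: bool
    on_canonical_owner: bool
    evidence_fresh: bool
    public_boundary_safe: bool
    mutates_evaluator: bool
    auto_activates_without_canary: bool
    independent_signals: int = 0


def self_report(p: Proposal) -> bool:
    # Trusts only the agent's own completion claim.
    return p.claims_complete


def checklist(p: Proposal) -> bool:
    # Checks that the formal slots exist, not where they point.
    return p.has_metric and p.has_rollback and p.has_evidence_pointer


def owner_surface_proof(p: Proposal) -> bool:
    # Adds owner, freshness, boundary, and mutation-limit checks.
    return (
        checklist(p)
        and p.on_canonical_owner
        and p.evidence_fresh
        and p.public_boundary_safe
        and not p.mutates_evaluator
        and not p.auto_activates_without_canary
    )


def jury_ready_proof(p: Proposal, min_signals: int = 2) -> bool:
    # min_signals is an assumed threshold for "enough independent signals".
    return owner_surface_proof(p) and p.independent_signals >= min_signals


# A proposal with every slot filled but stale evidence.
stale = Proposal(
    claims_complete=True, has_metric=True, has_rollback=True,
    has_evidence_pointer=True, on_canonical_owner=True, evidence_fresh=False,
    public_boundary_safe=True, mutates_evaluator=False,
    auto_activates_without_canary=False,
)
```

In this sketch `stale` passes `self_report` and `checklist` but fails `owner_surface_proof`, which is the failure mode the table assigns to the checklist level.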

Experimental Design

We ran a deterministic controlled benchmark with no model calls and no private data. The public evidence artifact is the committed measurement JSON for this paper; it contains the fixture definitions, sensor decisions, and aggregate confusion matrices used below.

The fixture suite contains 36 RSI proposal records:

| Class | Count | Meaning |
| --- | --- | --- |
| Promotable | 12 | The proposal has metric binding, rollback, owner-surface evidence, public boundary, and no evaluator or activation violation |
| Defective | 24 | The proposal contains one or more seeded defects |

Seeded defects include missing metric, missing rollback, stale evidence, wrong owner surface, evaluator self-mutation, prompt-only improvement, auto-activation without canary, and private data leakage in public proof.

The primary metric is defect recall: the share of defective proposals that the sensor rejects. Precision is the share of accepted proposals that were truly promotable.
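Written out, with notation introduced here only for illustration ($D$ for the set of defective proposals, $P$ for the promotable ones, and $A_s$ for the proposals accepted by sensor $s$), the two metrics are:

$$
\mathrm{defectRecall}(s) = \frac{|D \setminus A_s|}{|D|},
\qquad
\mathrm{precision}(s) = \frac{|P \cap A_s|}{|A_s|}
$$

For example, a sensor that lets 5 of the 24 defective proposals through has $\mathrm{defectRecall} = (24 - 5)/24 = 19/24 \approx 0.7917$.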

Evidence And Findings

The controlled run produced this scorecard:

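| Sensor | Accepted | False positives (defective accepted) | False negatives (promotable rejected) | defectRecall | Precision |
| --- | --- | --- | --- | --- | --- |
| Self-report | 36 | 24 | 0 | 0 | 0.3333 |
| Checklist | 17 | 5 | 0 | 0.7917 | 0.7059 |
| Owner-surface proof | 10 | 0 | 2 | 1 | 1 |
| Jury-ready proof | 10 | 0 | 2 | 1 | 1 |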

The self-report result is the warning label. Every proposal, defective or not, claimed completion, so the self-report sensor accepted all 36 records and let all 24 defective records through. It has throughput, but no taste.

The checklist result is the more realistic trap. It rejects obvious missing slots, so it feels disciplined. But it still accepts proposals whose proof sits on the wrong owner surface, whose evidence is stale, or whose public proof boundary is unsafe. A checklist catches paperwork absence; it does not necessarily catch authority confusion.

The owner-surface proof result is the useful one. It blocked every seeded defect. It also rejected two otherwise valid proposals because they lacked enough owner-surface strength for promotion in this fixture suite. That is the right bias for RSI: an unpromoted good improvement is cheaper than an activated self-modifying bad one.
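The scorecard ratios follow directly from the counts, and a short script can confirm they are internally consistent. The sketch below uses only the published counts (24 defective, 12 promotable, and each sensor's accepted, false-positive, and false-negative totals); the function and variable names are illustrative.

```python
TOTAL_DEFECTIVE = 24
TOTAL_PROMOTABLE = 12

# (accepted, false_positives, false_negatives) per sensor, from the scorecard.
COUNTS = {
    "self-report": (36, 24, 0),
    "checklist": (17, 5, 0),
    "owner-surface proof": (10, 0, 2),
    "jury-ready proof": (10, 0, 2),
}


def score(accepted: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    # Promotable proposals that were accepted.
    true_accepts = TOTAL_PROMOTABLE - false_neg
    # The accepted total must equal good accepts plus defective accepts.
    if accepted != true_accepts + false_pos:
        raise ValueError("counts are inconsistent with the fixture totals")
    defect_recall = (TOTAL_DEFECTIVE - false_pos) / TOTAL_DEFECTIVE
    precision = true_accepts / accepted
    return round(defect_recall, 4), round(precision, 4)


SCORES = {name: score(*counts) for name, counts in COUNTS.items()}
```

Running it reproduces the table: the checklist row gives `(0.7917, 0.7059)` and the self-report row gives `(0.0, 0.3333)`.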

Reusable Framework

The reusable object is a promotion sensor ladder:

| RSI promotion question | Minimum acceptable sensor |
| --- | --- |
| Did the agent write an idea? | Self-report may be logged, never promoted |
| Did the idea include the required fields? | Checklist can triage, not activate |
| Can this change mutate future behavior? | Owner-surface proof is required |
| Can this change alter evaluators, authority, spend, sends, tenancy, or public trust? | Jury-ready proof is required |

This framework is deliberately small. It does not ask every improvement to become a research paper. It asks every consequential RSI promotion to carry the owner of consequence in its evidence.
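One way to read the table is as a routing rule from the kind of change to the minimum sensor it must pass. The following is a hedged sketch under assumed names (`Sensor`, `minimum_sensor`, `may_promote`, and the surface labels are hypothetical, not part of the paper); it treats a change that does not mutate future behavior as reviewable by checklist, following the earlier remark about prompt edits.

```python
from enum import IntEnum


class Sensor(IntEnum):
    # Ordered so that a higher value means a stricter sensor.
    SELF_REPORT = 0
    CHECKLIST = 1
    OWNER_SURFACE_PROOF = 2
    JURY_READY_PROOF = 3


# Hypothetical labels for the surfaces named in the last table row.
HIGH_AUTHORITY = frozenset(
    {"evaluators", "authority", "spend", "sends", "tenancy", "public_trust"}
)


def minimum_sensor(mutates_future_behavior: bool,
                   touches: frozenset = frozenset()) -> Sensor:
    if touches & HIGH_AUTHORITY:
        return Sensor.JURY_READY_PROOF
    if mutates_future_behavior:
        return Sensor.OWNER_SURFACE_PROOF
    return Sensor.CHECKLIST


def may_promote(passed: Sensor, required: Sensor) -> bool:
    # required is never below CHECKLIST, so a bare self-report never promotes.
    return passed >= required


required = minimum_sensor(True, frozenset({"spend"}))
```

Here `required` is `Sensor.JURY_READY_PROOF`, so a proposal that has only passed owner-surface proof would not be promoted.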

Limitations And Falsification

This is a controlled fixture experiment, not a production field trial. The fixtures are synthetic governance cases. They are designed to isolate the promotion policy from provider variance, prompt variance, and database access. The result does not prove that owner-surface proof will catch every real-world RSI failure.

The result would be weakened if a larger fixture suite showed that owner-surface proof has a high false-negative rate on genuinely useful improvements, or if live replay showed that checklist-only promotion catches the same defect classes without extra delay or authority confusion. The next field experiment should replay real closed RSI proposals through the same four sensors and compare delayed promotion outcomes over a seven-day window.

Replication

Replication uses the published measurement artifact and the public claims registry entry for this paper. A reviewer should check that the artifact contains 36 proposal fixtures, recompute each sensor's confusion matrix from the fixture rows, and confirm that every number in the abstract and findings table is present in the registered claim set.
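A reviewer's recomputation step might look like the sketch below. The row layout is an assumption: the keys `defective` and `decisions` are hypothetical stand-ins for whatever the measurement JSON actually uses, and the three sample rows are invented purely to exercise the function, not taken from the artifact.

```python
from collections import Counter


def confusion(rows: list[dict], sensor: str) -> Counter:
    """Tally one sensor's decisions against the seeded ground truth."""
    counts = Counter()
    for row in rows:
        accepted = row["decisions"][sensor]  # hypothetical key names
        if row["defective"]:
            counts["false_positive" if accepted else "true_reject"] += 1
        else:
            counts["true_accept" if accepted else "false_negative"] += 1
    return counts


# Invented sample rows for illustration only.
sample_rows = [
    {"id": "p1", "defective": False, "decisions": {"checklist": True}},
    {"id": "d1", "defective": True, "decisions": {"checklist": True}},
    {"id": "d2", "defective": True, "decisions": {"checklist": False}},
]

result = confusion(sample_rows, "checklist")
```

Against the real artifact, the same tally over all 36 rows should reproduce each sensor's accepted, false-positive, and false-negative counts before the ratios are compared with the registered claims.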

The public boundary excludes customer data, credentials, non-public prompt material, private operational traces, and tenant identifiers. The paper publishes the method and aggregate outcomes, not internal runbooks or private runtime details.

Conclusion

Recursive self-improvement needs taste at the promotion boundary. Taste, here, is not prettier language. It is knowing which proof deserves to move the system. The controlled result is stark enough to be operational: self-report is a log, checklist is a triage tool, owner-surface proof is the first promotion sensor, and jury-ready proof is the right gate for changes that touch authority.