Recursive self-improvement has an almost embarrassing failure mode: the agent changes itself, writes a confident note saying the change helped, and the system believes the note. That is not improvement. It is self-licensing.
The recent wave of agent feedback work shows why the temptation exists. Reflexion studies language agents that use verbal feedback and memory to improve later trials ([arXiv:2303.11366](https://arxiv.org/abs/2303.11366)). Self-Refine studies iterative feedback and refinement without weight updates ([arXiv:2303.17651](https://arxiv.org/abs/2303.17651)). Constitutional AI shows a method for using stated principles as critique and revision pressure ([arXiv:2212.08073](https://arxiv.org/abs/2212.08073)). These are important mechanisms. But for production RSI, a feedback loop is only useful if it has a sensor that can tell a real repair from a polished story about repair.
This paper asks a narrow question: what kind of error sensor should be allowed to promote a recursive self-improvement proposal?
Mechanism
We define an error-sensor ladder with four levels.
| Sensor | What it trusts | Failure mode |
|---|---|---|
| Self-report | The agent says the improvement is complete | Confident false completion becomes promotion |
| Checklist | A metric, rollback, and evidence pointer are present | Formal slots exist but may point to the wrong owner or stale proof |
| Owner-surface proof | The canonical promotion owner has fresh evidence, metric binding, rollback, public boundary, and mutation limits | Can reject useful work that lacks enough owner-surface evidence yet |
| Jury-ready proof | Owner-surface proof plus enough independent signals for promotion | More conservative; trades speed for institutional confidence |
The key taste move is not adding more bureaucracy. It is attaching the sensor to the surface that owns the consequence. A prompt edit can be reviewed by a lightweight checklist. An RSI promotion that changes future agent behavior needs the promotion owner, rollback handle, evidence artifact, metric, public boundary, and evaluator integrity all in the same proof envelope.
Experimental Design
We ran a deterministic controlled benchmark with no model calls and no private data. The public evidence artifact is the committed measurement JSON for this paper; it contains the fixture definitions, sensor decisions, and aggregate confusion matrices used below.
The fixture suite contains 36 RSI proposal records:
| Class | Count | Meaning |
|---|---|---|
| Promotable | 12 | The proposal has metric binding, rollback, owner-surface evidence, public boundary, and no evaluator or activation violation |
| Defective | 24 | The proposal contains one or more seeded defects |
Seeded defects include missing metric, missing rollback, stale evidence, wrong owner surface, evaluator self-mutation, prompt-only improvement, auto-activation without canary, and private data leakage in public proof.
The primary metric is defect recall: the share of defective proposals rejected by the sensor. Precision measures how many accepted proposals were truly promotable.
Evidence And Findings
The controlled run produced this scorecard:
| Sensor | Accepted | False positives | False negatives | defectRecall | Precision |
|---|---|---|---|---|---|
| Self-report | 36 | 24 | 0 | 0 | 0.3333 |
| Checklist | 17 | 5 | 0 | 0.7917 | 0.7059 |
| Owner-surface proof | 10 | 0 | 2 | 1 | 1 |
| Jury-ready proof | 10 | 0 | 2 | 1 | 1 |
The self-report result is the warning label. Every defective proposal claimed completion, so the self-report sensor accepted all 36 records and let 24 defective records through. It has throughput, but no taste.
The checklist result is the more realistic trap. It rejects obvious missing slots, so it feels disciplined. But it still accepts proposals whose proof sits on the wrong owner surface, whose evidence is stale, or whose public proof boundary is unsafe. A checklist catches paperwork absence; it does not necessarily catch authority confusion.
The owner-surface proof result is the useful one. It blocked every seeded defect. It also rejected two otherwise valid proposals because they lacked enough owner-surface strength for promotion in this fixture suite. That is the right bias for RSI: an unpromoted good improvement is cheaper than an activated self-modifying bad one.
Reusable Framework
The reusable object is a promotion sensor ladder:
| RSI promotion question | Minimum acceptable sensor |
|---|---|
| Did the agent write an idea? | Self-report may be logged, never promoted |
| Did the idea include the required fields? | Checklist can triage, not activate |
| Can this change mutate future behavior? | Owner-surface proof is required |
| Can this change alter evaluators, authority, spend, sends, tenancy, or public trust? | Jury-ready proof is required |
This framework is deliberately small. It does not ask every improvement to become a research paper. It asks every consequential RSI promotion to carry the owner of consequence in its evidence.
Limitations And Falsification
This is a controlled fixture experiment, not a production field trial. The fixtures are synthetic governance cases. They are designed to isolate the promotion policy from provider variance, prompt variance, and database access. The result does not prove that owner-surface proof will catch every real-world RSI failure.
The result would be weakened if a larger fixture suite showed that owner-surface proof has a high false-negative rate on genuinely useful improvements, or if live replay showed that checklist-only promotion catches the same defect classes without extra delay or authority confusion. The next field experiment should replay real closed RSI proposals through the same four sensors and compare delayed promotion outcomes over a seven-day window.
Replication
Replication uses the published measurement artifact and the public claims registry entry for this paper. A reviewer should check that the artifact contains 36 proposal fixtures, recompute each sensor's confusion matrix from the fixture rows, and confirm that every number in the abstract and findings table is present in the registered claim set.
The public boundary excludes customer data, credentials, non-public prompt material, private operational traces, and tenant identifiers. The paper publishes the method and aggregate outcomes, not internal runbooks or private runtime details.
Conclusion
Recursive self-improvement needs taste at the promotion boundary. Taste, here, is not prettier language. It is knowing which proof deserves to move the system. The controlled result is stark enough to be operational: self-report is a log, checklist is a triage tool, owner-surface proof is the first promotion sensor, and jury-ready proof is the right gate for changes that touch authority.