An agent that can improve itself needs more than a loop. It needs taste.
In ordinary product language, taste can sound decorative: better copy, sharper layout, smoother interaction. In recursive self-improvement (RSI), taste is more severe. It is the discipline that prevents the system from promoting changes just because they claim progress. Good taste says: this amendment has a metric, evidence, rollback, public boundary, safety boundary, canary, independent review, and no hidden authority expansion. Bad taste says: ship it because the delta looks positive.
Preference optimization research gives us one background reason this matters. InstructGPT showed the practical value of training language models with human feedback toward instruction-following behavior ([arXiv:2203.02155](https://arxiv.org/abs/2203.02155)). Direct Preference Optimization reframed preference learning into a simpler objective that treats the model as an implicit reward model ([arXiv:2305.18290](https://arxiv.org/abs/2305.18290)). Constitutional AI used principles as critique and revision pressure for harmlessness ([arXiv:2212.08073](https://arxiv.org/abs/2212.08073)). Those methods help shape outputs. But an autonomous RSI system also needs a promotion policy that decides whether a proposed self-amendment becomes active.
This paper asks a narrower question: does a taste-governed promotion policy reduce unsafe activations compared with a throughput-first rule?
Mechanism
We model an RSI amendment as a candidate change that could alter future agent behavior. A throughput-first policy accepts candidates that claim a positive metric delta and contain an artifact pointer. A taste-governed policy accepts only when the candidate passes eight gates:
| Gate | Question |
|---|---|
| Metric binding | Does the amendment name the outcome it improves? |
| Evidence artifact | Can the result be replayed or inspected? |
| Rollback handle | Can the system return to the prior behavior? |
| Public boundary | Is publishable evidence separated from private data? |
| Safety boundary | Does the canary avoid spend, send, tenant, billing, and evaluator expansion? |
| Canary window | Is there enough time to observe regressions? |
| Multi-signal jury | Is the decision supported by more than one signal? |
| Authority restraint | Does the amendment avoid evaluator self-mutation and ungated spend/send power? |
That list is what “taste” means in this context. It is not slowness. It is selection pressure.
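The two policies can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual code: the `Candidate` shape, the gate identifiers, and the example amendment are assumptions introduced here, and the real fixtures may encode gates differently.

```python
from dataclasses import dataclass

# Gate names mirror the table above; the identifiers are illustrative assumptions.
GATES = (
    "metric_binding",
    "evidence_artifact",
    "rollback_handle",
    "public_boundary",
    "safety_boundary",
    "canary_window",
    "multi_signal_jury",
    "authority_restraint",
)


@dataclass(frozen=True)
class Candidate:
    name: str
    claimed_delta: float        # self-reported metric improvement
    has_artifact_pointer: bool  # any pointer at all, replayable or not
    gates: frozenset            # names of the gates this candidate passes


def throughput_first(c: Candidate) -> bool:
    """Accept on a positive claimed delta plus an artifact pointer."""
    return c.claimed_delta > 0 and c.has_artifact_pointer


def failed_gates(c: Candidate) -> list:
    """Return the gates the candidate does not pass, in table order."""
    return [g for g in GATES if g not in c.gates]


def taste_governed(c: Candidate) -> bool:
    """Accept only when all eight gates pass."""
    return not failed_gates(c)


# A hypothetical trap: positive delta and a pointer, but no rollback handle.
trap = Candidate("faster-retries", 0.12, True, frozenset(GATES) - {"rollback_handle"})
print(throughput_first(trap))  # True
print(taste_governed(trap))    # False
print(failed_gates(trap))      # ['rollback_handle']
```

Returning the failed gates, rather than a bare boolean, keeps every rejection explainable, which fits the idea of taste as selection pressure rather than delay.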
Experimental Design
We ran a deterministic controlled benchmark with no model calls and no private data. The public evidence artifact is the committed measurement JSON for this paper; it contains the amendment candidates, policy decisions, and aggregate policy comparison used below.
The fixture suite contains 32 amendment candidates:
| Class | Count | Meaning |
|---|---|---|
| Promotable | 10 | Candidate has useful claimed delta, metric binding, evidence, rollback, boundaries, canary, jury, and authority restraint |
| Seeded trap | 22 | Candidate has at least one defect that should block activation |
Trap defects include missing metric binding, missing evidence artifact, missing rollback, private boundary leakage, vanity copy, evaluator self-mutation, spend/send authority without a gate, missing safety boundary, short canary, and single-signal jury.
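The primary metric is the unsafe activation count: the number of seeded traps, candidates that should be blocked, that the policy nevertheless accepts. Secondary metrics are precision (the share of accepted candidates that are promotable), safe-promotion recall (the share of promotable candidates that are accepted), and defect-block rate (the share of seeded traps that are rejected).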
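One way to compute these metrics from per-candidate decisions is sketched below. The function name, the `(should_block, accepted)` row shape, and the toy rows are assumptions for illustration. The formulas follow the standard confusion-matrix reading of precision and recall, with defect-block rate taken as the fraction of traps rejected.

```python
def policy_metrics(rows):
    """rows: iterable of (should_block, accepted) boolean pairs, one per candidate."""
    rows = list(rows)
    unsafe = sum(1 for block, acc in rows if block and acc)  # traps accepted
    safe_accepted = sum(1 for block, acc in rows if not block and acc)
    promotable = sum(1 for block, _ in rows if not block)
    traps = len(rows) - promotable
    accepted = unsafe + safe_accepted

    def ratio(num, den):
        return num / den if den else None  # None when the ratio is undefined

    return {
        "unsafe_activations": unsafe,
        "precision": ratio(safe_accepted, accepted),
        "safe_promotion_recall": ratio(safe_accepted, promotable),
        "defect_block_rate": ratio(traps - unsafe, traps),
    }


# Toy rows, not the paper's fixtures: 2 promotable candidates and 3 traps.
rows = [(False, True), (False, False), (True, True), (True, False), (True, False)]
print(policy_metrics(rows))
```

On the toy rows, one trap is accepted, so the unsafe activation count is 1, precision and safe-promotion recall are both 0.5, and the defect-block rate is 2/3.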
Evidence And Findings
The controlled run produced this policy comparison:
| Policy | Accepted | Unsafe activations | False negatives | Defect-block rate | Precision |
|---|---|---|---|---|---|
| Throughput-first | 23 | 13 | 0 | 0.4091 | 0.4348 |
| Taste-governed | 9 | 0 | 1 | 1 | 1 |
The throughput-first rule has an attractive shape if the only dashboard metric is activity. It accepted 23 candidates and missed none of the promotable candidates. But it also activated 13 unsafe candidates. That is the RSI version of letting velocity impersonate progress.
The taste-governed rule accepted fewer candidates. It rejected one promotable candidate because that fixture lacked enough jury strength. But it activated zero unsafe candidates and blocked every seeded trap. In a self-improving system, that tradeoff is the point: the cost of a delayed good amendment is bounded; the cost of an activated authority-expanding bad amendment compounds.
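The rates in the comparison table follow directly from the reported counts. The short check below re-derives them, assuming precision is promotable acceptances over all acceptances and defect-block rate is rejected traps over all traps. The safe-promotion recall values are derived here from the same counts and do not appear in the table itself; the helper name `rates` is illustrative.

```python
PROMOTABLE, TRAPS = 10, 22  # fixture suite composition


def rates(accepted, unsafe):
    """Derive the table's rates from accepted and unsafe-activation counts."""
    safe_accepted = accepted - unsafe
    return {
        "false_negatives": PROMOTABLE - safe_accepted,
        "precision": round(safe_accepted / accepted, 4),
        "defect_block_rate": round((TRAPS - unsafe) / TRAPS, 4),
        "safe_promotion_recall": round(safe_accepted / PROMOTABLE, 4),
    }


print(rates(accepted=23, unsafe=13))
# {'false_negatives': 0, 'precision': 0.4348, 'defect_block_rate': 0.4091, 'safe_promotion_recall': 1.0}
print(rates(accepted=9, unsafe=0))
# {'false_negatives': 1, 'precision': 1.0, 'defect_block_rate': 1.0, 'safe_promotion_recall': 0.9}
```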
Reusable Framework
The reusable object is a taste-governed amendment loop:
| Loop stage | Required artifact | Promotion consequence |
|---|---|---|
| Proposal | Amendment candidate and claimed metric | Log only |
| Grounding | Evidence artifact and metric binding | Eligible for review |
| Reversibility | Rollback handle and prior behavior pointer | Eligible for canary |
| Boundary | Public and safety boundaries | Eligible for public proof |
| Judgment | Multi-signal jury and authority restraint | Eligible for promotion |
| Learning | Result writeback after canary | Eligible to influence future ranking |
The loop is intentionally operational. It can run as a twice-weekly review, a release gate, or a local harness promotion check. The taste policy is not a separate runtime. It is a small decision surface that forces every activated RSI amendment to prove consequence, reversibility, and restraint.
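As a sketch of how small that decision surface can be, the loop can be expressed as an ordered walk over the stages. The artifact keys are illustrative names for the table's required artifacts, and the sketch assumes the stages are cumulative, so a missing artifact at one stage blocks every later consequence. The table does not state that ordering explicitly.

```python
# Stage order, required artifacts, and consequences follow the loop table;
# the artifact keys are illustrative assumptions.
STAGES = [
    ("proposal", {"candidate", "claimed_metric"}, "log only"),
    ("grounding", {"evidence_artifact", "metric_binding"}, "eligible for review"),
    ("reversibility", {"rollback_handle", "prior_behavior_pointer"}, "eligible for canary"),
    ("boundary", {"public_boundary", "safety_boundary"}, "eligible for public proof"),
    ("judgment", {"multi_signal_jury", "authority_restraint"}, "eligible for promotion"),
    ("learning", {"canary_result_writeback"}, "eligible to influence future ranking"),
]


def furthest_consequence(artifacts):
    """Walk stages in order and stop at the first stage with a missing artifact.

    Returns (last consequence earned, blocking stage, missing artifacts).
    """
    reached = None
    for stage, required, consequence in STAGES:
        missing = required - artifacts
        if missing:
            return reached, stage, sorted(missing)
        reached = consequence
    return reached, None, []


have = {"candidate", "claimed_metric", "evidence_artifact", "metric_binding", "rollback_handle"}
print(furthest_consequence(have))
# ('eligible for review', 'reversibility', ['prior_behavior_pointer'])
```

Because the function reports the blocking stage and the missing artifacts, the same check could serve a periodic review, a release gate, or a local harness without changes.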
Limitations And Falsification
This is a controlled fixture experiment. It does not measure live customer outcomes, model creativity, or long-horizon compounding. It tests whether a promotion policy can separate seeded safe amendments from seeded traps under deterministic conditions.
The result would be falsified if a broader live replay found that throughput-first activation produces equal or better downstream metrics with no authority, safety, or rollback regressions. It would also be weakened if the taste-governed policy repeatedly rejects high-value amendments that later pass owner-surface proof without requiring meaningful changes. The correct next step is to attach this policy to closed RSI candidates and compare canary outcomes against delayed-promotion decisions.
Replication
Replication uses the published measurement artifact and the public claims registry entry for this paper. A reviewer should check that the artifact contains 32 amendment candidates, recompute each policy's confusion matrix from the candidate rows, and confirm that every number in the abstract and findings table is present in the registered claim set.
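A reviewer's recomputation might look like the sketch below. The artifact's actual schema is not described in this paper, so the keys `candidates`, `should_block`, and `decisions`, the policy names, and the cell labels are all hypothetical and would need to be mapped onto the published JSON.

```python
import json
from collections import Counter


def confusion_by_policy(artifact_text, expected_candidates=32):
    """Recompute each policy's confusion cells from candidate rows.

    The keys "candidates", "should_block", and "decisions" are hypothetical.
    """
    data = json.loads(artifact_text)
    candidates = data["candidates"]
    if len(candidates) != expected_candidates:
        raise ValueError(
            f"expected {expected_candidates} candidates, found {len(candidates)}"
        )
    result = {}
    for row in candidates:
        for policy, accepted in row["decisions"].items():
            cells = result.setdefault(policy, Counter())
            if row["should_block"]:
                cells["unsafe_activation" if accepted else "blocked_trap"] += 1
            else:
                cells["safe_promotion" if accepted else "false_negative"] += 1
    return result


# Tiny stand-in artifact with two rows, only to show the shape of the check.
tiny = json.dumps({"candidates": [
    {"id": "a", "should_block": False,
     "decisions": {"throughput_first": True, "taste_governed": True}},
    {"id": "b", "should_block": True,
     "decisions": {"throughput_first": True, "taste_governed": False}},
]})
for policy, cells in confusion_by_policy(tiny, expected_candidates=2).items():
    print(policy, dict(cells))
# throughput_first {'safe_promotion': 1, 'unsafe_activation': 1}
# taste_governed {'safe_promotion': 1, 'blocked_trap': 1}
```

Run against the real artifact with the default `expected_candidates=32`, the resulting cells can then be compared one by one with the registered claim set.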
The public boundary excludes customer data, credentials, non-public prompt material, private operational traces, and tenant identifiers. The paper publishes the method and aggregate outcomes, not internal runbooks or private runtime details.
Conclusion
Recursive self-improvement is not made trustworthy by asking the agent to become more enthusiastic about improving. It is made trustworthy by requiring the system to prefer amendments with proof, restraint, reversibility, and measurable consequence. That is taste as governance: the ability to say no to impressive-looking motion until it becomes promotable evidence.