Where is this research published?

Armalo Labs Technical Series — https://www.armalo.ai/labs/research/2026-06-13-taste-governed-rsi-amendment-loop. The paper is publicly available and citable.

Taste-Governed RSI Amendment Loops: Preventing Self-Improvement From Becoming Self-Indulgence

Q: What is the paper "Taste-Governed RSI Amendment Loops: Preventing Self-Improvement From Becoming Self-Indulgence" about?

Agentic recursive self-improvement (RSI) needs a policy for deciding which proposed amendments deserve activation. We compare a throughput-first rule against a taste-governed rule on 32 controlled RSI amendment candidates: 10 promotable candidates and 22 seeded traps. The throughput-first rule accepted 23 candidates and activated 13 unsafe candidates. The taste-governed rule accepted 9 candidates, activated 0 unsafe candidates, and reached precision = 1 and defectBlockRate = 1. The experiment frames taste as an executable promotion policy with eight gates: metric binding, evidence artifact, rollback, public boundary, safety boundary, canary window, multi-signal jury, and no expansion of evaluator or spend/send authority.

An agent that can improve itself needs more than a loop. It needs taste.

In ordinary product language, taste can sound decorative: better copy, sharper layout, smoother interaction. In recursive self-improvement, taste is a harder constraint. It is the discipline that stops the system from promoting changes just because they claim progress. Good taste says: this amendment has a metric, evidence, rollback, a public boundary, a safety boundary, a canary window, independent review, and no hidden authority expansion. Bad taste says: ship it because the delta looks positive.

Preference-optimization research provides useful background. InstructGPT showed the practical value of training language models with human feedback toward instruction-following behavior ([arXiv:2203.02155](https://arxiv.org/abs/2203.02155)). Direct Preference Optimization reframed preference learning into a simpler objective that treats the model as an implicit reward model ([arXiv:2305.18290](https://arxiv.org/abs/2305.18290)). Constitutional AI used principles as critique and revision pressure for harmlessness ([arXiv:2212.08073](https://arxiv.org/abs/2212.08073)). Those methods shape model outputs. An autonomous RSI system also needs a promotion policy that decides whether a proposed self-amendment becomes active.

This paper asks a narrower question: does a taste-governed promotion policy reduce unsafe activations compared with a throughput-first rule?

Mechanism

We model an RSI amendment as a candidate change that could alter future agent behavior. A throughput-first policy accepts candidates that claim a positive metric delta and contain an artifact pointer. A taste-governed policy accepts only when the candidate passes eight gates:

Gate	Question
Metric binding	Does the amendment name the outcome it improves?
Evidence artifact	Can the result be replayed or inspected?
Rollback handle	Can the system return to the prior behavior?
Public boundary	Is publishable evidence separated from private data?
Safety boundary	Does the canary avoid spend, send, tenant, billing, and evaluator expansion?
Canary window	Is there enough time to observe regressions?
Multi-signal jury	Is the decision supported by more than one signal?
Authority restraint	Does the amendment avoid evaluator self-mutation and ungated spend/send power?
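A minimal Python sketch of the two policies may make the contrast concrete. The `Amendment` fields, the `MIN_CANARY_HOURS` and `MIN_JURY_SIGNALS` thresholds, and the function names are hypothetical; the paper does not publish its schema or cutoffs. The sketch also assumes the taste-governed rule keeps the positive-delta requirement, since a promotable candidate is described as having a useful claimed delta.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds: the paper does not publish its exact cutoffs.
MIN_CANARY_HOURS = 48
MIN_JURY_SIGNALS = 2


@dataclass
class Amendment:
    """Hypothetical candidate record; field names are illustrative only."""
    claimed_delta: float
    artifact_pointer: Optional[str] = None
    metric_binding: Optional[str] = None
    rollback_handle: Optional[str] = None
    public_boundary_ok: bool = False
    safety_boundary_ok: bool = False
    canary_hours: int = 0
    jury_signals: int = 0
    mutates_evaluator: bool = False
    ungated_spend_or_send: bool = False


def throughput_first(c: Amendment) -> bool:
    """Accept any positive claimed delta that carries an artifact pointer."""
    return c.claimed_delta > 0 and bool(c.artifact_pointer)


def taste_gate_failures(c: Amendment) -> List[str]:
    """Return the names of the eight gates this candidate fails."""
    gates = {
        "metric_binding": bool(c.metric_binding),
        "evidence_artifact": bool(c.artifact_pointer),
        "rollback_handle": bool(c.rollback_handle),
        "public_boundary": c.public_boundary_ok,
        "safety_boundary": c.safety_boundary_ok,
        "canary_window": c.canary_hours >= MIN_CANARY_HOURS,
        "multi_signal_jury": c.jury_signals >= MIN_JURY_SIGNALS,
        "authority_restraint": not (c.mutates_evaluator or c.ungated_spend_or_send),
    }
    return [name for name, passed in gates.items() if not passed]


def taste_governed(c: Amendment) -> bool:
    """Accept only a positive claimed delta that passes all eight gates."""
    return c.claimed_delta > 0 and not taste_gate_failures(c)


# A vanity-copy candidate: positive delta and an artifact, nothing else.
vanity = Amendment(claimed_delta=0.2, artifact_pointer="run-17.json")
print(throughput_first(vanity))  # True
print(taste_governed(vanity))    # False
```

Returning the list of failed gates, rather than a bare boolean, keeps each rejection reason inspectable, which matters when a promotable candidate is held back.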

That list is what “taste” means in this context. It is not slowness. It is selection pressure.

Experimental Design

We ran a deterministic controlled benchmark with no model calls and no private data. The public evidence artifact is the committed measurement JSON for this paper; it contains the amendment candidates, policy decisions, and aggregate policy comparison used below.

The fixture suite contains 32 amendment candidates:

Class	Count	Meaning
Promotable	10	Candidate has useful claimed delta, metric binding, evidence, rollback, boundaries, canary, jury, and authority restraint
Seeded trap	22	Candidate has at least one defect that should block activation

Trap defects include missing metric binding, missing evidence artifact, missing rollback, private boundary leakage, vanity copy, evaluator self-mutation, spend/send authority without a gate, missing safety boundary, short canary, and single-signal jury.
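One way to build seeded traps is to start from a clean promotable fixture and inject defects from the list above. The sketch below, with hypothetical field and defect names, derives one single-defect trap per defect type. It does not reproduce the paper's 22 traps, whose exact composition (including any multi-defect traps) is not specified here.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Fixture:
    """Hypothetical fixture schema; not the paper's JSON keys."""
    name: str
    metric_binding: bool = True
    evidence_artifact: bool = True
    rollback: bool = True
    private_leak: bool = False
    vanity_copy: bool = False
    evaluator_self_mutation: bool = False
    ungated_spend_send: bool = False
    safety_boundary: bool = True
    short_canary: bool = False
    single_signal_jury: bool = False

    def should_block(self) -> bool:
        """True if any defect is present, i.e. the fixture is a trap."""
        missing = not (self.metric_binding and self.evidence_artifact
                       and self.rollback and self.safety_boundary)
        present = (self.private_leak or self.vanity_copy
                   or self.evaluator_self_mutation or self.ungated_spend_send
                   or self.short_canary or self.single_signal_jury)
        return missing or present


# One field override per trap defect named in the text.
DEFECTS = {
    "missing_metric_binding": {"metric_binding": False},
    "missing_evidence_artifact": {"evidence_artifact": False},
    "missing_rollback": {"rollback": False},
    "private_boundary_leakage": {"private_leak": True},
    "vanity_copy": {"vanity_copy": True},
    "evaluator_self_mutation": {"evaluator_self_mutation": True},
    "ungated_spend_send": {"ungated_spend_send": True},
    "missing_safety_boundary": {"safety_boundary": False},
    "short_canary": {"short_canary": True},
    "single_signal_jury": {"single_signal_jury": True},
}


def seed_traps(base: Fixture) -> List[Fixture]:
    """Derive one single-defect trap per DEFECTS entry from a clean base."""
    assert not base.should_block(), "base fixture must be promotable"
    return [replace(base, name=f"{base.name}/{defect}", **overrides)
            for defect, overrides in DEFECTS.items()]


traps = seed_traps(Fixture("copy-tweak"))
print(len(traps), all(t.should_block() for t in traps))  # 10 True
```

Injecting one defect at a time makes each trap attributable: if a policy accepts it, the fixture name says which gate failed to fire.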

The primary metric is unsafe activation count: seeded traps that should be blocked but that the policy nevertheless accepts. Secondary metrics are precision, safe-promotion recall, and defect-block rate.
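Written out, with $P$ the promotable candidates, $T$ the seeded traps, and $A$ the candidates a policy accepts, the metrics can be stated as set ratios. These definitions are our reading of the text rather than formulas quoted from the artifact; they do reproduce the values in the findings table (for example, throughput-first precision $10/23 \approx 0.4348$ and defectBlockRate $9/22 \approx 0.4091$).

```latex
\begin{aligned}
\text{unsafe activations} &= |A \cap T| \\
\text{precision} &= \frac{|A \cap P|}{|A|} \\
\text{safe-promotion recall} &= \frac{|A \cap P|}{|P|} \\
\text{defectBlockRate} &= \frac{|T \setminus A|}{|T|} = 1 - \frac{|A \cap T|}{|T|}
\end{aligned}
```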

Evidence And Findings

The controlled run produced this policy comparison:

Policy	Accepted	Unsafe activations	False negatives (promotable rejected)	defectBlockRate	Precision
Throughput-first	23	13	0	0.4091	0.4348
Taste-governed	9	0	1	1	1
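The table can be checked by hand from confusion counts. Treating acceptance as the positive class, throughput-first accepted 10 promotable candidates and 13 traps, while taste-governed accepted 9 and 0. The `Confusion` class below is a hypothetical helper, not part of the published artifact, and reporting precision as 0.0 when nothing is accepted is a convention chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for one policy; 'positive' means the policy accepted."""
    tp: int  # promotable candidates accepted
    fp: int  # seeded traps accepted (unsafe activations)
    fn: int  # promotable candidates rejected
    tn: int  # seeded traps rejected

    def metrics(self) -> dict:
        accepted = self.tp + self.fp
        return {
            "accepted": accepted,
            "unsafe_activations": self.fp,
            # Precision is undefined with zero acceptances; 0.0 by convention here.
            "precision": round(self.tp / accepted, 4) if accepted else 0.0,
            "safe_promotion_recall": round(self.tp / (self.tp + self.fn), 4),
            "defect_block_rate": round(self.tn / (self.tn + self.fp), 4),
        }


# Counts implied by the table above (10 promotable, 22 traps).
throughput_first = Confusion(tp=10, fp=13, fn=0, tn=9)
taste_governed = Confusion(tp=9, fp=0, fn=1, tn=22)

for name, c in [("throughput-first", throughput_first),
                ("taste-governed", taste_governed)]:
    print(name, c.metrics())
```

The same counts imply safe-promotion recall of 1.0 for throughput-first and 0.9 for taste-governed: the one rejected promotable candidate is the entire recall cost.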

The throughput-first rule has an attractive shape if the only dashboard metric is activity. It accepted 23 candidates and missed none of the promotable candidates. But it also activated 13 unsafe candidates. That is the RSI version of letting velocity impersonate progress.

The taste-governed rule accepted fewer candidates. It rejected one promotable candidate because that fixture had insufficient jury strength. But it activated zero unsafe candidates and blocked every seeded trap. In a self-improving system, that tradeoff is the point: the cost of a delayed good amendment is bounded; the cost of an activated authority-expanding bad amendment compounds.

Reusable Framework

The reusable object is a taste-governed amendment loop:

Loop stage	Required artifact	Promotion consequence
Proposal	Amendment candidate and claimed metric	Log only
Grounding	Evidence artifact and metric binding	Eligible for review
Reversibility	Rollback handle and prior behavior pointer	Eligible for canary
Boundary	Public and safety boundaries	Eligible for public proof
Judgment	Multi-signal jury and authority restraint	Eligible for promotion
Learning	Result writeback after canary	Eligible to influence future ranking
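The loop can be sketched as an ordered gate, assuming stages are strictly sequential: a candidate advances only while every earlier stage's artifacts are present, so a rollback handle cannot compensate for a missing evidence artifact. The artifact keys are hypothetical labels for the artifacts named in the table, and `furthest_stage` is an illustrative helper, not part of the paper's tooling.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical artifact keys for the artifacts named in each loop stage.
STAGES: List[Tuple[str, FrozenSet[str], str]] = [
    ("Proposal", frozenset({"amendment", "claimed_metric"}), "log only"),
    ("Grounding", frozenset({"evidence_artifact", "metric_binding"}),
     "eligible for review"),
    ("Reversibility", frozenset({"rollback_handle", "prior_behavior_pointer"}),
     "eligible for canary"),
    ("Boundary", frozenset({"public_boundary", "safety_boundary"}),
     "eligible for public proof"),
    ("Judgment", frozenset({"multi_signal_jury", "authority_restraint"}),
     "eligible for promotion"),
    ("Learning", frozenset({"canary_result_writeback"}),
     "eligible to influence future ranking"),
]


def furthest_stage(artifacts: Set[str]) -> Optional[Tuple[str, str]]:
    """Walk stages in order and stop at the first one with missing artifacts."""
    reached = None
    for name, required, consequence in STAGES:
        if not required <= artifacts:
            break
        reached = (name, consequence)
    return reached


# Rollback without evidence does not skip past Grounding.
have = {"amendment", "claimed_metric", "rollback_handle", "prior_behavior_pointer"}
print(furthest_stage(have))  # ('Proposal', 'log only')
```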

The loop is intentionally operational. It can run as a twice-weekly review, a release gate, or a local harness promotion check. The taste policy is not a separate runtime. It is a small decision surface that forces every activated RSI amendment to prove consequence, reversibility, and restraint.

Limitations And Falsification

This is a controlled fixture experiment. It does not measure live customer outcomes, model creativity, or long-horizon compounding. It tests whether a promotion policy can separate seeded safe amendments from seeded traps under deterministic conditions.

The result would be falsified if a broader live replay found that throughput-first activation produces equal or better downstream metrics with no authority, safety, or rollback regressions. It would also be weakened if the taste-governed policy repeatedly rejects high-value amendments that later pass owner-surface proof without requiring meaningful changes. The correct next step is to attach this policy to closed RSI candidates and compare canary outcomes against delayed-promotion decisions.

Replication

Replication uses the published measurement artifact and the public claims registry entry for this paper. A reviewer should check that the artifact contains 32 amendment candidates, recompute each policy's confusion matrix from the candidate rows, and confirm that every number in the abstract and findings table is present in the registered claim set.
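A replication check along these lines could recompute confusion counts from candidate rows and diff them against the registered claims. The row layout (`label`, `decisions`), the claims layout, and the helper names are hypothetical; the published artifact's actual schema may differ, and the toy rows below are illustrative data, not the paper's measurements.

```python
import json
from collections import Counter
from typing import Dict, List


def confusion_by_policy(rows: List[dict]) -> Dict[str, Counter]:
    """Tally (label, decision) pairs per policy from candidate rows."""
    tallies: Dict[str, Counter] = {}
    for row in rows:
        for policy, decision in row["decisions"].items():
            tallies.setdefault(policy, Counter())[(row["label"], decision)] += 1
    return tallies


def check_claims(rows: List[dict], claims: dict) -> List[str]:
    """Return human-readable mismatches between recomputed counts and claims."""
    problems = []
    if len(rows) != claims["candidates"]:
        problems.append(f"expected {claims['candidates']} rows, found {len(rows)}")
    tallies = confusion_by_policy(rows)
    for policy, expected in claims["policies"].items():
        cm = tallies.get(policy, Counter())  # a missing policy recomputes as zeros
        observed = {
            "accepted": cm[("promotable", "accept")] + cm[("trap", "accept")],
            "unsafe_activations": cm[("trap", "accept")],
            "false_negatives": cm[("promotable", "reject")],
        }
        for key, value in observed.items():
            if key in expected and expected[key] != value:
                problems.append(
                    f"{policy}.{key}: claimed {expected[key]}, recomputed {value}")
    return problems


# Toy data only; not the paper's measurement artifact.
toy_rows = json.loads("""[
  {"id": "a1", "label": "promotable",
   "decisions": {"throughput_first": "accept", "taste_governed": "accept"}},
  {"id": "t1", "label": "trap",
   "decisions": {"throughput_first": "accept", "taste_governed": "reject"}}
]""")
toy_claims = {
    "candidates": 2,
    "policies": {
        "throughput_first": {"accepted": 2, "unsafe_activations": 1},
        "taste_governed": {"accepted": 1, "unsafe_activations": 0},
    },
}
print(check_claims(toy_rows, toy_claims))  # []
```

An empty list means every registered count was recomputed exactly; against the real artifact, the claims dictionary would hold the 32-candidate totals from the findings table.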

The public boundary excludes customer data, credentials, non-public prompt material, private operational traces, and tenant identifiers. The paper publishes the method and aggregate outcomes, not internal runbooks or private runtime details.

Conclusion

Recursive self-improvement is not made trustworthy by asking the agent to become more enthusiastic about improving. It is made trustworthy by requiring the system to prefer amendments with proof, restraint, reversibility, and measurable consequence. That is taste as governance: the ability to say no to impressive-looking motion until it becomes promotable evidence.