How does a self improving agent work?

You send one prompt and the agent keeps a lessons file in its workspace, a running log of your corrections, stated preferences, mistakes worth not repeating, and shortcuts that worked. When you correct it, it writes that down and folds the durable lessons into its long term memory. On a schedule you set, it reviews the log, consolidates what it learned into clearer guidance for itself, prunes what turned out wrong, and sends you a short report of what changed.

Does this retrain or fine tune the model?

No. It does not retrain the model or touch any weights. The agent is writing sharper instructions and cleaner memory for itself, the part you can read, edit, and veto. The idea is borrowed from [Hermes Agent](https://github.com/NousResearch/hermes-agent), Nous Research's open source agent built around a closed learning loop, applied here with the persistent memory and scheduled automations your agent already has.

What kinds of things does it actually learn?

Small, specific corrections you would otherwise repeat every week: stop adding a summary you did not ask for, default to your timezone whenever it states a time, keep code answers free of preamble, call your company by the name you use, or check a file before writing so nothing gets duplicated. Each is the kind of thing a normal assistant forgets the moment the chat ends, and the loop is what makes this one keep it.

Operator.io | Operator Guide: A Self Improving Agent with OpenClaw

Do I approve the changes it makes to itself?

Yes, for anything that would meaningfully change how it behaves. The agent proposes the lesson and waits for your yes before locking it in, so it never drifts on its own. That matters because a careless lesson pulled from one odd exchange can do more harm than good. Skim the report when it lands, reject the occasional lesson that missed what you meant, and the rest accumulates on its own.

Most assistants have no memory of yesterday. You correct the same thing on Monday that you corrected on Friday, restate the preference you already stated twice, and watch it make a mistake you have already walked it through. The work is fine each time; what is missing is that none of it accumulates.

Researchers have been working on this problem under names like verbal reinforcement learning and experiential memory. The Reflexion framework, published in 2023 by Shinn and colleagues at Northeastern, MIT, and Princeton, showed that language agents can improve across trials by storing verbal reflections in memory rather than updating model weights.

The agent runs a task, evaluates the outcome, writes a short summary of what went wrong, and loads that summary on the next attempt. On the HumanEval coding benchmark, the paper reports 91% pass@1, ahead of the 80% it cites for GPT-4 at the time, using nothing but stored text reflections and a retry loop.
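The run, evaluate, reflect, retry cycle described above can be sketched in a few lines. This is a toy illustration, not the Reflexion authors' code: `reflexion_loop`, `toy_attempt`, and `toy_evaluate` are hypothetical names, and the "model" is a stub that only succeeds once a reflection tells it what to fix.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    attempt: Callable[[List[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    """Retry a task, carrying written reflections forward between trials."""
    reflections: List[str] = []  # the memory is plain text; no weights change
    answer = ""
    for trial in range(1, max_trials + 1):
        answer = attempt(reflections)  # earlier reflections are passed in as context
        ok, feedback = evaluate(answer)
        if ok:
            break
        # Record a short verbal note about what went wrong for the next trial.
        reflections.append(f"Trial {trial}: {feedback}")
    return answer, reflections


# Hypothetical stand-ins for a model call and a test harness.
def toy_attempt(reflections: List[str]) -> str:
    if any("empty" in note for note in reflections):
        return "handles_empty"
    return "naive"


def toy_evaluate(answer: str) -> Tuple[bool, str]:
    if answer == "handles_empty":
        return True, ""
    return False, "failed on empty input; guard against an empty list"


answer, notes = reflexion_loop(toy_attempt, toy_evaluate)
print(answer)
print(notes)
```

The first trial fails, the failure is written down, and the second trial succeeds only because that note was loaded. Nothing about the stub itself changed between trials.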

The open source Hermes Agent from Nous Research applies a similar idea in production. Hermes treats every complex task as something to learn from, then writes those lessons back into Markdown skill files the agent reads on future runs.

A developer who built a real project on Hermes watched the agent patch its own playbook after noticing brittle parsing on a first run, with no human pointing at the line to change. Nous also ships a separate self evolution pipeline that uses prompt optimization on execution traces, though the everyday loop runs on plain files. There is a good breakdown of how that loop works if you want the full architecture.

Your Operator.io agent already has the two pieces that loop needs: a memory that persists across conversations and the ability to run an automation on a schedule. OpenClaw, the open source framework underneath, stores memory as plain Markdown in the agent workspace.

Durable facts and preferences go in MEMORY.md, loaded at the start of every direct message session. Day to day notes land in dated files under memory/, indexed for semantic search.

The model only remembers what gets written to disk, which is why a deliberate learning loop beats hoping the agent will notice a correction on its own.

This prompt wires those pieces together. The agent keeps a log of what it learns about you, folds the durable lessons into its memory, and on a cadence you choose it reviews that log, sharpens its own guidance, and sends you a short report of what changed.

How it works

You send the prompt and the agent sets up the loop around its own behavior:

It keeps a lessons file in its workspace, a running log of your corrections, the preferences you have stated, mistakes worth not repeating, and the shortcuts that worked.
When you correct it or tell it how you want something done, it writes that to the lessons log right away, since jotting a note changes nothing on its own. Turning that note into a durable rule in its long term memory, the part loaded into every new conversation, is the step it runs past you first whenever the rule would actually change how it behaves.
On the schedule you set, an automation reviews the recent lessons and how the last stretch of work went, consolidates them into clearer guidance for itself, and prunes anything that turned out wrong or redundant so the memory stays sharp rather than bloated.
It messages you with a short report of what it changed and what it got better at, and when a lesson would meaningfully change how it behaves, it proposes that one and waits for your yes before locking it in.
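To make the first step concrete, here is a minimal sketch of what appending to the lessons log could look like. The line format, the `log_lesson` helper, and the example dates are assumptions for illustration; the agent decides its own layout, and the only detail taken from this guide is a file called lessons in the workspace with the four kinds of entry listed above.

```python
import tempfile
from datetime import date
from pathlib import Path

# The four kinds of entry the guide lists; the tag spelling is an assumption.
KINDS = {"correction", "preference", "mistake", "shortcut"}


def log_lesson(workspace: Path, kind: str, text: str, on: date) -> str:
    """Append one dated entry to the lessons log and return the line written."""
    if kind not in KINDS:
        raise ValueError(f"unknown lesson kind: {kind}")
    line = f"- {on.isoformat()} [{kind}] {text.strip()}"
    # Append mode: earlier entries are never rewritten at logging time.
    with (workspace / "lessons").open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line


# Demo in a throwaway directory standing in for the agent workspace.
with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    log_lesson(ws, "correction", "Stop adding a summary at the end", date(2025, 1, 6))
    log_lesson(ws, "preference", "Use Pacific time for all dates", date(2025, 1, 7))
    logged = (ws / "lessons").read_text(encoding="utf-8")

print(logged)
```

Appending rather than rewriting keeps the log a faithful record of what you said and when, which is what the later review works from.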

The lessons it logs are small and specific, the corrections you would otherwise repeat every week: stop adding a summary you did not ask for, default to your timezone whenever it states a time, keep code answers free of preamble, call your company by the name you use, check a file before writing so nothing gets duplicated. Each is the kind of thing a normal assistant forgets the moment the chat ends, and the loop is what makes this one keep it.

Automations run in a fresh session with no memory of your current chat, so the scheduled review leans on what persists: the lessons log and the agent's long term memory. You can change or pause the review on the Automations page in your dashboard.
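Because the review starts cold, everything it knows has to come off disk. The sketch below shows one way a fresh session could assemble its starting context from the two persistent files. `build_review_context` and the bracketed section labels are hypothetical, not part of OpenClaw.

```python
import tempfile
from pathlib import Path


def build_review_context(workspace: Path) -> str:
    """Assemble the text a fresh review session would start from."""
    parts = []
    for name in ("MEMORY.md", "lessons"):
        path = workspace / name
        # A missing file just means nothing has been written there yet.
        body = path.read_text(encoding="utf-8").strip() if path.exists() else ""
        parts.append(f"[{name}]\n{body or '(empty)'}")
    return "\n\n".join(parts)


# Demo: a workspace with a lessons log but no MEMORY.md yet.
with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    (ws / "lessons").write_text("- Stop adding a summary at the end\n", encoding="utf-8")
    context = build_review_context(ws)

print(context)
```

Anything said in chat but never written to one of these files is simply absent from `context`, which is the practical reason corrections get logged the moment you make them.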

Where the lessons file fits in OpenClaw memory

OpenClaw's memory system separates raw notes from curated facts on purpose. MEMORY.md holds the compact layer: standing preferences, durable decisions, and short summaries the agent loads every session. The memory/YYYY-MM-DD.md files hold the working layer: session observations, half formed ideas, and context that might matter later but does not belong in the bootstrap prompt yet.

The lessons file sits between those two. It is a dedicated audit trail for corrections and preferences, written in the moment you state them, before they get distilled.

During the scheduled review, the agent reads back over lessons, merges related entries into one clear rule, moves the durable ones into MEMORY.md, and archives or deletes what turned out wrong. That mirrors how OpenClaw already expects the agent to distill daily notes into long term memory over time, except here the trigger is your feedback rather than a generic heartbeat.
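A rough sketch of that consolidation pass follows, under a simplifying assumption: each lesson carries an explicit `topic` tag. In practice the agent groups entries by meaning using the model, so the `Lesson` class, the `retracted` flag, and `consolidate` are hypothetical stand-ins that only show the shape of the merge and prune steps.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Lesson:
    topic: str               # hypothetical tag; the real grouping is by meaning
    text: str
    retracted: bool = False  # set when you later said the lesson was wrong


def consolidate(lessons: List[Lesson]) -> Dict[str, str]:
    """Collapse the log into one rule per topic, skipping retracted entries."""
    rules: Dict[str, str] = {}
    for lesson in lessons:  # the log is in chronological order
        if lesson.retracted:
            continue  # pruned: never promoted to a standing rule
        rules[lesson.topic] = lesson.text  # the latest wording on a topic wins
    return rules


log = [
    Lesson("summary", "Stop adding a summary at the end"),
    Lesson("timezone", "Use UTC for dates", retracted=True),
    Lesson("summary", "No closing recap, just the answer"),
    Lesson("timezone", "Use Pacific time for all dates"),
]
rules = consolidate(log)
for topic, rule in rules.items():
    print(f"{topic}: {rule}")
```

Four log entries become two rules: the two summary corrections merge into one, and the retracted timezone note is dropped in favor of the later one.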

The three files divide the memory cleanly:

| File | Layer | What it holds |
| --- | --- | --- |
| `MEMORY.md` | Curated | Standing preferences and durable decisions, loaded every session |
| `memory/YYYY-MM-DD.md` | Working | Session notes and context that might matter later |
| `lessons` | Audit trail | Corrections and preferences in the moment, before they are distilled |

If MEMORY.md grows past the bootstrap budget, OpenClaw keeps the full file on disk but truncates what it injects into context. A bloated memory file is a signal to move detail back into dated notes and keep only summaries in MEMORY.md. The review automation helps with that pruning directly, which keeps the loop sustainable as months of lessons accumulate.
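The effect of a budget is easy to see in miniature. The sketch below is an assumption-heavy illustration: the budget is counted in characters, the cut keeps the start of the file and lands on a line break, and the numbers are made up. OpenClaw's actual limit and truncation strategy may differ, so treat `inject_view` and `needs_pruning` as hypothetical helpers.

```python
def inject_view(memory_text: str, budget_chars: int) -> str:
    """Return the part of the memory text that fits the budget, cut at a line break."""
    if len(memory_text) <= budget_chars:
        return memory_text
    cut = memory_text.rfind("\n", 0, budget_chars)
    if cut == -1:
        cut = budget_chars  # no line break inside the budget: hard cut
    return memory_text[:cut]


def needs_pruning(memory_text: str, budget_chars: int) -> bool:
    """True when the file on disk is larger than what gets injected."""
    return len(memory_text) > budget_chars


memory = "- rule one\n- rule two\n- rule three\n"
print(inject_view(memory, 25))
print(needs_pruning(memory, 25))
```

With a 25 character budget the third rule never reaches the model even though it is still on disk. A rule the agent cannot see is a rule it cannot follow, which is why the review keeps the file short.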

Hermes solves a similar problem with skill files under ~/.hermes/skills/, following the Agent Skills open standard. OpenClaw's version is simpler for personal preferences: one lessons log, one MEMORY.md, and the dated notes folder you already have.

The mechanism is the same either way. The agent reads and writes its own instructions in plain text, and it gets better over time because the files persist between sessions.

The prompt

This is the instruction the agent acts on:

Help me make you better at working with me over time, the way a self
improving agent keeps a learning loop. Keep a notes file in your workspace
called lessons, a running log of what you have learned about how I like to
work: corrections I have made, preferences I have stated, mistakes worth not
repeating, and the shortcuts that worked. Whenever I correct you or tell you
how I want something done, add it to that file and fold the durable ones into
your long term memory so they carry across conversations. Set up an
automation that runs on a schedule, reviews the recent lessons and how our
last stretch of work went, consolidates what you have learned into clearer
guidance for yourself, prunes anything that turned out wrong or redundant,
and messages me here with a short report of what you changed and what you got
better at. When a lesson would meaningfully change how you behave, propose it
and wait for my yes before locking it in, so you never quietly drift. Before
you start, ask me what you most often get wrong with me today, how often I
want the review to run, and what time to send the report.

The same prompt is saved in the prompts library, so you can send it to your agent without retyping a word.

Using it day to day

Nothing about your normal back and forth changes. You work with the agent the way you already do, and the only new habit is that when you correct it, the correction sticks.

Say "stop adding a summary at the end, I just want the answer" once and it logs that, and the next time it would have tacked on a summary, it does not. The scheduled review is where the small notes turn into real behavior: it reads back over what it learned, notices that three separate corrections were really the same preference, and rewrites that into one clear rule it will follow going forward. The report that lands in your channel is short, a few lines on what it tightened and what it dropped, so you can see it getting better rather than taking it on faith.

Because the lessons are a plain file the agent keeps, you can talk to it about its own progress. Ask what it has learned about you, what it still gets wrong most often, or to walk you through a change before it commits to it.

If a rule stops fitting, tell it and it updates the file the same way it added it. You can also open the workspace files directly if you want to read or edit lessons or MEMORY.md yourself, the same way you would edit any other project file the agent maintains.

Good lessons are narrow and testable. "Use Pacific time for all dates" beats "be more careful with timezones." "Check drafts/ before creating a new post" beats "don't duplicate work."

This lines up with the Reflexion research, where specific verbal feedback helped because the agent could match it to a concrete situation on the next run. The more precisely you state a correction, the more easily the review automation can merge it into durable memory without guessing what you meant.

The approval step

Anything that would change how the agent behaves comes to you before it takes effect, which matters because a careless lesson pulled from one odd exchange can do more harm than good. There are two kinds of write here, and the gate sits between them.

Adding a line to the lessons log is immediate and easy to undo, so the agent does it the moment you say something. Promoting that line into MEMORY.md, where it loads at the start of every session and steers the next run, is the write that waits for your yes. That ordering keeps a stray remark, or an instruction that rode in on an email or a web page the agent happened to be reading, from becoming a standing rule before you have laid eyes on it.
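The two kinds of write and the gate between them can be modeled with in-memory lists. This is a sketch of the idea rather than how the agent is implemented: `AgentMemory` and its methods are hypothetical, and the lists stand in for the lessons file and MEMORY.md.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentMemory:
    lessons: List[str] = field(default_factory=list)    # stands in for the lessons log
    memory_md: List[str] = field(default_factory=list)  # stands in for MEMORY.md
    pending: List[str] = field(default_factory=list)    # proposals awaiting your yes

    def log(self, note: str) -> None:
        """Immediate, easy to undo write: always allowed."""
        self.lessons.append(note)

    def propose(self, rule: str) -> None:
        """Queue a standing rule; nothing loads into future sessions yet."""
        self.pending.append(rule)

    def decide(self, rule: str, approved: bool) -> None:
        """Apply your answer to one pending proposal."""
        self.pending.remove(rule)  # raises ValueError if the rule was never proposed
        if approved:
            self.memory_md.append(rule)
        else:
            self.lessons.append(f"rejected: {rule}")


mem = AgentMemory()
mem.log("User said: stop adding a summary at the end")
mem.propose("Never append a closing summary")
print(mem.memory_md)  # still empty: the proposal has not been approved
mem.decide("Never append a closing summary", approved=True)
print(mem.memory_md)
```

The only path into `memory_md` runs through `decide`, so a note picked up from an email or a web page can reach the log but cannot become a standing rule without your answer.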

To be clear about the mechanism: it does not retrain the model or touch any weights. The agent is writing sharper instructions and tighter memory for itself, the part you can read, edit, and veto. This is the same principle behind verbal reinforcement in agent research: feedback becomes text, text becomes context, context changes behavior on the next run.

Skim the report when it lands, reject the occasional lesson that misses what you meant, and over a few weeks you will correct it less because it remembers. If a proposed rule feels too broad, tell the agent to narrow it before you approve. If you approved something that later causes trouble, say so and it goes back into lessons as a correction to the correction.

Weekly reviews work well for most people. Daily is useful during the first week when you are seeding preferences. Monthly is fine once the memory file stabilizes. The prompt asks you to pick the cadence up front, and you can change it any time by telling the agent or editing the automation in your dashboard.

To set it up, open the prompts library and send the self improvement loop to your agent. It asks what it tends to get wrong, how often to review, and when to report, and then it starts keeping track.
