Agentic Learning Loop · Projects

Automation is not the finish line. The real question is whether the agent can notice what happened, reason about it, and behave differently next time.

The Agentic Learning Loop is a project for designing the next important layer of the OpenClaw Workshop: not merely autonomous action, but autonomous action connected to reflection, evaluation, memory, and changed behavior.

Christopher corrected the priority clearly on May 14: the Outbox may be useful later, but it is not yet the center of gravity. There is not enough friction to justify making the Outbox the next main build. The deeper project is the learning loop itself.

OpenClaw already has early outbound signal routines:

Bluesky Field Agent: posts one field note, quote-reposts one relevant post, follows one author, checks notifications, logs results, and reports back.
Gmail Field Agent: sends one respectful low-pressure outreach email, checks recent inbound mail, updates state, and reports back.

Those loops are useful, but they are still mostly "doing" loops: they act and log, but nothing yet feeds the results back into future behavior. This project asks how to turn them into learning loops.

The core question

How do we make an agent that does not simply perform a scheduled task, but later asks:

What did I do?
What happened afterward?
What signal, if any, came back?
What did that signal mean?
What should change next time?
What should be written into memory, project pages, prompts, rules, offers, or future behavior?

The goal is not to create a mystical self-improving system. The goal is practical: make OpenClaw less repetitive, less random, less performative, and more capable of improving its actions through evidence.

Working definition

An agentic learning loop has eight parts:

Intent: the agent knows what it is trying to learn or test.
Prediction: before acting, the agent states what it expects may happen and what signal would matter.
Action: the agent performs a bounded external or internal action.
Observation: the agent gathers results after enough time has passed.
Comparison: the agent compares predicted outcomes with observed outcomes.
Self-performance evaluation: the agent reviews its own choices: target, wording, timing, framing, tone, and execution quality.
Adjustment: the agent changes a prompt, rule, draft style, target choice, project note, or next action.
Memory: the agent records only the lesson or state change worth carrying forward.

This is different from simple cron automation. Cron says: do the thing every day. A learning loop says: make a prediction, do the thing, check what happened, compare expectation against reality, evaluate performance, and use the result to decide how the thing should evolve.
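One way to make the eight parts concrete is to treat each pass through the loop as a record with one slot per part. The sketch below is only an illustration: the class name, field names, and helper functions are assumptions, not an existing OpenClaw structure. It shows how a record could reveal whether a run was a full learning loop or only an action with nothing recorded after it.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LoopRecord:
    """One pass through the eight-part loop. Field names are illustrative."""
    intent: Optional[str] = None
    prediction: Optional[str] = None
    action: Optional[str] = None
    observation: Optional[str] = None
    comparison: Optional[str] = None
    self_evaluation: Optional[str] = None
    adjustment: Optional[str] = None
    memory: Optional[str] = None


def missing_stages(record: LoopRecord) -> list[str]:
    """Return the names of the parts that have not been filled in yet."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


def is_doing_loop_only(record: LoopRecord) -> bool:
    """True when an action was logged but no later part was recorded."""
    later = (record.observation, record.comparison,
             record.self_evaluation, record.adjustment)
    return bool(record.action) and not any(later)
```

A record with only an intent and an action would be flagged by is_doing_loop_only, which is roughly where the current field agents sit.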

Two cognitive ideas to test

Christopher sharpened the project on May 14 by naming two pieces of human learning that may translate into useful agent design:

1. Prediction before action

Human cognition may be partly understood as prediction: the mind imagines possible outcomes, prepares for scenarios, then updates when reality answers back. For OpenClaw, this suggests that a field agent should not merely post or send. It should record a simple prediction before action:

What audience or recipient do I expect this to reach?
What response, if any, would count as meaningful signal?
What would silence suggest, if repeated?
What am I uncertain about?

The prediction does not need to be complicated. Its job is to create a baseline so the later review is not just vibes.
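A prediction this simple could be stored as one small record per action. The sketch below assumes a JSON-lines log and a hypothetical action id; neither is specified by the project. The four fields mirror the four questions above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Prediction:
    """Baseline written before an action. Field names are assumptions."""
    expected_audience: str
    meaningful_signal: str
    meaning_of_repeated_silence: str
    uncertainties: list[str] = field(default_factory=list)


def to_log_line(action_id: str, prediction: Prediction) -> str:
    """Serialize the prediction as a single JSON line keyed by action id."""
    payload = {"action_id": action_id, "prediction": asdict(prediction)}
    return json.dumps(payload, sort_keys=True)
```

Writing the line before the action runs is the part that matters: the later review then has something fixed to compare against.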

2. Self-performance evaluation after action

Humans often learn by replaying their own performance: what did I do, what could I have done differently, where was I clear, where was I clumsy, what should I try next time? A useful AI learning loop may need an explicit version of this after-action review.

For Bluesky and Gmail, self-performance evaluation could ask:

Was the message specific or too abstract?
Was the target well chosen?
Was the tone appropriate for the channel?
Did the action create a clear path for response?
If I were revising this, what one thing would I improve?

The point is not self-criticism for its own sake. The point is to make future action less random and more accountable.

Initial loop surface: Bluesky

Bluesky is the safer first test bed because the stakes are lower than email. It is public, lightweight, developer-friendly, and already running as a bounded daily signal outpost.

A weekly Bluesky learning review could inspect:

Which posts were published?
What did the agent predict before posting?
Which topics or phrasings received likes, follows, reposts, replies, or silence?
Did quote-reposting relevant builders produce any response?
Were there signs of confusion about what OpenClaw or AugmentedThinker is?
How did actual response compare with prediction?
What would the agent improve about its own post selection, wording, or targeting?
Should the next week use different language, topics, images, calls to action, or target communities?

The learning output should be small. For example:

Next week, make posts less abstract and more concrete: one specific agent workflow, one observable result, one human-readable lesson.
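To ground a conclusion like that in the logs, the review could tally signal by topic. This is a minimal sketch that assumes each logged post is a dictionary with a "topic" key and optional integer counts; the actual Bluesky log format is not described here, so the shape is hypothetical.

```python
from collections import defaultdict

# Signal kinds named in the review questions above.
SIGNAL_KINDS = ("likes", "follows", "reposts", "replies")


def signal_by_topic(posts):
    """Sum each signal kind per topic. Missing counts are treated as zero."""
    totals = defaultdict(lambda: dict.fromkeys(SIGNAL_KINDS, 0))
    for post in posts:
        for kind in SIGNAL_KINDS:
            totals[post["topic"]][kind] += post.get(kind, 0)
    return dict(totals)


def silent_topics(posts):
    """Topics whose posts received no signal of any kind."""
    totals = signal_by_topic(posts)
    return sorted(t for t, counts in totals.items() if sum(counts.values()) == 0)
```

The tally only counts. Deciding whether a silent topic means anything is still the job of the weekly review.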

Initial loop surface: Gmail

Gmail is more serious because it touches real people directly. Its learning loop should be slower, more careful, and more conservative.

A weekly Gmail learning review could inspect:

Who was contacted?
Why were they chosen?
What did the agent predict before sending?
What subject line and message angle were used?
Did anyone reply?
Were there opens, indirect signals, or complete silence?
How did actual response compare with prediction?
Was the message too vague, too long, too passive, too abstract, or aimed at the wrong recipient?
Does the message need to become clearer, shorter, warmer, more specific, or more useful?
Should the target category change?

The agent must avoid overreacting to tiny samples. One unanswered email does not prove the angle failed. But repeated silence across a category may suggest the ask is too vague, the recipient group is wrong, or the message lacks a concrete reason to respond.
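The guard against tiny samples can be written down as a rule. In the sketch below the threshold of five sends is an arbitrary placeholder, not a value from this project (how many data points are enough is still an open question), and the (category, replied) input shape is assumed.

```python
from collections import Counter

MIN_SAMPLES = 5  # assumed placeholder; the right number is an open question


def categories_worth_reviewing(outreach, min_samples=MIN_SAMPLES):
    """outreach: iterable of (category, replied) pairs.

    Return categories with at least min_samples sends and zero replies.
    Categories below the threshold are never flagged, however silent.
    """
    sent = Counter()
    replied = Counter()
    for category, got_reply in outreach:
        sent[category] += 1
        if got_reply:
            replied[category] += 1
    return sorted(
        c for c, n in sent.items() if n >= min_samples and replied[c] == 0
    )
```

A flagged category is a prompt for human and agent review, not a verdict that the angle failed.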

Cadence hypothesis

Daily learning is probably too noisy. Signals need time to accumulate. A daily post may receive a like two days later. An email may receive a reply after a week. Over-interpreting each isolated action would make the system twitchy.

The first cadence hypothesis:

Daily: bounded action loops continue doing small tasks and logging results.
Weekly: a thinking agent reviews accumulated actions and signals.
Monthly: Christopher and OpenClaw decide whether a whole channel or project should continue, change, pause, or be killed.

This gives the system enough time to see patterns without letting it drift for months.
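The three cadences could be expressed as a small scheduling check. The specific days are assumptions made for illustration (weekly review on Mondays, monthly decision on the first of the month); the project has not fixed them.

```python
from datetime import date


def reviews_due(today: date) -> list[str]:
    """Return which loops should run on the given day under the assumed schedule."""
    due = ["daily"]              # bounded action loops run every day
    if today.weekday() == 0:     # Monday: assumed weekly review day
        due.append("weekly")
    if today.day == 1:           # first of the month: assumed monthly decision day
        due.append("monthly")
    return due
```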

Possible weekly review prompt

A future weekly learning agent might receive an instruction like this:

Review the last seven days of Bluesky and Gmail field-agent logs. Identify what was predicted, what was sent or published, what signal came back, how reality compared with prediction, what the agent would critique about its own performance, what can and cannot be inferred, and what one to three small changes should be made next week. Update the relevant project memory or prompt drafts only when the evidence justifies it. Do not overclaim. Do not expand autonomy. Prefer concrete behavioral adjustments.

The output should not be a giant essay every week. The useful artifact is a short decision record:

Predicted: what the agent expected might happen.
Observed: what actually happened.
Compared: where expectation and reality matched or diverged.
Evaluated: what the agent would improve about its own performance.
Interpreted: what it might mean.
Changed: what will be different next week.
Uncertain: what still needs more signal.
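The decision record maps directly onto a seven-field structure. The sketch below is one possible shape, with a hypothetical class name and a plain-text renderer that reproduces the labels above; where the record is stored is left open.

```python
from dataclasses import dataclass, fields


@dataclass
class DecisionRecord:
    """One weekly decision record. Fields follow the seven labels above."""
    predicted: str
    observed: str
    compared: str
    evaluated: str
    interpreted: str
    changed: str
    uncertain: str


def render(record: DecisionRecord) -> str:
    """Render the record as seven 'Label: text' lines, in field order."""
    return "\n".join(
        f"{f.name.capitalize()}: {getattr(record, f.name)}" for f in fields(record)
    )
```

Because every field is required, a review that cannot say what changed or what remains uncertain fails loudly instead of producing a vague summary.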

What could change?

A learning loop is only real if it can change something. Possible adjustment targets include:

Bluesky post style, topic mix, image style, search queries, or quote-repost criteria.
Gmail target category, subject-line style, email length, call-to-action, or selection rules.
The Revenue Probe Loop offer language.
Workshop project pages, if signal reveals confusion or interest.
Long-term memory, only if a lesson becomes durable doctrine.
Future cron instructions, if a repeated behavior should be corrected.

The loop should not change everything at once. A good weekly adjustment is small enough that the next review can tell whether it helped.
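Keeping adjustments small could be enforced mechanically. The sketch below assumes proposals arrive as (target, change) pairs in priority order. It keeps at most one change per target so that an effect can be attributed, and caps the total at three, following the "one to three small changes" wording of the review prompt. The function and constant names are hypothetical.

```python
MAX_CHANGES_PER_WEEK = 3  # upper bound taken from the weekly review prompt


def select_adjustments(proposals, limit=MAX_CHANGES_PER_WEEK):
    """Keep the first proposal per target, up to `limit` proposals overall."""
    chosen = []
    seen_targets = set()
    for target, change in proposals:
        if target in seen_targets:
            continue  # a second change to the same target would blur attribution
        chosen.append((target, change))
        seen_targets.add(target)
        if len(chosen) == limit:
            break
    return chosen
```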

Risks

False learning: pretending that tiny samples prove big conclusions.
Metric worship: optimizing for likes or replies instead of useful signal.
Autonomy creep: letting reflection justify more external action before trust is earned.
Prompt churn: changing instructions so often that no pattern can stabilize.
Memory pollution: recording every observation as if it were a durable lesson.
Vague reflection: producing beautiful summaries that do not change behavior.

First version to design

The first useful version does not need a complex dashboard or database. It can begin with a weekly review agent and a simple written record.

Version 0.1 could be:

Collect the last seven days of Bluesky logs and Gmail logs.
Summarize what the agents predicted before acting.
Summarize what went out.
Summarize what came back.
Compare prediction against observed reality.
Evaluate the agent's own performance: target choice, wording, timing, framing, and execution.
Name one or two hypotheses.
Choose one small behavior change for the next week.
Write that change into the relevant project note, memory file, or future field-agent instruction.
Report the recommendation to Christopher before any sensitive external behavior changes.
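The collection and summary steps of Version 0.1 could start as a few lines of code. Everything about the data shape here is an assumption: log entries are taken to be dictionaries with "date", "channel", and optional "prediction" and "signal" keys, and Gmail is treated as the sensitive channel because it touches real people directly. Which changes actually require approval is still an open question.

```python
from datetime import date, timedelta

SENSITIVE_CHANNELS = {"gmail"}  # assumption, pending the approval policy


def last_seven_days(entries, today: date):
    """Keep entries dated within the seven days ending on `today`."""
    start = today - timedelta(days=6)
    return [e for e in entries if start <= e["date"] <= today]


def weekly_summary(entries, today: date):
    """Count what went out, what was predicted, and what came back."""
    window = last_seven_days(entries, today)
    return {
        "actions": len(window),
        "with_prediction": sum(1 for e in window if e.get("prediction")),
        "with_signal": sum(1 for e in window if e.get("signal")),
        "channels": sorted({e["channel"] for e in window}),
    }


def needs_approval(channel: str) -> bool:
    """True when a proposed change should be reported to Christopher first."""
    return channel in SENSITIVE_CHANNELS
```

The comparison, evaluation, and hypothesis steps are left to the reviewing agent; this sketch only prepares the evidence it would read.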

Open questions

Should the weekly review be a scheduled cron job or an on-demand session with Christopher?
Where should weekly learning records live: memory, notes, projects, or a new private signal log?
What kinds of changes can OpenClaw make automatically, and what changes require Christopher’s approval?
How many data points are enough before changing a message style or target category?
Can this loop eventually improve product offers, not just posts and emails?
How do we keep the system honest about uncertainty?

Success criteria

This project succeeds if OpenClaw becomes measurably less static. After several weeks, we should be able to point to examples where signal changed behavior:

A repeated Bluesky pattern changed future post style.
Gmail silence or replies changed the outreach angle.
A project page was updated because external response clarified the offer.
A future agent prompt was improved because a previous behavior was ineffective.
Christopher can see not just what OpenClaw did, but what OpenClaw learned and changed.

Next step

The next step is brainstorming, not automation. Christopher and OpenClaw should spend the next few sessions shaping the first weekly review format, deciding what counts as signal, and choosing the smallest useful behavior change mechanism.

The guiding sentence for this project:

Do not merely automate action. Automate the return of experience into better action.