ClawLoop — Agents That Learn from Experience

Your AI agents run, fail, and forget. ClawLoop closes the loop: it observes agent-environment interactions, learns from them, and feeds improvements back into the agent. Three learning layers — harness, router, weights — all following the same protocol.

Source-available under BSL 1.1. Free for dev, test, research, and production use below $10 M revenue. Converts to Apache 2.0 on April 1, 2030. See License for details.

Install

Requires Python 3.12 or 3.13.

git clone https://github.com/aganthos/clawloop
cd clawloop
uv sync                # installs all deps from uv.lock, creates .venv automatically

For weight training (GPU):

git submodule update --init clawloop/skyrl
uv sync --extra taubench

Try It in 10 Seconds

No API keys. No setup. Just run:

uv run clawloop demo math --dry-run

or as a module:

uv run python -m clawloop.demo_math --dry-run

or via the examples shim (also works from a clone):

uv run python examples/demo_math.py --dry-run

Reward curve per iteration:
  Iter 1: 0.6000  ########################
  Iter 2: 0.8000  ################################
  Iter 3: 1.0000  ########################################
  ...

The agent starts with mistakes, the reflector analyzes failures, learns strategies, and injects them into the system prompt. Rewards climb toward 1.0 as the playbook grows. (This example is complete — run it as-is. Output varies slightly between runs.)

What the Code Looks Like

Add learning to an existing agent (2 lines — pseudo-code):

import clawloop

wrapped = clawloop.wrap(your_llm_client, collector)  # your existing LLM client
result = wrapped.complete(messages)  # transparently captures traces for learning

Run a full learning loop (pseudo-code):

from clawloop import ClawLoopAgent
from clawloop.environments.math import MathEnvironment

agent = ClawLoopAgent(
    task_client=task_llm,
    reflector_client=reflector_llm,
    base_system_prompt="You are a math solver.",
)
results = agent.learn(MathEnvironment(), iterations=10, episodes_per_iter=5)
# results["rewards"] → [0.4, 0.6, 0.8, 1.0, ...]

Config-driven training (no code):

uv run clawloop run examples/configs/math_harness.json
uv run clawloop run examples/configs/math_harness.json --dry-run  # mock LLMs

Choose Your Integration Path

Example type	Start here	What it shows
Harness: no-key math learning loop	`uv run clawloop demo math --dry-run`	ClawLoopAgent learns from math episodes without API keys
Harness: package/module demo entry points	`uv run python -m clawloop.demo_math --dry-run` or `examples/demo_math.py`	Same math demo from an installed package or source clone
Playbook internals walkthrough	`uv run python examples/playbook_demo.py --dry-run`	`forward_backward`, `optim_step`, entry scoring, structured skills
Workflow: n8n webhook integration	`examples/n8n/`	Workflow platform sends traces to clawloop-server; no Python in the workflow
Harness benchmarks: config-driven runner	`uv run clawloop run examples/configs/math_harness.json`	Math, CRMArena, Harbor BFCL, TauBench via JSON configs and litellm (`examples/train_runner.py` is a deprecated shim that forwards here)
Proxy harness: zero-code-change OpenClaw	`uv run python examples/openclaw_demo.py`	Transparent proxy captures traces and injects learned skills
Remote OpenClaw: SSH-connected proxy harness	`uv run python examples/openclaw_demo_remote.py --host YOUR_HOST ...`	Learn from a remote OpenClaw instance and compare before/after
Weights: SkyRL/Tinker training recipes	`examples/recipes/`	GRPO, PPO, and fine-tuning recipes for GPU training

See examples/README.md for details on each path.

How It Works

The loop. An agent interacts with an environment (or production traffic). ClawLoop collects episodes — structured traces of messages, tool calls, and rewards. Learning layers process these episodes and update the agent. Repeat.

Harness layer. An LLM reflector reads execution traces, diagnoses failures, and extracts reusable strategies into a playbook. Playbook entries are injected into the system prompt and accumulate helpful/harmful scores over time. Bad strategies decay and get pruned; good ones persist.

Router layer. Optimizes which model handles which query type. A complexity scorer maps queries to tiers, and the router adjusts based on reward/cost efficiency across episodes.

Weights layer. Trains model weights — LoRA, full fine-tuning, SFT, GRPO, PPO, and more — delegating to SkyRL/Tinker for the heavy lifting.

Unified protocol. All three layers follow the same two-phase protocol: forward_backward() accumulates updates without mutating state, then optim_step() applies them atomically. If optim_step fails on any layer, all layers roll back together.

Environments

env_type	What it does	Needs
`math`	Built-in arithmetic and competition math	LLM API
`harbor`	Harbor sandboxed agent tasks (BFCL, etc.)	Docker + LLM API
`entropic`	CRMArenaPro A2A benchmark	Entropic bench + LLM API
`openclaw`	Transparent proxy — captures traces + injects playbook skills	Node.js + OpenAI-compatible Chat Completions endpoint
`taubench`	tau-bench retail/airline customer-service tasks	`pip install "clawloop[taubench]"` + LLM API

LLM Providers

ClawLoop uses litellm — any provider works:

{"model": "anthropic/claude-haiku-4-5-20251001"}
{"model": "openai/gpt-5-nano"}
{"model": "gemini/gemini-3.1-flash-lite"}

Set the provider's API key as an environment variable (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY). Or pass api_key and api_base in the config for custom endpoints.

Architecture

train(config)
  -> validate_config()          # fail fast
  -> build harness + reflector  # prompt layer
  -> build weights backend      # SkyRL/Tinker (if weight mode)
  -> build env adapter          # math / harbor / entropic / openclaw
  -> learning_loop()            # collect episodes, forward_backward, optim_step

Environments are pluggable via ENV_BUILDERS registry in clawloop/train.py.

The learning loop per iteration:

Collect episodes from the environment adapter
Distribute episodes to all active layers as Datum objects
forward_backward() on each layer — accumulate updates, no state mutation
optim_step() on each layer — atomically apply, with cross-layer rollback on failure
Recompute StateID (content-addressed hash across all layers)

Adding a New Environment

Write a builder function that returns (adapter, tasks):

# clawloop/train.py
def _build_my_env(config, llm_clients):
    adapter = MyAdapter(...)  # must implement run_episode(task, agent_state) -> Episode
    tasks = ["task1", "task2"]
    return adapter, tasks

ENV_BUILDERS["my_env"] = _build_my_env

Your adapter's run_episode must return an Episode with messages, steps, and an EpisodeSummary containing reward signals. See clawloop/environments/math.py (MathAdapter) for a minimal example (~80 lines).

Limitations

Harness/playbook learning is the stable, recommended path. Router and weight layers work but have more constraints — see below.
mode="full" (simultaneous harness + weight training) is disabled. The on-policy boundary after harness updates needs rework for GRPO advantage computation. Use weight and harness_learning separately for now.
Episode construction is manual. There is no ProblemEnv base class yet. New environments must build Episode objects directly. A higher-level abstraction (like Tinker cookbook's ProblemEnv) is planned.

Enterprise

ClawLoop Enterprise adds premium learning backends and managed infrastructure on top of the community edition.

Premium evolution backends — broader search over prompts, playbooks, and agent configurations than the community LocalEvolver
Persistent playbooks — versioned storage with rollback so learned strategies survive restarts
Managed training infrastructure — hosted compute for weight training without self-hosting GPUs
Logging & lineage — episode archive with provenance tracking

Contact info@aganthos.com to learn more.

License

ClawLoop is licensed under the Business Source License 1.1 with an Additional Use Grant.

What you can always do, free and without restriction:

Use ClawLoop for development, testing, security review, and academic research
Copy, modify, and redistribute the source code
Use ClawLoop in production if your organization has less than $10M in annual revenue

What requires a commercial license:

Production use by organizations with $10M+ annual revenue
Building a competing agent-improvement or model-optimization service

On April 1, 2030, each version converts automatically to the Apache License 2.0 — permissive, forever, no strings.

For commercial licensing, contact info@aganthos.com.

Name		Name	Last commit message	Last commit date
Latest commit History 306 Commits
.codeboarding		.codeboarding
.github		.github
benchmarks		benchmarks
clawloop		clawloop
docs		docs
examples		examples
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.gitmodules		.gitmodules
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
THIRD_PARTY_NOTICES.md		THIRD_PARTY_NOTICES.md
architecture.png		architecture.png
clawloop.png		clawloop.png
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ClawLoop — Agents That Learn from Experience

Install

Try It in 10 Seconds

What the Code Looks Like

Choose Your Integration Path

How It Works

Environments

LLM Providers

Limitations

Enterprise

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ClawLoop — Agents That Learn from Experience

Install

Try It in 10 Seconds

What the Code Looks Like

Choose Your Integration Path

How It Works

Environments

LLM Providers

Limitations

Enterprise

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages