Self-Improving Pattern

Build a persistent skill library and improve across tasks.

The Self-Improving pattern gives an agent a layered memory system so it gets better the more you use it. It automatically distils successful task completions into reusable skills, retrieves them on similar future tasks, and periodically consolidates its hot memory to keep it bounded in size.

This is the only AgentFlow pattern that learns across runs -- state persists between invocations.

When to Use

Good fit	Poor fit
Long-running AI assistants / coding agents	One-off, single-use tasks
Repeated task patterns (code reviews, data analysis, debugging)	Purely random or exploratory tasks
Cross-session learning -- agent improves as you use it	Privacy-sensitive environments where persisting data is unacceptable
Scenarios where quality compounds over time	Tasks with no reusable procedure (pure Q&A)

Architecture

flowchart TD
    A([START]) --> B["router\n— Classify task & look up skills —"]
    B --> C["executor\n— Run task (optionally guided by skill) —"]
    C --> D["evaluator\n— Score 1–10 & route —"]
    D -- "score≥7 ∧ steps≥3 ∧ no skill used" --> E["skill_extractor\n— Distil reusable skill —"]
    D -- "task_count % nudge_interval == 0" --> F["memory_updater\n— Consolidate hot memory —"]
    D -- "otherwise" --> G([END])
    E --> F
    E --> G
    F --> G

    style A fill:#2d6a4f,color:#fff
    style B fill:#264653,color:#fff
    style C fill:#264653,color:#fff
    style D fill:#e76f51,color:#fff
    style E fill:#e9c46a,color:#000
    style F fill:#e9c46a,color:#000
    style G fill:#2d6a4f,color:#fff
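
The edges leaving the evaluator and the skill_extractor can be read as two small routing functions. The sketch below is an illustration only, not the code in pattern.py: the function names (`compute_flags`, `next_after_evaluator`, `next_after_extractor`) are hypothetical, and it assumes `task_count` has already been incremented for the current task when the flags are computed.

```python
from typing import Optional


def compute_flags(
    score: float,
    steps: list,
    matched_skill: Optional[dict],
    task_count: int,
    skill_threshold: float = 7.0,
    min_steps_for_skill: int = 3,
    nudge_interval: int = 5,
) -> tuple:
    """Return (should_create_skill, should_update_memory) for one run."""
    should_create_skill = (
        score >= skill_threshold            # score >= 7 by default
        and len(steps) >= min_steps_for_skill
        and matched_skill is None           # no existing skill was reused
    )
    # Assumption: task_count already includes the task that just finished.
    should_update_memory = task_count % nudge_interval == 0
    return should_create_skill, should_update_memory


def next_after_evaluator(should_create_skill: bool, should_update_memory: bool) -> str:
    """Mirror the three edges leaving the evaluator node."""
    if should_create_skill:
        return "skill_extractor"
    if should_update_memory:
        return "memory_updater"
    return "END"


def next_after_extractor(should_update_memory: bool) -> str:
    """Mirror the two edges leaving the skill_extractor node."""
    return "memory_updater" if should_update_memory else "END"
```

Skill extraction takes priority here, so a run that qualifies for both branches reaches memory_updater through skill_extractor rather than skipping it.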

State flows through the graph:

Field	Type	Description
`task`	`str`	Current task description
`task_type`	`str`	Classified task category (e.g. `code_review`)
`matched_skill`	`dict | None`	Retrieved skill document (full content), if any
`execution_steps`	`list[str]`	Steps the executor recorded
`result`	`str`	Final execution result
`evaluation_score`	`float`	Quality score 1–10 from the evaluator
`should_create_skill`	`bool`	Whether the run warrants a new skill
`should_update_memory`	`bool`	Whether the periodic nudge should fire
`task_count`	`int`	Running count of completed tasks
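
For readers who prefer types to tables, the fields above map naturally onto a `TypedDict`. This is a sketch under assumptions: the class name `SelfImprovingState`, the helper `initial_state`, and the starting values it fills in are hypothetical and may differ from what pattern.py defines.

```python
from typing import Optional, TypedDict


class SelfImprovingState(TypedDict):
    """Hypothetical typed view of the state table above."""

    task: str
    task_type: str
    matched_skill: Optional[dict]
    execution_steps: list
    result: str
    evaluation_score: float
    should_create_skill: bool
    should_update_memory: bool
    task_count: int


def initial_state(task: str, task_count: int = 0) -> SelfImprovingState:
    """Build an empty state for a new task (default values are assumptions)."""
    return {
        "task": task,
        "task_type": "",
        "matched_skill": None,
        "execution_steps": [],
        "result": "",
        "evaluation_score": 0.0,
        "should_create_skill": False,
        "should_update_memory": False,
        "task_count": task_count,
    }
```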

Core Code

from patterns.self_improving import SelfImprovingPattern, MemoryStore

# storage_dir persists across runs -- the agent's skill library lives here.
pattern = SelfImprovingPattern(
    storage_dir="./.my_agent_memory",
    nudge_interval=5,       # consolidate memory every 5 tasks
    skill_threshold=7.0,     # minimum score to extract a skill
    min_steps_for_skill=3,   # skip single-step tasks
)
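# The result exposes the state fields listed under Architecture.
result = pattern.run("Review this REST endpoint for security issues")
print(f"Score: {result['evaluation_score']}/10")
# None until a matching skill has been stored and retrieved.
print(f"Matched skill: {result['matched_skill']}")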

Configuration

Parameter	Default	Description
`model`	`"gpt-4o-mini"`	Model name (via `agentflow.utils.get_default_llm`)
`llm`	`None`	Pre-configured LangChain chat model
`storage_dir`	`"./.agent_memory"`	Directory for `MEMORY.json` + `skills/*.json`
`nudge_interval`	`5`	Consolidate hot memory every N tasks
`skill_threshold`	`7.0`	Min score to create a skill (out of 10)
`min_steps_for_skill`	`3`	Min executor steps to warrant skill creation

Quick Start

# 1. Clone and install
git clone https://github.com/your-org/agentflow.git
cd agentflow && uv sync

# 2. Set API key
cp .env.example .env
# Add your OPENAI_API_KEY or DEEPSEEK_API_KEY

# 3. Run the demo
uv run python -m patterns.self_improving.example

The demo runs 5 tasks. By the third code-review task the agent should have built and retrieved a code_review skill -- watch for the Matched skill line to change from None to the retrieved skill.

How It Works

1. Layered Memory

Hot memory (MEMORY.json) is a small, always-loaded snapshot injected into every system prompt. It holds environment facts, conventions, and lessons learned -- bounded to ~2000 characters so prompt overhead stays flat.

Skill store (skills/*.json) is cold storage. Only the skill index (name + one-line description) is loaded by the router. When the router finds a match, the full skill document is fetched on demand.
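
The split between the index and the full document can be sketched with two small helpers over the `skills/*.json` layout. This is illustrative only: the helper names and the `name` and `description` keys inside each JSON file are assumptions, and the real MemoryStore may organise its files differently.

```python
import json
from pathlib import Path
from typing import Optional


def load_skill_index(storage_dir: Path) -> list:
    """Return only name + one-line description for every stored skill."""
    index = []
    for path in sorted((storage_dir / "skills").glob("*.json")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        # Only these two fields are handed to the router; the body stays cold.
        index.append({"name": doc["name"], "description": doc["description"]})
    return index


def load_skill(storage_dir: Path, name: str) -> Optional[dict]:
    """Fetch the full skill document on demand, or None if it does not exist."""
    path = storage_dir / "skills" / f"{name}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

This sketch still reads each file to build the index; a larger library would probably keep a separate index file so that skill bodies are never parsed until they are needed.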

2. Progressive Disclosure

The router's prompt carries only the skill index (names and one-line descriptions), so it costs a few extra tokens per skill. Full skill content is loaded only when the task matches. This mirrors the Hermes Agent's insight that "skill names are cheap, skill bodies are expensive" -- load the expensive parts only when you know they're relevant.
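
A minimal sketch of how a router prompt might be assembled from the index alone. The function name `build_router_prompt` and the prompt wording are hypothetical; the point is only that skill bodies never enter this prompt.

```python
def build_router_prompt(task: str, skill_index: list) -> str:
    """Build a router prompt from index entries (name + description) only."""
    lines = [f"- {entry['name']}: {entry['description']}" for entry in skill_index]
    catalogue = "\n".join(lines) if lines else "(no skills yet)"
    return (
        "Classify the task and name one matching skill, or answer 'none'.\n"
        f"Available skills:\n{catalogue}\n"
        f"Task: {task}"
    )
```

Because each skill contributes a single short line, the prompt grows with the number of skills but not with the length of any skill's procedure.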

3. Periodic Nudge

Every nudge_interval tasks, the memory_updater node fires. It asks the LLM to consolidate the hot memory: merge duplicates, drop stale entries, and add new durable lessons. This prevents the memory from growing unbounded while keeping actionable knowledge surfaced.
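
The nudge can be sketched as a periodic check followed by a consolidation step and a hard size cap. Everything here is an assumption for illustration: hot memory is modelled as a list of strings ordered oldest first, `dedupe_entries` stands in for the LLM consolidation call, and the 2000-character cap is enforced by dropping the oldest entries.

```python
from typing import Callable

MAX_HOT_MEMORY_CHARS = 2000  # matches the ~2000 character bound described above


def dedupe_entries(entries: list) -> list:
    """Stand-in for the LLM: drop blank and case-insensitive duplicate entries."""
    seen = set()
    kept = []
    for entry in entries:
        key = entry.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(entry.strip())
    return kept


def enforce_budget(entries: list, max_chars: int = MAX_HOT_MEMORY_CHARS) -> list:
    """Keep the newest entries that fit within max_chars (oldest dropped first)."""
    kept = []
    used = 0
    for entry in reversed(entries):
        cost = len(entry) + 1  # +1 for the newline that would separate entries
        if used + cost > max_chars:
            break
        kept.append(entry)
        used += cost
    kept.reverse()
    return kept


def maybe_consolidate(
    entries: list,
    task_count: int,
    nudge_interval: int = 5,
    consolidate: Callable = dedupe_entries,
) -> list:
    """Consolidate only on every nudge_interval-th task; otherwise return as-is."""
    if task_count % nudge_interval != 0:
        return entries
    return enforce_budget(consolidate(entries))
```

The hard cap acts as a safety net in case a consolidation step returns more text than the budget allows.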

4. Skill Extraction

When a run scores at least skill_threshold, records at least min_steps_for_skill steps, and did not reuse an existing skill, the skill_extractor node fires. It generates a structured skill document with procedure, pitfalls, and verification fields -- capturing the method, not just the result.
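
One possible shape for such a skill document, written to the `skills/` directory as JSON. The `procedure`, `pitfalls`, and `verification` fields come from the description above; the class name `SkillDocument`, the `name` and `description` fields, the list-of-strings types, and the `save_skill` helper are assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SkillDocument:
    """Hypothetical structure for an extracted skill."""

    name: str
    description: str                                   # one-line summary for the index
    procedure: list                                    # ordered steps that worked
    pitfalls: list = field(default_factory=list)       # mistakes to avoid next time
    verification: list = field(default_factory=list)   # checks that confirm success


def save_skill(storage_dir: Path, skill: SkillDocument) -> Path:
    """Write the skill to <storage_dir>/skills/<name>.json and return the path."""
    skills_dir = storage_dir / "skills"
    skills_dir.mkdir(parents=True, exist_ok=True)
    path = skills_dir / f"{skill.name}.json"
    path.write_text(json.dumps(asdict(skill), indent=2), encoding="utf-8")
    return path
```

Storing the procedure as ordered steps keeps the method reusable: a later run can follow the steps, watch for the listed pitfalls, and finish with the verification checks.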

Comparison with Other Patterns

Dimension	Self-Improving	Reflection	Chain-of-Experts
Learning scope	Cross-task, persistent	Within-task, ephemeral	Across fixed experts, no persistence
Memory	Hot + cold layered	None (state only)	None
Skill reuse	Dynamic, auto-extracted	N/A	Static, pre-defined routing
Best for	Long-running assistants, code agents	Iterative text refinement	Sequential specialist routing
Latency	Medium (extra LLM calls for eval/extraction)	Medium (write-review cycles)	Low (fixed sequence)

Run Tests

uv run pytest patterns/self_improving/tests/ -v

Tests use mocked LLMs and isolated tmp_path storage -- no API key needed.

File Structure

patterns/self_improving/
├── __init__.py
├── pattern.py          # SelfImprovingPattern class + MemoryStore
├── example.py          # Runnable demo (5-task skill library demo)
├── diagram.mmd         # Mermaid source
├── README.md           # This file
├── README_zh.md        # Chinese version
├── py.typed            # PEP 561 marker
└── tests/
    ├── __init__.py
    └── test_self_improving.py   # 20+ mock-LLM tests