Agents That Write Their Own Playbooks

Your agent deploys a Kubernetes pod for the third time this week. The first run took eleven tool calls and two retries. The third run takes four calls and zero retries. Nobody changed the code. Nobody retrained the model. The agent just got better at it.

This is closed-loop skill synthesis, and it's the pattern behind the fastest-growing agent framework of 2026.

The Loop

Hermes Agent, Nous Research's open-source framework that hit 95,600 GitHub stars in seven weeks, ships with a five-step learning loop that runs after every complex task:

A message arrives — user request or scheduled trigger.
The agent searches persistent memory for relevant context. FTS5 over SQLite, roughly 10ms across 10K documents.
Reasoning and action. The LLM plans, calls tools, iterates until the task resolves.
If the task involved five or more tool calls, the agent autonomously documents the procedure into a structured skill file — procedures, pitfalls, verification steps — following the agentskills.io open standard.
The skill gets indexed into searchable memory for future sessions.

Next time a similar task shows up, the agent loads the skill instead of reasoning from scratch. Nous Research's benchmarks claim 40% faster completion on domain-similar tasks after accumulating 20+ self-generated skills.

graph LR A[Task arrives] --> B[Search memory] B --> C[Reason + act] C --> D{5+ tool calls?} D -->|Yes| E[Synthesize skill] D -->|No| F[Done] E --> G[Index into memory] G --> F F -.->|Next similar task| B

The critical detail: skills aren't static. When the agent discovers a better approach while executing an existing playbook, it updates the skill document in place. The runbook evolves with usage.

Three Layers of Remembering

Hermes stacks memory in three tiers. Session memory holds the current conversation — standard context window stuff. Persistent memory uses SQLite with FTS5 full-text search, scaling to roughly 100K documents before you'd need a dedicated vector store. User modeling captures coding style, timezone, preferences, and communication patterns across sessions, building a deepening profile over time.

Layer two is where skills live. They're structured markdown files: what worked, what didn't, and how to verify the result. Think of them as runbooks that the agent writes for itself after solving something the hard way.

Where This Actually Breaks

That 40% improvement comes with asterisks you should read before betting a production workflow on it.

Domain specificity. Skills generated from Kubernetes deployments don't transfer to database migrations. The improvement is narrow by design. An agent that's brilliant at deploying your particular Flask app to your particular EKS cluster won't carry that knowledge to a different stack. The learning loop creates specialists, not generalists — which is fine if you understand the boundary, and dangerous if you assume the agent is "getting smarter" in some general sense.

Skill rot. APIs change. Dependencies update. A playbook synthesized in March that hardcodes a specific Helm chart version becomes a liability by April. Hermes doesn't ship with built-in skill expiration or TTLs. Stale skills silently degrade performance instead of failing loudly, which is arguably the worst failure mode for any caching system — and that's exactly what this is, a cache of procedural knowledge.

The confidence problem. After synthesizing 50 skills, the agent starts treating its own documentation as ground truth. If a playbook contains a subtle error — say, it skips a permission check that happened to work in the original environment — that error gets reinforced every time the skill loads. Procedural memory amplifies mistakes just as effectively as it amplifies shortcuts. Nobody's built a good answer for this yet.

Security surface. Self-generated skill files are executable context that feeds directly into the LLM's planning stage. If an attacker can inject content into the memory layer — through a crafted API response the agent processes, for instance — they can poison the skill library. Hermes has zero reported agent-specific CVEs as of this writing, but their competitor OpenClaw disclosed nine CVEs in four days back in March, including a CVSS 9.9. The attack surface is real, and it grows with every skill the agent generates.

What This Isn't

Skill synthesis sits in a specific niche between three approaches that sound similar but work differently.

It's not RAG. RAG retrieves external documents to augment a prompt. Skill synthesis creates new documents from the agent's own successful executions. The knowledge source is the agent's lived experience, not a pre-existing corpus you curated.

It's not fine-tuning. Model weights never change. The improvement lives entirely in the context layer — structured documents loaded into the prompt at inference time. Swap the underlying model and the skills still work, assuming the new model can follow structured instructions. That's a genuinely useful property for teams worried about model vendor lock-in.

It's not conversation memory. Memory stores what happened. Skills store what to do. The difference is between a transcript and a recipe. Both are useful; they solve different problems.

Meta's hyperagents research pushes the concept further — agents that build "structured, reusable decision machinery" autonomously, targeting non-coding workflows. But hyperagents remain a research artifact. Hermes ships today, v0.10.0, with 118 bundled skills and six messaging integrations. That gap between "interesting paper" and "thing you can install" still matters.

When to Deploy This (and When Not To)

The pattern makes sense for repetitive, tool-heavy workflows where the same agent instance handles similar tasks over weeks or months. DevOps pipelines, data processing jobs, integration testing sequences — domains where procedures are stable enough that a playbook stays relevant between uses.

It makes less sense for one-shot tasks, rapidly changing environments, or anything where the cost of executing a stale playbook exceeds the cost of reasoning from scratch every time. If your agent runs a fundamentally different kind of task on every invocation, the learning loop is pure overhead.

There's a deeper question here that the framework's own design acknowledges: Hermes disables self-learning by default. You have to opt in explicitly via ~/.hermes/config.toml. That's a revealing choice from the team that built the loop. They're shipping a capability they don't think you should run unsupervised — agents accumulating operational knowledge that no human reviews, updating their own playbooks based on outcomes they evaluate themselves.

The pattern works. The guardrails don't exist yet.

#The Loop

#Three Layers of Remembering

#Where This Actually Breaks

#What This Isn't

#When to Deploy This (and When Not To)

The Loop

Three Layers of Remembering

Where This Actually Breaks

What This Isn't

When to Deploy This (and When Not To)