Same Mac Mini. Same model. Two different agent architectures. Completely different results. When @witcheer ran OpenClaw and Hermes Agent side by side on the same hardware, the core difference collapsed to a single sentence: OpenClaw stores everything and searches it; Hermes keeps almost nothing in the prompt and retrieves the rest on demand. Add Claude Code — which designs every feature around prompt caching first — and you have three competing theories of machine memory, each making a different bet about what matters most for autonomous software. This is the relational-vs-document database debate of the AI agent era. And like that debate, the winner won't be the one with the most stars on GitHub.

Three Theories of Memory

OpenClaw stores everything. Daily log files (the last two days are loaded at session start), curated long-term memory in MEMORY.md, and a vector index over all memory files, chunked at 400 tokens with 80-token overlap, embedded, and stored in SQLite. The agent remembers everything it writes to disk. The bet: accumulation beats curation. Total recall is worth the cost.
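The 400/80 chunking scheme is simple enough to sketch. A minimal version, using a plain token list as a stand-in for whatever tokenizer OpenClaw actually uses (the embedding and SQLite steps are omitted):

```python
def chunk(tokens, size=400, overlap=80):
    """Split a token sequence into overlapping windows (OpenClaw-style 400/80).

    Each chunk shares its first `overlap` tokens with the previous chunk, so
    a sentence that straddles a boundary is still retrievable from one chunk.
    """
    step = size - overlap  # 320 tokens of fresh content per chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end
    return chunks

# 1,000 tokens -> three overlapping chunks
tokens = [f"t{i}" for i in range(1000)]
chunks = chunk(tokens)
```

Each chunk would then be embedded and written to the SQLite index; overlap buys boundary robustness at the cost of ~25% index bloat.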

Hermes Agent stores almost nothing. The persistent memory layer — MEMORY.md plus USER.md — is capped at 3,575 characters total. Everything beyond that lives in a session archive and is retrieved only when relevant. Memory is injected into the system prompt once at session start and never changes mid-session — a frozen snapshot designed explicitly to preserve the LLM's prefix cache. The bet: curation beats accumulation. What you choose to forget is more important than what you choose to remember.
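The cap changes the write path, not just the size. A sketch of the contract, assuming a hypothetical interface (Hermes's actual API is not public in this form): appends fail loudly once the budget is spent, and the only way forward is an explicit consolidation pass.

```python
MEMORY_CAP = 3_575  # Hermes's persistent-memory budget, in characters

class BoundedMemory:
    """Persistent memory that refuses to grow past a hard cap.

    When full, the caller must consolidate (compress and rewrite) rather
    than append -- the failure is visible and bounded, never silent.
    """
    def __init__(self, cap: int = MEMORY_CAP):
        self.cap = cap
        self.text = ""

    def remember(self, note: str) -> bool:
        candidate = (self.text + "\n" + note).strip()
        if len(candidate) > self.cap:
            return False  # over budget: stop absorbing, demand consolidation
        self.text = candidate
        return True

    def consolidate(self, compressed: str) -> None:
        """Replace the whole store with a curated rewrite."""
        assert len(compressed) <= self.cap, "consolidation must fit the cap"
        self.text = compressed

mem = BoundedMemory()
ok = mem.remember("User prefers TypeScript.")
overflow = mem.remember("x" * 4000)  # rejected, not silently truncated
```

Because the snapshot is injected once at session start and never mutated, the system prompt prefix stays byte-identical across turns, which is exactly what the LLM's prefix cache needs.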

Claude Code doesn't store per se — it caches. The entire harness is built around prompt caching. Static content goes first (system prompt shared across all users), project files next (CLAUDE.md shared within a project), conversation last (unique per session). The team runs alerts on cache hit rate and declares SEVs when it drops. As @trq212, who works on Claude Code, put it: "you fundamentally have to design agents for prompt caching first, almost every feature touches on it somehow." The bet: the infrastructure layer owns the memory.
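The ordering rule generalizes: any edit invalidates the cache for everything after it, so content is sorted by volatility. A toy sketch of the principle, not Anthropic's implementation:

```python
def build_prompt(system: str, project: str, conversation: list[str]) -> str:
    """Assemble a prompt ordered for cache reuse: most-shared content first,
    most-volatile last. Appending a turn extends the string without touching
    the shared prefix, so earlier segments stay cache-hot."""
    return "\n\n".join([
        system,                   # identical across all users -> maximal reuse
        project,                  # shared within a project (e.g. CLAUDE.md)
        "\n".join(conversation),  # unique per session -> appended last
    ])

p1 = build_prompt("sys-prompt", "project-files", ["user: hi"])
p2 = build_prompt("sys-prompt", "project-files", ["user: hi", "user: run tests"])
# p2 extends p1 -- the cached prefix survives the new turn
```

Invert the order (conversation first) and every user message would invalidate the system prompt's cache entry for that session, which is why "almost every feature touches on it somehow."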

| Dimension | OpenClaw | Hermes Agent | Claude Code |
| --- | --- | --- | --- |
| Memory model | Store-and-search | Bounded + retrieve-on-demand | Prompt cache prefix |
| Persistent limit | Uncapped (filesystem) | 3,575 chars | Platform-managed |
| Procedural memory | .learnings/ (manual) | Skill Documents (auto) | Sub-agents + skills |
| Failure mode | Context cliff (goes rogue) | Requires consolidation | Cache miss (cost spike) |
| Security audit | ZeroLeaks: 2/100 | No major audit yet | Telemetry; 7-yr retention |

The Twenty-Minute Cliff

The architectures sound like engineering trade-offs. In practice, they produce dramatically different failure modes.

OpenClaw's failure is the context window cliff. As a session grows, the context approaches the model's limit. OpenClaw fires a silent "pre-compaction memory flush" — writing notes to disk before the context is summarized — but this is imperfect. The agent doesn't crash. It doesn't throw an error. It silently forgets your instructions and continues with confidence. @polydao described the dynamic precisely: most people run OpenClaw "like a chatbot, not like an architecture." The fix isn't better prompts. It's understanding that the memory model has a hard boundary.
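The flush itself is a small idea. A sketch of the pattern, where both the 90% trigger threshold and the file layout are assumptions, not OpenClaw's documented values:

```python
def maybe_flush(context_tokens: int, limit: int, notes: str, path: str) -> bool:
    """Hypothetical pre-compaction flush: once the live context nears the
    model's window, persist working notes to disk so they survive the
    summarization that follows. Returns True if a flush happened."""
    if context_tokens >= int(limit * 0.9):  # assumed trigger, not OpenClaw's
        with open(path, "a") as f:
            f.write(notes + "\n")
        return True   # notes persisted before compaction erases them
    return False      # headroom remains; nothing to do yet
```

The imperfection is structural: whatever the agent fails to write down before the trigger fires is gone after the summary, and nothing in the transcript tells the user it happened.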

@coreyganim's workaround — a .learnings/ folder where the agent logs every error and correction — became one of the most shared OpenClaw tips because it's the user patching the memory architecture by hand. The agent stops making the same mistake twice, but only because someone built the feedback loop the architecture didn't provide natively.
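The workaround fits in a dozen lines, which is part of why it spread. A sketch of the pattern; the exact file layout @coreyganim uses is an assumption here:

```python
from datetime import date
from pathlib import Path

def log_learning(mistake: str, correction: str, root: str = ".learnings") -> Path:
    """Append an error and its correction to a dated log that the agent is
    instructed to re-read at session start -- a hand-built feedback loop."""
    folder = Path(root)
    folder.mkdir(exist_ok=True)
    log = folder / f"{date.today().isoformat()}.md"
    with log.open("a") as f:
        f.write(f"- mistake: {mistake}\n  fix: {correction}\n")
    return log
```

The loop closes only because the user's prompt tells the agent to read the folder back in; the architecture itself never consults it.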

Hermes fails differently: when the 3,575-character memory fills, it requires explicit consolidation. The agent doesn't go rogue; it stops absorbing. The failure is visible and bounded.

Claude Code fails on economics. A cache miss doesn't corrupt memory; it makes every subsequent call expensive. The team built their alerting infrastructure around this single metric because the cost failure cascades into the user experience.
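The economics are easy to make concrete. A rough input-cost model, where the rates are illustrative assumptions (Anthropic prices cache reads at about 10% of base input; the $3/MTok base is Sonnet-class pricing, assumed here):

```python
def call_cost(prompt_tokens: int, cached_tokens: int,
              base_rate: float = 3.00, cache_read_rate: float = 0.30,
              per: int = 1_000_000) -> float:
    """Input cost in dollars for one call: fresh tokens at the base rate,
    cache-hit tokens at the (much cheaper) cache-read rate."""
    fresh = prompt_tokens - cached_tokens
    return (fresh * base_rate + cached_tokens * cache_read_rate) / per

hit = call_cost(200_000, 190_000)   # warm cache: most of the prefix is reused
miss = call_cost(200_000, 0)        # cache miss: everything billed at base rate
# under these assumed rates, the miss costs roughly 7x the hit
```

A miss doesn't just hit one call: every subsequent turn rebuilds the prefix at full price until the cache is warm again, which is why hit rate is the metric worth a SEV.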

The Security Fracture

OpenClaw topped 250,000 GitHub stars — outpacing React's growth rate in its first 60 days. Then security caught up.

ZeroLeaks scored OpenClaw's instruction protection at 2 out of 100, with an 84% instruction-extraction rate. CVE-2026-25253 enabled remote code execution through the browser: a webpage could leak the authentication token and gain full administrative control. An infostealer explicitly targeting OpenClaw emerged in February. By that month, 135,000+ instances were publicly exposed online.

February 2026, via Tom's Hardware: OpenSourceMalware reports that 230+ malicious OpenClaw extensions, posing as crypto trading automation tools to steal user info, have been uploaded to ClawHub since January 27.

NVIDIA's response was NemoClaw — announced at GTC 2026 as "an open source stack that adds privacy and security controls to OpenClaw." Jensen Huang declared every company needs an OpenClaw strategy; NVIDIA positioned security as an add-on layer. Critics argue this misunderstands the root cause. You can't bolt security onto an architecture that wasn't designed for it, the way you can't bolt ACID compliance onto a document store.

Anthropic's security story is different but not better. The March 2026 source map leak exposed Claude Code's telemetry profile. As @hqmank detailed: 640+ telemetry events, 40+ fingerprint dimensions, reporting home every 5 seconds. Every session transmits user ID, org UUID, email, platform, and every file the tool reads. If Anthropic's safety systems flag content, the classification record is retained for up to seven years — a retention duration disclosed only in the Privacy Center, not the security docs. The agent remembers nothing between sessions. The platform remembers everything about you.

The Bet Behind the Architecture

This is where the comparison stops being about features and starts being about worldview: less innovator's dilemma, more foundational bet.

OpenClaw's bet is that the filesystem is the ultimate memory store. Write everything down, embed it, search it when you need it. This is the document database model — schema-flexible, append-optimized, eventually consistent. It scales beautifully until the volume overwhelms the retrieval. The 250,000 stars reflect the appeal: total recall feels like intelligence. @AYi_AInotes documented the peak of this approach — 94 commits per day, 7 pull requests every 30 minutes — when the orchestration layer is tuned precisely. But "tuned precisely" is doing enormous structural work in that sentence.

Hermes's bet is that memory should be structured like working memory — small, curated, deliberately bounded. The 3,575-character cap isn't a limitation. It's the design. By forcing the agent to compress what it knows, Hermes ensures that what remains in context is load-bearing. The Skill Documents system extends this: when the agent completes a complex task, it synthesizes the experience into a searchable record following the agentskills.io open standard. Procedural memory, not just episodic. This is the Redis model — fast, bounded, opinionated about what stays.
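The synthesis step is the interesting part: episodic detail in, procedural record out. A sketch of the shape, with field names that are illustrative only (the actual agentskills.io schema may differ):

```python
import json

def synthesize_skill(task: str, steps: list[str], pitfalls: list[str]) -> str:
    """Compress a finished session into a small, searchable procedural
    record -- what worked, in order, plus what to avoid next time."""
    return json.dumps({
        "name": task,
        "procedure": steps,    # the distilled recipe, not the transcript
        "pitfalls": pitfalls,  # episodic failures turned into reusable warnings
    }, indent=2)

doc = synthesize_skill(
    task="deploy docs site",
    steps=["build static site", "upload to bucket"],
    pitfalls=["clear CDN cache after upload"],
)
```

The record is retrieved by search when a similar task recurs, so it never has to fit inside the 3,575-character working memory.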

Claude Code's bet is that you don't solve memory at the agent level at all. You solve it at the infrastructure level — through caching, shared contexts, and platform-managed state. The individual agent is stateless between sessions. The platform isn't. As @omarsar0 described, the Claude Agent SDK Loop — gather context, act, observe — treats memory as a problem the orchestrator solves, not the agent. This is the managed database model. Anthropic is your DBA.
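The loop @omarsar0 describes reduces to a few lines. A minimal sketch in which `retrieve`, `llm`, and `run_tool` are hypothetical stand-ins; note that memory lives entirely outside the loop, owned by the orchestrator, not the agent:

```python
def agent_loop(goal, retrieve, llm, run_tool, max_steps: int = 10):
    """Gather context -> act -> observe, until the model signals completion.

    The agent carries no state of its own between sessions; `retrieve` is
    the orchestrator's hook, where caching and platform-managed memory live.
    """
    observations = []
    for _ in range(max_steps):
        context = retrieve(goal, observations)  # gather context (platform-side)
        action = llm(goal, context)             # act: model picks the next step
        if action is None:                      # model says the goal is met
            break
        observations.append(run_tool(action))   # observe: feed the result back
    return observations
```

Swap the `retrieve` implementation and the same agent gets a different memory model; that substitutability is the whole bet.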

Where the Map Breaks

February 2026, via @sama: Sam Altman says Peter Steinberger, creator of OpenClaw, is joining OpenAI "to drive the next generation of personal agents"; OpenClaw will remain open source.

The community fracture isn't purely technical. Peter Steinberger, OpenClaw's creator, joined OpenAI in February 2026, after fielding calls from every major VC and from Zuckerberg himself; Meta offered more money. The architecture is now open source without its architect. A documented migration from OpenClaw to Hermes is underway, driven as much by the perception of active stewardship as by technical merit. The community framing crystallized: "OpenClaw does the junior work, Hermes is the senior."

None of the three has solved all three problems — capability, reliability, and trust — in one architecture. OpenClaw has capability and scale but not reliability or trust. Hermes has reliability and growing trust but less battle-tested scale. Claude Code has all three within Anthropic's walled garden, at the cost of sovereignty.

February 2026, via @karpathy: NanoClaw and other "claws", smaller OpenClaw-like systems that can run on personal hardware, form a new layer running on top of agents that run on LLMs.

The ecosystem is fragmenting further. NanoClaw and other "claws" — smaller systems that run on personal hardware — are forming a new layer on top of the agent layer. Karpathy frames this as natural evolution: the agent becomes a substrate, not a product. The question is whose substrate.

The Therefore

The memory architecture is the ceiling. Not the model. Not the prompt. Not the tooling. The choice of what to remember, what to forget, and who controls the remembering determines the upper bound of what an autonomous agent can become.

The agent that remembers everything will eventually drown in its own context. The agent that remembers nothing starts over every session. The agent whose memory belongs to someone else never quite belongs to you.

The last time the industry faced an analogous architectural choice — relational vs. document databases — the answer turned out to be "both, for different workloads." The same is likely true here. The store-everything model will serve research, analysis, and long-running monitoring where history is the value. The curated-memory model will serve deployment, code review, and security where precision outweighs recall. The platform-managed model will serve enterprises that trade sovereignty for reliability.

But the structural implication is this: the tool that the most developers adopt in the next twelve months will set the defaults for how autonomous software remembers. And defaults, in software, are destiny. The one-person company runs on one of these three memory models. Which one it chooses determines not just how productive the founder is today, but what the agent can become tomorrow — and that choice, once made, compounds in one direction.
