The two most-watched personal agent projects of the last twelve months agree on almost nothing. They disagree on process model, memory layout, deployment surface, multi-agent topology, security posture, and the question of who is supposed to be doing the work. They agree on one thing. The unit of agent capability is a Markdown file called SKILL.md.
That convergence is the most interesting fact about both projects. Both OpenClaw — Peter Steinberger's Node-based gateway that captured the open-source AI conversation in late 2025 — and Hermes Agent, the Python runtime Nous Research released in February 2026, settled on the same answer to "how do we extend an agent": procedural Markdown with YAML frontmatter, kept on disk, scoped per workspace.
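Neither project's documentation is quoted here, but the shape both converged on is procedural Markdown behind a small YAML frontmatter header. A hypothetical skill, with invented name and steps, looks roughly like:

```markdown
---
name: rotate-api-keys
description: Rotate the project's API keys and update deployment secrets.
---

# Rotate API keys

1. List the current keys in the secret store.
2. Generate a replacement key for each entry.
3. Update the deployment configuration with the new keys.
4. Verify the service restarts cleanly before revoking the old keys.
```

The frontmatter gives the agent a cheap index (name and description) to decide whether to load the full body; the body is the playbook it follows once invoked.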
Almost everything else is a consequence of one earlier architectural choice. Where does the agent actually live?
OpenClaw's answer is the Gateway. A long-running Node process listening on 127.0.0.1:18789 mediates everything: channel adapters, session lifecycle, routing, tool execution, agent selection, sandbox state. Multiple agents coexist under the same gateway, each defined by a workspace folder full of Markdown files the gateway reads at session start. The agent runtime calls the LLM and tools, but the gateway owns the assembly. Treat the agent as a configuration the gateway interprets.
Hermes' answer is the AIAgent loop. A single Python module — run_agent.py — runs the entire conversational cycle: build the system prompt, resolve the provider, call the model, dispatch tool calls, persist the trajectory to SQLite. Everything else — gateway, scheduler, plugins, terminal backends — hangs off this loop as a satellite. The gateway does not own the assembly; it just delivers messages to the loop. Treat the agent as a runtime everything else exists to serve.
Memory follows the center
If your agent IS its workspace, your memory is files. OpenClaw's five core Markdown documents — AGENTS.md (operating rules), SOUL.md (persona and decision boundaries), USER.md (operator profile), TOOLS.md (environment), MEMORY.md (durable notes) — plus daily logs under memory/YYYY-MM-DD.md are loaded at session start within configured character limits. The phrase the system message templates use is "these files are your memory." Editing the files is editing state. The gateway's job is to assemble them deterministically into a stable prompt prefix; the LosslessClaw context-engine plugin builds DAGs over the resulting graph for selective retrieval and leaf compaction.
The strength of this design is auditability. Memory is human-readable, version-controllable, trivially diffable. The cost is that compaction is passive: the agent survives the context limit but does not learn from how its history was compressed.
If your agent IS a loop, your memory is database-shaped. Hermes runs a layered stack on SQLite with WAL and FTS5. Session transcripts live in a full-text index, queryable across the agent's entire history through a session_search tool. Persistent notes (capped at modest character limits — MEMORY.md is roughly 800 tokens) and an evolving user model — built on Honcho for what the docs call dialectic user modeling — survive sessions. Procedural memory sits in its own directory, indexed for retrieval, loaded at progressive levels of detail: name and description first, parameters next, full execution steps only when invoked.
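The transcript layer is the easiest to sketch. This is not Hermes' actual schema, just a minimal illustration of the same mechanism: a SQLite FTS5 virtual table over session transcripts, queried the way the session_search tool is described above.

```python
import sqlite3

# Minimal sketch (not Hermes' real schema): an FTS5 full-text index
# over session transcripts, searchable across the agent's whole history.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA journal_mode=WAL")  # WAL is the on-disk mode; a no-op for :memory:
db.execute("CREATE VIRTUAL TABLE transcripts USING fts5(session_id, role, content)")
db.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("s1", "user", "set up the nightly backup cron job"),
        ("s1", "assistant", "created /etc/cron.d/backup with a 2am schedule"),
        ("s2", "user", "why did the deploy fail yesterday"),
    ],
)

def session_search(query: str, limit: int = 5):
    """Full-text search across every stored session, best matches first."""
    return db.execute(
        "SELECT session_id, content FROM transcripts "
        "WHERE transcripts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()

print(session_search("backup"))
```

The point of the database shape is exactly this query: one call reaches every past session, something a file-per-day memory layout can only approximate with grep.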
Both designs implement the same trick — keep the high-frequency portion of memory frozen for prompt-cache stability. OpenClaw enforces deterministic sort order on plugins, tools, and memory files so the prompt prefix doesn't change between turns. Hermes uses a frozen memory snapshot at session start; mid-session memory writes are durable but defer their effect until the next session. Different architectures, same constraint: don't break the cache.
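The constraint is easy to demonstrate. A sketch of the assembly step (file names from the article; the heading format and character cap are illustrative): as long as the ordering is deterministic, the prefix bytes are identical every turn, and the provider's prompt cache keeps hitting.

```python
import hashlib

def build_prefix(files: dict[str, str], char_limit: int = 4000) -> str:
    """Assemble memory files into a stable prompt prefix."""
    parts = []
    for name in sorted(files):  # deterministic ordering, never insertion order
        parts.append(f"## {name}\n{files[name][:char_limit]}")  # per-file cap
    return "\n\n".join(parts)

workspace = {
    "SOUL.md": "Persona and decision boundaries.",
    "AGENTS.md": "Operating rules.",
    "MEMORY.md": "Durable notes.",
}

# Same bytes regardless of how the workspace dict was populated.
a = build_prefix(workspace)
b = build_prefix(dict(reversed(list(workspace.items()))))
assert hashlib.sha256(a.encode()).hexdigest() == hashlib.sha256(b.encode()).hexdigest()
```

Hermes' frozen snapshot is the same idea applied in time rather than space: the inputs to this function simply don't change until the next session starts.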
What changes is what counts as memory. For OpenClaw, memory is what the operator writes down. For Hermes, memory is what the loop preserves.
Skills follow the center too
In OpenClaw, you write the skills. SKILL.md files are operator-authored playbooks: step-by-step instructions the agent follows when invoked. ClawHub, OpenClaw's marketplace, distributes thousands of community-authored skills you can install. Skills are infrastructure-as-text: written, reviewed, version-controlled, audited. The marketplace exists because the architecture treats authoring as a human-shaped activity.
In Hermes, the agent writes the skills. After completing a workflow with several tool calls, Hermes enters a reflective phase, extracts the successful trajectory, and writes a SKILL.md to ~/.hermes/skills/. Subsequent runs refine the file as the procedure encounters new edge cases. The agentskills.io standard — both projects nominally support it — gives the resulting artifacts portability. There is no marketplace because the architecture treats authoring as an emergent property of the loop.
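The crystallization step can be sketched in a few lines. Everything here is an assumption for illustration: the function name, the frontmatter fields, and the step format are not Hermes' actual implementation, just the general shape of turning a successful tool-call trajectory into a SKILL.md body.

```python
# Hypothetical sketch of skill crystallization: a successful trajectory
# of (tool, arguments) pairs becomes a procedural SKILL.md document.
def crystallize(name: str, description: str,
                trajectory: list[tuple[str, str]]) -> str:
    steps = "\n".join(
        f"{i}. Run `{tool}` with `{args}`."
        for i, (tool, args) in enumerate(trajectory, start=1)
    )
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"# {name}\n\n{steps}\n"
    )

skill = crystallize(
    "nightly-backup",
    "Create and verify the nightly backup job.",
    [("shell", "crontab -l"), ("shell", "systemctl status cron")],
)
print(skill)
```

Refinement on later runs then amounts to rewriting this file when the procedure hits an edge case the recorded steps don't cover.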
Both designs are responses to the same underlying problem: agent capabilities don't ship with the model, and the harness gap has to be filled somehow. OpenClaw fills it with operator labor mediated by a marketplace. Hermes fills it with autonomous abstraction.
OpenClaw's bet: humans can audit Markdown. They can't audit a model's introspection. The marketplace adds vetting; the workspace adds review.
Hermes' bet: humans don't write enough skills. Most repeated workflows go uncodified because authoring SKILL.md is itself work. If the agent writes them from observed success, the cost goes to zero and the surface compounds.
Multi-agent follows the center
OpenClaw runs many agents because the gateway is the abstraction that makes that natural. Each gets its own workspace, its own model configuration, its own tool policy, its own session histories, its own auth profile. They communicate through the gateway when they need to. Community extensions layer manager-worker patterns over Matrix rooms for organizational deployments. The architecture rewards "team of specialists, governed centrally."
Hermes runs sub-agents because the loop is the abstraction. The parent loop spawns up to three concurrent children — delegation.max_concurrent_children defaults to 3 — each with its own terminal session, its own constrained context, and its own toolset. The children report back and disappear. There is no concept of long-running peer agents that hold state across sessions; that would require an abstraction Hermes doesn't have. The architecture rewards "one agent that delegates parallel work."
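The delegation shape is a bounded fan-out. A sketch (the setting name comes from the article; the child logic is invented): children run concurrently under a semaphore cap, report back, and disappear, with no state surviving the gather.

```python
import asyncio

MAX_CONCURRENT_CHILDREN = 3  # mirrors delegation.max_concurrent_children

async def run_child(task: str, gate: asyncio.Semaphore) -> str:
    async with gate:            # at most 3 children in flight at once
        await asyncio.sleep(0)  # stand-in for a real terminal session
        return f"done: {task}"

async def delegate(tasks: list[str]) -> list[str]:
    """Fan out to ephemeral children; collect results in task order."""
    gate = asyncio.Semaphore(MAX_CONCURRENT_CHILDREN)
    return await asyncio.gather(*(run_child(t, gate) for t in tasks))

results = asyncio.run(delegate(["lint", "test", "build", "docs"]))
print(results)
```

Note what's absent: nothing here addresses a peer by name or holds a channel open between sessions, which is exactly the abstraction the gateway-centered design provides and this one doesn't.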
The right framing is not "OpenClaw has multi-agent and Hermes doesn't." Both have it. They have different shapes of it because they're answers to different questions: how do you orchestrate a team versus how do you parallelize a workflow.
Security spending follows the center
The 2026 OpenClaw security crisis is most often described as a sequence of CVEs. It is more usefully read as an architecture stress-test.
The Gateway exposed a localhost WebSocket server that didn't validate Origin headers. ClawHub did not vet uploads. Workspace permissions defaulted to broad access. These are individually small choices that combined into a March 2026 inflection point. Joel Gamblin's public tracker logged 137 security advisories against OpenClaw between February 2 and April 4, 2026. Five received formal CVE numbers. ClawBleed (CVE-2026-25253, CVSS 8.8) was the first widely-reported: a one-click cross-site WebSocket hijack that let any malicious page steal the auth token from a localhost OpenClaw instance and execute arbitrary commands. CVE-2026-32922 (CVSS 9.9) was the most severe, a token rotation race condition leading to remote code execution.
The supply-chain campaign, dubbed ClawHavoc, was the more structural problem. Koi Security's initial February 1 audit found 341 malicious skills among 2,857 — 11.9% of the registry — using a pattern researchers came to call ClickFix 2.0: SKILL.md files with fake "Prerequisites" sections that tricked the agent into presenting fake setup dialogs to the user, which then delivered Atomic macOS Stealer (AMOS) on Macs and keyloggers on Windows. Antiy CERT's later expansion catalogued 1,184 malicious packages tied to 12 publisher accounts. ClawHub's curation threshold before the campaign was a one-week-old GitHub account.
The lesson is architectural. OpenClaw spends its security budget where the architecture concentrates value: workspace-level policies, per-agent sandboxing (NVIDIA's NemoClaw runtime layers Landlock, seccomp, and namespace isolation on top through OpenShell), gateway-level approvals, plugin hook points where security extensions register. The marketplace was inevitable given the architecture, and so was the supply-chain risk.
Hermes spends its budget where its architecture concentrates value: at the loop boundary. Approval gates on dangerous commands (configurable, not removable). Container hardening on the Docker backend. Credential filtering that strips environment variables from child processes by default and requires explicit forwarding for tools that need them. Context scanning via the Tirith binary for prompt injection, homograph URLs, exfiltration patterns, terminal injection. A hardline blocklist that refuses certain destructive commands even with approvals disabled. As of April 2026 Hermes had no formal CVEs, though a community security audit of v0.8.0 flagged the default-allow posture as the main risk surface — a different shape of the same general problem.
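The credential-filtering posture is the most mechanical of these and easy to illustrate. The allowlist and function below are assumptions, not Hermes' actual code: children start from a minimal environment, and anything secret must be forwarded by name.

```python
import os

# Sketch of default-deny credential filtering for child processes.
# Variables safe to inherit; everything else is stripped.
SAFE_VARS = {"PATH", "HOME", "LANG", "TERM"}

def filtered_env(forward=()):
    """Environment for a child process: safe vars plus explicit forwards.

    Pass the result as the env= argument to subprocess.run / Popen.
    """
    keep = SAFE_VARS | set(forward)
    return {k: v for k, v in os.environ.items() if k in keep}

os.environ["FAKE_SECRET"] = "hunter2"                 # simulate a credential in scope
assert "FAKE_SECRET" not in filtered_env()            # stripped by default
assert filtered_env(forward={"FAKE_SECRET"})["FAKE_SECRET"] == "hunter2"
```

The design choice is the inversion: a tool that needs a key must declare it, so a prompt-injected shell command can't quietly exfiltrate whatever happens to be in the parent's environment.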
Both architectures put their security budget where the architecture concentrates value. The question is which value surface adversaries find first.
Deployment follows the center
OpenClaw runs as a Node service via launchd or systemd. Always-on. Long-running. Companion apps for macOS (menu bar) and iOS/Android (nodes) treat the gateway as a service the rest of the system reaches into. Channel support is broad: WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix, and a long tail beyond. Integration breadth is the headline feature, because what the architecture is good at is being a hub.
Hermes runs in Python (uv-managed) with seven terminal backends: local, Docker, SSH, Singularity, Modal, Daytona, Vercel Sandbox. The serverless options matter. Modal and Daytona let the loop hibernate when idle and wake on demand, which makes it possible to run a capable agent for near-zero idle cost. Channel support reached twenty platforms with the v0.13.0 Google Chat addition. The CLI is a React/Ink TUI with multiline editing, slash-command autocomplete, and live tool-output streaming. The architecture is good at being a loop, and a loop can sleep.
Hermes also ships a migration utility — hermes claw migrate — that converts an OpenClaw workspace into Hermes' database-backed structure. Persona files map to the persona slot, MEMORY.md is parsed into SQLite entries, USER.md is merged into the user model, API keys move to ~/.hermes/.env if requested. The utility exists for a reason: in March and April 2026, a non-trivial number of OpenClaw users moved.
What's missing from each half
OpenClaw is missing a reflective skill-author. The architecture treats authoring as a human-shaped activity, which is what made the marketplace inevitable, which is what made the supply chain the attack surface. A version of OpenClaw that crystallized skills from successful workspace runs — and routed them through the existing review surface — would close the gap without abandoning the audit trail.
Hermes is missing a workspace surface where agent-authored skills get human review before promotion. The architecture treats authoring as an emergent property of the loop, which is what made the marketplace unnecessary, which is what kept the supply-chain surface clean. A version of Hermes that exposed crystallized skills as Markdown for diff before they entered the active set would close the gap without abandoning the compounding loop.
The first project to do both, with a clean trust gradient between operator-authored and agent-crystallized skills, takes the architecture conversation. What we have today is two halves of that system, neither pretending to be the whole.
The framing "self-improving vs. file-based" is a tell. It assumes self-improvement is unambiguously the more advanced posture. It's not. It's a different bet about where the bottleneck is — and reading the rest of the architecture as a consequence of that bet is more productive than scoring features.