The Open-Source Agent Security Disaster Is the Best Thing That Ever Happened to Anthropic and OpenAI

Somewhere in a Cisco security lab, researchers are running a tool called Skill Scanner against the most popular downloads on ClawHub, the skill marketplace for the open-source AI agent framework OpenClaw. One of them — a skill called "What Would Elon Do?" — returns nine security findings, including two critical and five high-severity issues. The skill is, in Cisco's assessment, functionally indistinguishable from malware: it instructs the agent to silently execute a curl command sending data to an external server, while using a direct prompt injection to bypass the agent's safety guidelines.

Across the internet, a rapidly growing user base — estimated in the hundreds of thousands across installations and linked services, based on growth signals reported by multiple researchers — is rotating API keys, wiping installations, and confronting a deeply uncomfortable question: has the autonomous assistant they gave access to their email, calendar, and terminal been silently exfiltrating data for weeks?

Taken together, the disclosures paint what security researchers increasingly describe as a systemic failure mode — not of a single product, but of autonomous AI agents operating in adversarial environments without adequate safeguards. It is also, viewed from a different angle, the most valuable field test of autonomous AI agents the industry has ever produced. And the companies most likely to benefit from the resulting intelligence didn't have to ship a single insecure line of code themselves.

What Happened

The timeline of disclosures over the past two weeks is remarkable for its density.

On January 28, Cisco's AI Threat and Security Research team published findings documenting a series of attack vectors against OpenClaw — the open-source personal AI assistant formerly known as Clawdbot, then Moltbot, that has accumulated over 150,000 GitHub stars since its November 2025 launch. Their Skill Scanner tool, which combines static analysis, behavioral analysis, LLM-assisted semantic analysis, and VirusTotal integration, demonstrated how skills in the ClawHub marketplace could instruct agents to exfiltrate data, inject prompts, and execute arbitrary commands — all while appearing to provide legitimate functionality.

On February 2, two major disclosures landed simultaneously. Wiz researchers revealed that Moltbook — the social network where OpenClaw agents interact autonomously — had its entire Supabase production database publicly accessible. Wiz head of threat exposure Gal Nagli accessed the database in under three minutes. The exposure was sweeping: 1.5 million API authentication tokens, 35,000 user email addresses, and over 4,000 private messages between agents — many containing plaintext OpenAI and AWS API keys that users had shared with their agents in chat. Nagli's analysis of the exposed database also suggested that Moltbook's claimed 1.5 million agents mapped to roughly 17,000 human accounts. To illustrate how soft those controls were, he generated 500,000 fake accounts from a single script — without hitting rate limits or verification.

The same day, VirusTotal's Bernardo Quintero published an analysis that had already scanned over 3,000 OpenClaw skills, finding hundreds with malicious characteristics. VirusTotal drew a distinction that matters: some skills were flagged for poor security practices typical of vibe-coded software — insecure API usage, hardcoded secrets, excessive permissions. But a second, more alarming category consisted of skills that were clearly and intentionally malicious, designed from the ground up to deliver malware through a seemingly legitimate interface.

The most striking example was a single ClawHub user account, "hightower6eu," that had published over 300 skills disguised as useful tools — crypto analytics, finance tracking, social media analysis — all of them identified as malicious. The skills themselves contained almost no code. Static scanners saw mostly text and metadata — not an executable payload. What the skills did contain were instructions — in the SKILL.md file that the AI agent reads and follows — telling users to download and execute external binaries as a "setup" step. On Windows, this meant a password-protected ZIP containing a trojanized executable. On macOS, it pointed to a Base64-obfuscated shell script hosted on glot.io that, once decoded, downloaded and executed a Mach-O binary identified by 16 security engines as Atomic Stealer (AMOS), a well-known macOS infostealer designed to harvest passwords, browser credentials, and cryptocurrency wallets.

As VirusTotal's analysis concluded: nothing in the file is technically malware by itself. The malware is the workflow — the sequence of natural language instructions that, when followed by an AI agent with system access, functionally becomes a delivery chain.

That sentence captures something fundamental about the security landscape of AI agents. The attack surface is no longer code in the traditional sense alone. A growing portion of it consists of natural language instructions that an AI agent interprets and executes with system-level privileges. A SKILL.md file is a text document. It is also, functionally, an executable.

On February 5, Snyk published a complementary scan of the entire ClawHub marketplace and found that 283 of roughly 4,000 skills — about 7.1 percent — contain flaws that expose sensitive credentials. Popular skills like moltyverse-email and youtube-data instruct agents to pass API keys and passwords through the LLM's context window in plaintext.

OpenClaw founder Peter Steinberger responded by announcing a partnership with VirusTotal. Every skill published on ClawHub is now automatically scanned using VirusTotal's Code Insight feature, which uses Gemini to analyze what a skill actually does from a security perspective — whether it downloads external code, accesses sensitive data, or embeds instructions that could coerce the agent into unsafe behavior. Skills deemed harmless are approved automatically, suspicious ones receive a warning label, and anything flagged as malicious is blocked. All active skills are re-scanned daily. Steinberger was candid about the limitations: "Security is defense in depth. This is one layer. More are coming."

But Cisco's findings extend beyond marketplace contamination. Their researchers documented what they call sleeper agent attacks — malicious instructions planted in an agent's persistent memory files (such as MEMORY.md) that remain dormant until a specific trigger word appears in conversation. Because OpenClaw stores summarized interaction history that the model later re-interprets during new sessions — typically without integrity verification of the stored content — an attacker can embed instructions that surface only when a particular topic or keyword is discussed, days or weeks after the initial injection. The researchers also documented techniques for agents to operate beyond their intended runtime constraints — not by gaining new OS-level privileges, but by abusing already-granted permissions to access resources outside the agent's intended scope — along with credential harvesting across the agent's connected services.

The pattern across these disclosures is not simply that OpenClaw has vulnerabilities. It is that each vulnerability arises from a shared set of design assumptions: that broad system access is an acceptable default for useful agents, that natural language skill files are benign, that user-provided credentials in conversation are transient, and that persistent memory is a feature rather than an attack surface. Those assumptions, taken together, define not just OpenClaw's threat model — but the threat model for autonomous AI agents as a category. Across these incidents, the common thread is boundary collapse: untrusted inputs become instructions, extensions become a supply chain, memory becomes stateful malware, and secrets become chat artifacts.

The Unintended Red Team

Here is a pattern that is rarely acknowledged publicly but increasingly evident when you examine the incentive structure: the open-source agent community is functioning as a large-scale, uncompensated security research program for the commercial AI industry.

Consider what Anthropic, OpenAI, or Google would need to do to gather equivalent threat intelligence. They would have to deploy autonomous agents with minimal guardrails into real-world environments, connect them to actual user credentials and services, expose them to genuine adversarial actors with real financial motivation, and meticulously catalogue every attack vector that emerged. They would need to do this at scale — across hundreds of thousands of installations, across messaging platforms, cloud providers, and operating systems.

No commercial AI company could conduct this experiment deliberately. The reputational and legal liability would be prohibitive. But that is effectively what is happening organically through the OpenClaw ecosystem. Users bear the credential theft and system compromise. The open-source project absorbs the reputational damage — Gary Marcus, writing on his Substack on February 2, called OpenClaw "basically a weaponized aerosol." Andrej Karpathy, who had initially praised Moltbook as "the most incredible sci-fi takeoff-adjacent thing I've seen recently," reversed his position the same day, calling it "a dumpster fire" and urging people not to run these systems casually.

And the security findings — disclosed publicly through blog posts, GitHub repositories, CVE databases, and conference talks — flow freely into the knowledge base of every company building commercial agents. From an ecosystem perspective, the lessons generated by these failures inevitably propagate upstream. Every Cisco report, every Wiz disclosure, every VirusTotal analysis is a chapter in the threat intelligence manual that informs how Anthropic designs Claude's tool-use architecture, how OpenAI structures Codex skills, and how Google hardens its agent integrations.

The economic structure is asymmetric. The cost of discovery — compromised credentials, exposed API keys, system-level breaches — is borne by individual users and the open-source community. The value of that discovery — battle-tested threat intelligence about real-world attack vectors against autonomous agents — accrues disproportionately to well-resourced commercial labs that can incorporate these lessons before shipping their own products.

This is not necessarily nefarious. It is, however, structural. And it echoes a pattern that has played out before.

The Precedent

In the early 2000s, it was open-source projects and their communities that bore the brunt of internet security research. The vulnerabilities discovered in Apache, OpenSSL, and the Linux kernel didn't just improve those projects — they informed the security architecture of every commercial product built on the internet stack. When the Heartbleed vulnerability in OpenSSL was disclosed in 2014, it didn't just fix a bug in one library. It catalyzed a fundamental rethinking of how the industry handles memory safety and cryptographic library design, and it led to significantly increased funding for open-source security through initiatives like the Core Infrastructure Initiative.

What is happening with AI agents follows a similar trajectory, compressed from years into weeks. OpenClaw went from open-source release to over 150,000 GitHub stars in roughly sixty days. Attack techniques are being developed, deployed, and discovered at a pace that outstrips any single organization's internal security research capacity.

The key difference from earlier open-source security cycles: in those cases, the vulnerabilities were in infrastructure that both open-source and commercial products shared directly — everyone ran OpenSSL. With AI agents, the security lessons transfer — prompt injection patterns, skill supply chain attacks, credential management failures, persistent memory exploitation — but the implementations do not. Commercial labs can study every failure mode and build different systems that avoid the specific mistakes, without ever having exposed their own users to those risks.

What This Means for Commercial Agent Security

The specific vulnerabilities documented in the OpenClaw ecosystem map onto the design decisions that commercial agent platforms are making right now.

Prompt injection through external data sources — the mechanism by which a malicious document or message can hijack an agent's behavior — represents a class of vulnerability that affects any agent consuming untrusted input. The severity varies significantly across architectures: fully autonomous local agents with shell access sit at one extreme, while hosted agents operating through capability-sandboxed tool APIs face a meaningfully different risk profile. But the underlying semantic attack vector is shared, and OpenClaw is demonstrating at scale how real attackers exploit it in production.

Supply chain contamination through skill marketplaces is where the VirusTotal findings are particularly instructive. The distinction between negligently insecure skills (vibe-coded without a security model) and deliberately weaponized skills (the hightower6eu campaign) reveals two different threat surfaces that require different responses. Automated scanning catches the latter — eventually. The former is arguably more dangerous long-term because it creates a background level of exploitability that sophisticated attackers can leverage opportunistically. Any platform building an extension ecosystem needs to address both.

Credential exposure through conversation history — the mechanism by which 1.5 million API keys ended up in Moltbook's database — validates the principle that secrets should be injected at runtime through environment variables, never passed through an LLM's context window. The Moltbook exposure demonstrates the failure mode at scale when users ignore this.

Delayed activation through persistent memory — arguably the most consequential finding — is particularly relevant for any commercial agent maintaining state across sessions. The ability to plant instructions that remain dormant until a trigger condition is met challenges the assumption that an agent's behavior can be evaluated at the time of interaction. It requires continuous monitoring of stored state, a problem that grows harder as memory systems become more sophisticated.

What This Means for Businesses

For organizations evaluating AI agent adoption, the OpenClaw situation provides both a warning and a framework.

The warning is straightforward: the attack surface of autonomous AI agents is broader and deeper than most organizations appreciate. Prompt injection, skill supply chain compromise, and credential harvesting through conversation history are not theoretical risks. They are documented, reproducible, and actively exploited.

The practical lessons are equally clear. Build capabilities internally rather than pulling from untrusted marketplaces — the entire ClawHub compromise validates developing skills in-house, where you control every instruction your agent follows. Treat conversation logs as sensitive data, because every credential and piece of proprietary information that passes through an agent's context window is potentially accessible. Run agents in properly isolated environments with minimal permissions, since container isolation alone has proven insufficient. And develop the capability to monitor persistent memory for injection — something most organizations haven't built yet but urgently need.

For organizations building or deploying agent systems, the OpenClaw disclosures suggest a minimum viable security stack that generalizes beyond any single platform: capability-based tool APIs that deny access by default rather than granting it; strict secret isolation where credentials are injected at runtime and never enter the LLM's context window; untrusted-input tainting that distinguishes instructions from data before the model processes them; signed skills with reproducible builds and provenance verification for any extension marketplace; and memory integrity checks with diff alerts that flag unexpected changes to stored agent state between sessions.

It is also worth noting that commercial agent platforms are not immune to these vulnerability classes. Anthropic's MCP specification initially shipped without authentication, and reference MCP server implementations have had path traversal vulnerabilities disclosed. The difference is one of degree and exposure, not of underlying design. Any sufficiently capable agent operating across trust boundaries will face variants of these threats.

The Obligation Question

There is an uncomfortable ethical dimension to this that deserves acknowledgment. The OpenClaw users whose credentials were exposed did not sign up to be test subjects for the AI industry's security research program. They adopted a tool because it was exciting and genuinely useful — because, as Cisco themselves acknowledged, from a capability perspective OpenClaw is groundbreaking. The capability-security tradeoff is not a bug in the system; it is the fundamental tension at the heart of agentic AI.

The companies most likely to benefit from these involuntary field tests have, at minimum, an interest in strengthening the open-source security ecosystem that is producing this intelligence. Cisco's open-source Skill Scanner is one contribution. The VirusTotal partnership is another. But the ecosystem needs more: standardized security frameworks for agent skills, audited extension marketplaces with genuine verification beyond automated scanning, and — most critically — agent architectures that are secure by default rather than treating security as an optional layer that users must configure themselves.

The open-source agent community is generating invaluable threat intelligence at significant personal cost to its participants. The question facing the commercial AI industry is whether it will contribute meaningfully to the security infrastructure that protects those participants — or simply incorporate the lessons and move on.

Sources:

Cisco AI Defense: Personal AI Agents like OpenClaw Are a Security Nightmare (January 28, 2026)
VirusTotal: From Automation to Infection: How OpenClaw AI Agent Skills Are Being Weaponized (February 2, 2026)
Wiz Research: Hacking Moltbook: AI Social Network Reveals 1.5M API Keys (February 2, 2026)
Snyk: 280+ Leaky Skills: How OpenClaw & ClawHub Are Exposing API Keys (February 5, 2026)
The Hacker News: OpenClaw Integrates VirusTotal Scanning (February 9, 2026)
The Decoder: Malicious skills turn AI agent OpenClaw into a malware delivery system (February 8, 2026)
Gary Marcus: OpenClaw is Everywhere All at Once, and a Disaster Waiting to Happen (February 2, 2026)
Fortune: Top AI leaders are begging people not to use Moltbook (February 2, 2026)
The Register: It's easy to backdoor OpenClaw, and its skills leak API keys (February 5, 2026)
Cisco AI Defense: Skill Scanner (GitHub)