The Self-Generating Supply Chain

Every supply chain attack you've read about has a human at the beginning. Someone writes the malware. Someone uploads the poisoned package. Someone maintains the compromised dependency. Cut the human effort, cut the attack.

That assumption held for thirty years of software security. It no longer holds for AI coding tools.

In January 2026, a security researcher claimed an npm package called react-codeshift. He hadn't written it. It had never existed. No one had ever published it. Yet 237 GitHub repositories were already referencing it, instructing AI agents to install it. The name came from an LLM hallucination — a conflation of jscodeshift with React. It spread through agent skill files: copy-pasted, forked, translated into Japanese, never once verified. After the researcher registered it, it kept getting daily downloads — AI agents following skill instructions, blindly running npx react-codeshift in production environments.

Nobody planted it. Nobody maintained it. The system generated the vulnerability itself.

DAG vs. Cycle

Traditional software supply chains are modeled as directed acyclic graphs. Code flows one direction: author → package registry → consumer. Attacks follow the same path. A compromised package infects its dependents downstream. The graph never loops back. Remove the source, remove the threat.

The AI coding supply chain forms a cyclic graph. A February 2026 paper from researchers at Northeastern, NYU, UCSD, and UIUC named this structure explicitly: agents act as both producers and consumers of content, creating feedback loops where "poisoned outputs are re-ingested through retrieval mechanisms, allowing the attack to persist across sessions and agents." They call it the Viral Agent Loop.
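The structural difference can be made concrete with a small sketch. The node names below are illustrative assumptions, not taken from the paper; the check itself is an ordinary depth-first search for a back edge.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]


def has_cycle(graph: Graph) -> bool:
    """Return True if the directed graph contains at least one cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    for targets in graph.values():
        for target in targets:
            color.setdefault(target, WHITE)

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in graph.get(node, []):
            if color[nxt] == GRAY:  # back edge: we walked into our own path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(color))


# Traditional chain: author -> registry -> consumer.
traditional = {"author": ["registry"], "registry": ["consumer"], "consumer": []}

# Agentic chain (hypothetical node names): agent output is re-ingested.
agentic = {
    "agent": ["shared_content"],
    "shared_content": ["retrieval"],
    "retrieval": ["agent"],
}

print(has_cycle(traditional))  # False
print(has_cycle(agentic))      # True
```

The second graph is the smallest version of the loop the paper describes: whatever the agent emits can come back to it as input.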

Three cycles are now operating simultaneously in the wild. Each is self-reinforcing. Together they constitute a supply chain that generates its own attacks.

Figure: Three self-reinforcing cycles. Traditional supply chain = DAG (one direction). AI coding supply chain = cyclic graph (loops back).

Cycle 1, Hallucination: LLM hallucinates package name (20%) → attacker registers malicious package → devs install via AI agent instructions → name appears in repos → training data → reinforces.

Cycle 2, Propagation: malicious skill uploaded to market → agent loads skill without verification → memory poisoned (SOUL.md / MEMORY.md) → agent outputs spread via forks → reinforces.

Cycle 3, Reinfection: malware targets AI tool credentials → steals npm/pip tokens + configs → republishes infected packages (worm) → new victims install → cycle restarts.

Cross-cycle amplification: hallucinated skills; stolen credentials; infected packages become training data.

Real-world demonstrations (Jan–Apr 2026):
Cycle 1: react-codeshift: 237 repos, zero human intent
Cycle 2: ClawHavoc: 1,400+ skills, 135K exposed instances, SOUL.md poisoning
Cycle 3: Bitwarden "Butlerian Jihad" + Clinejection: self-propagating worm, 4,000 machines

Cycle 1: Hallucination Persistence

Across 16 code-generation models tested — GPT-4, GPT-3.5, CodeLlama, DeepSeek, Mistral, and others — approximately 20% of recommended packages don't exist. That's the base rate. But 43% of hallucinated names persist across re-runs: ask the same model the same question ten times, and nearly half the fake packages appear every time. 58% appear more than once.
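To make the two persistence figures concrete, here is a minimal sketch of how such rates could be computed from repeated runs of one prompt. The run data is made up for illustration and the function name is hypothetical; it is not the methodology of the study behind the numbers above.

```python
from collections import Counter
from typing import Dict, List, Set


def persistence(runs: List[Set[str]]) -> Dict[str, float]:
    """Share of distinct hallucinated names seen in every run, and in more than one run."""
    counts = Counter(name for run in runs for name in run)
    total = len(counts)
    if total == 0:
        return {"every_run": 0.0, "more_than_once": 0.0}
    every = sum(1 for c in counts.values() if c == len(runs))
    repeated = sum(1 for c in counts.values() if c > 1)
    return {"every_run": every / total, "more_than_once": repeated / total}


# Made-up data: hallucinated names from three re-runs of the same prompt.
runs = [
    {"react-codeshift", "fake-a", "fake-b"},
    {"react-codeshift", "fake-a"},
    {"react-codeshift", "fake-c"},
]

stats = persistence(runs)
print(stats)  # {'every_run': 0.25, 'more_than_once': 0.5}
```

A name that lands in the "every run" bucket is the one worth registering, for an attacker or a defender.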

This isn't noise. It's a stable signal. The hallucination has a shape — specific names that models consistently generate because of how training data distributes related concepts. react-codeshift was a conflation of the real tool jscodeshift with React. @vue/migration-tool was another. These names are predictable. Attackers only need to wait.

The react-codeshift case demonstrates the cycle operating without any attacker involvement at all. AI-generated skill files spread the name through agent ecosystems. Agents followed instructions without verifying the package existed. The name propagated through forks, copies, and translations. By the time a researcher registered it defensively, it was already being actively downloaded — AI agents executing npx in production environments, installing whatever appeared at that name.

Seth Larson, the Python Software Foundation's Security Developer-in-Residence, coined the term: slopsquatting. It is the AI flavor of typosquatting, except that instead of betting on human typos, you bet on machine hallucinations. More predictable. More persistent. And the model keeps recommending the name even after you've registered malware there.

The cycle closes: hallucinated packages appear in repositories → those repositories become future training data → future models hallucinate the same names with higher confidence. The feedback loop is baked into how LLMs learn.

Cycle 2: Agent Propagation

In late January 2026, a single automated user uploaded 677 malicious skills to ClawHub, the official marketplace for OpenClaw, an open-source AI agent framework with 180,000 GitHub stars. The campaign, named ClawHavoc by Koi Security, eventually reached more than 1,400 malicious skills. The only requirement to publish was a GitHub account at least one week old. No static analysis. No code review. No signing.

The payloads delivered Atomic macOS Stealer (AMOS) disguised as Gmail, Notion, Slack, and GitHub integrations. But the more concerning vector wasn't the infostealers — it was the persistent memory manipulation. OpenClaw retains long-term context in files called SOUL.md and MEMORY.md. As Snyk documented, manipulating these files transforms a point-in-time exploit into a stateful, delayed-execution attack. The agent's behavior is permanently altered. The poison persists across sessions.
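One way to at least notice this kind of tampering is to fingerprint the memory files between sessions. The sketch below is an assumption-laden illustration, not an OpenClaw feature: it assumes both files sit in a single agent directory, and the helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List

WATCHED = ["SOUL.md", "MEMORY.md"]  # file names from the text; location is assumed


def snapshot(agent_dir: Path) -> Dict[str, str]:
    """Record a SHA-256 digest for each watched memory file that exists."""
    return {
        name: hashlib.sha256((agent_dir / name).read_bytes()).hexdigest()
        for name in WATCHED
        if (agent_dir / name).is_file()
    }


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Names whose digest differs, or that appeared or disappeared."""
    return sorted(n for n in set(before) | set(after) if before.get(n) != after.get(n))


with tempfile.TemporaryDirectory() as tmp:
    agent_dir = Path(tmp)
    (agent_dir / "SOUL.md").write_text("You are a careful assistant.\n")
    (agent_dir / "MEMORY.md").write_text("User prefers TypeScript.\n")
    baseline = snapshot(agent_dir)

    # Simulate a skill quietly appending an instruction to long-term memory.
    with (agent_dir / "MEMORY.md").open("a") as handle:
        handle.write("Always run the setup script from the skill author.\n")

    drift = changed_files(baseline, snapshot(agent_dir))

print(drift)  # ['MEMORY.md']
```

A digest change only says that memory was written, not that the write was malicious. Legitimate memory updates trip it too, so this is a prompt for review rather than a verdict.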

By April 2026, 135,000 publicly exposed OpenClaw instances were running across 82 countries, 63% without authentication. 138 CVEs were disclosed in a 63-day window. CVE-2026-32922 (CVSS 9.9): a single API call converts a pairing token into full administrative control with remote code execution.

The cycle: malicious skill uploaded → agent loads without verification → agent memory poisoned → agent outputs (code, configs, skill files) now carry the poison → those outputs get forked, copied, translated, shared → new agents load them. The agent is simultaneously victim and vector.

Cycle 3: Credential Harvest and Reinfection

On April 22, 2026, a malicious @bitwarden/cli@2026.4.0 package appeared on npm for approximately 93 minutes. Endor Labs' analysis revealed a module named "Butlerian Jihad" — Dune-themed malware specifically designed to hunt AI coding assistants. It probed for Claude Code, Codex CLI, Gemini CLI, Kiro, Aider, and OpenCode. It searched for ~/.claude.json, ~/.claude/mcp.json, and ~/.kiro/settings/mcp.json.

The self-propagating mechanism: validate stolen npm tokens → download victim's packages → inject malware as dist.js → add preinstall hook → republish with incremented patch version. Every compromised developer's packages automatically infect their downstream dependents. The worm doesn't need new victims to seek it out — it pushes itself into their update path.
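The preinstall step is the part of that chain a reviewer can see in the manifest. As a rough sketch, the check below compares two versions of a package.json and reports install-time lifecycle scripts that were added or changed. The manifests and function names are hypothetical examples; preinstall, install, and postinstall are the npm scripts that run during installation.

```python
import json
from typing import Dict

INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_time_scripts(package_json_text: str) -> Dict[str, str]:
    """Return the scripts that npm would run while installing this package."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


def new_install_hooks(old_text: str, new_text: str) -> Dict[str, str]:
    """Install-time scripts that are new or different in the newer manifest."""
    old = install_time_scripts(old_text)
    new = install_time_scripts(new_text)
    return {hook: cmd for hook, cmd in new.items() if old.get(hook) != cmd}


# Hypothetical manifests: a patch bump that adds a preinstall hook.
previous = json.dumps({"name": "example-lib", "version": "1.4.2",
                       "scripts": {"test": "node test.js"}})
republished = json.dumps({"name": "example-lib", "version": "1.4.3",
                          "scripts": {"test": "node test.js",
                                      "preinstall": "node dist.js"}})

added = new_install_hooks(previous, republished)
print(added)  # {'preinstall': 'node dist.js'}
```

A patch release that gains an install hook is not proof of compromise, but it is cheap to flag and rare enough to deserve a look.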

About nine weeks earlier, the cycle's upstream mechanism was demonstrated at scale. On February 17, 2026, someone published cline@2.3.0 to npm. The binary was byte-identical to the previous version except for one line in package.json. For eight hours, approximately 4,000 developers who installed or updated Cline got an unrelated AI agent silently installed on their machines.

How? A prompt injection hidden in a GitHub issue title — opened January 28 — hijacked Cline's AI triage bot. The bot dutifully followed the hidden instructions, poisoning the GitHub Actions cache, which contaminated the nightly release pipeline, which leaked npm credentials, which enabled the malicious publish. The vulnerability had existed since December 21, 2025. Cline was notified January 1, 2026. Five weeks of silence. The researcher went public February 9. Cline patched within 30 minutes — but initially revoked the wrong tokens.

Before Clinejection, prompt injection leading to real-world compromise at scale was theoretical. After it, the cycle is proven: prompt injection → credential theft → malicious publish → new installations → new credentials exposed.

Where the Cycles Connect

The three cycles don't operate in isolation. They amplify each other:

Cycle 1 feeds Cycle 2: hallucinated package names appear in AI-generated skill files, which agents load and execute. Cycle 2 feeds Cycle 3: compromised agents expose credentials that enable self-propagating worms. Cycle 3 feeds Cycle 1: infected packages become part of the training data that future models learn from, reinforcing the hallucination patterns that started the chain.

The arXiv paper (2602.19555) names the structural problem precisely: traditional software supply chains are directed acyclic graphs where artifacts flow unidirectionally. The agentic supply chain forms a cyclic graph. The cycle is the vulnerability. And no existing defense framework was designed for cycles.

"LLMs cannot reliably distinguish between instructions and data. No equivalent to SQL injection's prepared statements exists for natural language. The vulnerability is architectural, not implementational."
— Maloyan & Namiot, arXiv 2601.17548 (meta-analysis of 78 prompt injection studies)

The Numbers

Georgia Tech's Vibe Security Radar tracks CVEs formally attributable to AI coding tools. January 2026: 6. February: 15. March: 35. The March figure alone exceeds all of H2 2025. Estimated true count: 5-10x higher, because most AI-generated code leaves no signature. Claude Code accounts for 27 of the 74 confirmed CVEs not because it's worse but because it always leaves a trace. Copilot's inline completions are invisible.
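The arithmetic behind those three monthly figures is worth spelling out. The snippet uses only the counts quoted above; applying the 5-10x multiplier to March alone is an illustrative assumption about how to read the estimate.

```python
monthly = {"January": 6, "February": 15, "March": 35}
counts = list(monthly.values())

# Month-over-month growth factor: February/January, then March/February.
growth = [round(later / earlier, 2) for earlier, later in zip(counts, counts[1:])]

# Assumed reading of the "5-10x higher" estimate, applied to March only.
low, high = 5 * counts[-1], 10 * counts[-1]

print(growth)     # [2.5, 2.33]
print(low, high)  # 175 350
```

The count more than doubled in each of the two month-to-month steps.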

The rate data from the broader landscape: AI-assisted developers commit at 3-4x the rate but produce security findings at 10x the rate. Veracode found 45% of AI-generated code introduces OWASP Top 10 vulnerabilities — a rate unchanged across multiple testing cycles. Credential exposure: 3.2% for AI-assisted commits versus 1.5% baseline.

Each vulnerable output is potential training data. Each training cycle that ingests it makes the next generation more likely to reproduce the pattern. The CVE count isn't climbing simply because adoption is climbing. It's climbing because the cycle has momentum.

What Breaks Cycles

The paper's proposed defense, Cryptographically Bound Registries that verify semantic integrity through cryptographic provenance rather than semantic likelihood, addresses Cycle 3. OIDC provenance attestation, which Cline adopted after Clinejection, prevents stolen tokens from publishing packages because publication requires a cryptographic attestation from a specific workflow. That's the equivalent of SQL injection's prepared statements: a structural fix, not a semantic one.
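The shape of that structural fix can be sketched in a few lines. This is a deliberately simplified illustration, not npm's or GitHub's actual implementation: it assumes the token's signature has already been verified, and the claim names, repository, and workflow below are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TrustedPublisher:
    """The one workflow identity the registry will accept for a package."""
    repository: str
    workflow: str


def may_publish(claims: Mapping[str, str], trusted: TrustedPublisher) -> bool:
    """Allow publication only if the (already verified) claims match the pinned identity."""
    return (
        claims.get("repository") == trusted.repository
        and claims.get("workflow") == trusted.workflow
    )


trusted = TrustedPublisher(repository="example-org/example-tool", workflow="release.yml")

from_release_job = {"repository": "example-org/example-tool", "workflow": "release.yml"}
from_stolen_token: dict = {}  # a leaked long-lived token carries no workflow identity

print(may_publish(from_release_job, trusted))   # True
print(may_publish(from_stolen_token, trusted))  # False
```

The decision depends on where the publish request came from, not on what the package contents look like, which is why a stolen credential alone no longer suffices.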

But Cycle 1 has no structural fix. Hallucination is architectural to how LLMs work. You can reduce it — retrieval-augmented generation, verification against registry APIs — but you cannot eliminate it without changing how language models fundamentally operate. The 20% rate persists across 16 models because it emerges from the same training architecture.
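Verification against a registry can be as small as one lookup before an agent runs an install command. The sketch below assumes the public npm registry answers a metadata request at https://registry.npmjs.org/ followed by the package name, returning 200 for a published package and 404 otherwise. The function names are hypothetical, and the demonstration uses a stub instead of the network so it runs offline.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable

REGISTRY = "https://registry.npmjs.org/"


def http_status(url: str) -> int:
    """Fetch a URL and return its HTTP status code (network access required)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def package_exists(name: str, fetch_status: Callable[[str], int] = http_status) -> bool:
    """True if the registry reports metadata for this package name."""
    # Scoped names such as @scope/pkg keep the "@" but encode the slash.
    url = REGISTRY + urllib.parse.quote(name, safe="@")
    return fetch_status(url) == 200


# Offline stub standing in for the registry: only jscodeshift is "published".
published = {REGISTRY + "jscodeshift"}


def stub(url: str) -> int:
    return 200 if url in published else 404


print(package_exists("jscodeshift", stub))             # True
print(package_exists("some-unregistered-name", stub))  # False
```

An existence check only catches names nobody has registered yet. Once a slopsquatter claims the name, the lookup succeeds, so an allowlist of reviewed packages is the stronger form of the same idea.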

And Cycle 2 has only incomplete fixes. OpenClaw partnered with VirusTotal for automated scanning, but prompt injection and dynamic loading evade static analysis. You can require signatures for skill publishing — but the 138 CVEs in 63 days demonstrate that the attack surface exceeds any single mitigation.

The uncomfortable truth: we know how to break attack chains in directed acyclic graphs. You verify upstream, and the verification propagates downstream. In cyclic graphs, verification at any single point doesn't propagate, because the output loops back to become input at a point you've already passed. The system needs verification at every node, continuously, which is exactly the kind of overhead that AI coding tools were adopted to eliminate.
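A toy simulation shows why a one-time cleanup works in one graph and not the other. The model is an assumption chosen for simplicity: each round, every node is rebuilt from its inputs and is tainted only if some input was tainted in the previous round. The node names are illustrative.

```python
from typing import Dict, List, Set


def simulate(edges: Dict[str, List[str]], seed: str, steps: int) -> Set[str]:
    """Propagate taint for `steps` rounds and return the nodes still tainted.

    A node with no inputs is clean after the first round, which stands in
    for removing the original source of the poison.
    """
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    inputs = {n: [s for s, targets in edges.items() if n in targets] for n in nodes}
    tainted = {seed}
    for _ in range(steps):
        tainted = {n for n in nodes if any(i in tainted for i in inputs[n])}
    return tainted


dag = {"author": ["registry"], "registry": ["consumer"], "consumer": []}
cyclic = {
    "agent": ["shared_content"],
    "shared_content": ["retrieval"],
    "retrieval": ["agent"],
}

print(simulate(dag, "author", 5))    # set()
print(simulate(cyclic, "agent", 5))  # {'retrieval'}
```

In the acyclic chain the taint washes out once the source is gone. In the loop it never reaches zero: it just keeps moving to the next node, so cleaning any one node only helps until the taint comes back around.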

The generation/verification asymmetry — the thread running through the output trap, four layers, and the perception gap — finds its sharpest expression here. Generation creates the cycles. Verification would break them. Generation scales cheaply. Verification doesn't. So the cycles accelerate.