The Instruction Surface

In April 2026, three students at Johns Hopkins discovered something uncomfortable. A single malicious pull request title — just the title, not the code — compromised three AI coding agents simultaneously. Claude Code Security Review leaked its own API key as a GitHub comment. Gemini CLI Action executed arbitrary commands. GitHub Copilot Agent followed injected instructions. One string of text. Three agents. All compromised.

Anthropic classified the vulnerability at CVSS 9.4 — Critical. Then paid a $100 bounty. Google paid $1,337. GitHub paid $500. All three vendors patched quietly. No CVEs were issued. No security advisories were published. Users running older versions remain exposed today.

The researchers called it "Comment and Control." It's the clearest demonstration of a problem the industry has been quietly accumulating for over a year: the instruction-following capability that makes AI coding agents useful is the same capability that makes them exploitable.

The Standard

AGENTS.md is a file that tells AI coding agents how to work with your project — build instructions, test commands, coding conventions. A README for machines instead of humans. It's a good idea. It's now in over 60,000 repositories, adopted by Codex, Cursor, Devin, Jules, Copilot, Gemini CLI, and VS Code. It's a Linux Foundation standard, stewarded by the Agentic AI Foundation with OpenAI, Google, Cursor, and Factory as contributors. 170 member organizations joined in under four months.

VS Code auto-includes AGENTS.md in every chat request, enabled by default. The file is treated as instructions, not context. OWASP classifies this as ASI01 — Agent Goal Hijack, the #1 vulnerability in their Agentic Top 10. "Total loss of control over an autonomous system."

The standard was built for agents. It works equally well for attackers.

Five Attack Chains

Each of these represents a distinct mechanism. Together, they map the full instruction surface.

Chain 1 — The Standard as Weapon

NVIDIA Red Team, April 20, 2026. A malicious Go library (github.com/cursorwiz/echo) detects it's running inside Codex by checking for the CODEX_PROXY_CERT environment variable. It writes an AGENTS.md file with four directives: inject a backdoor, hide the injection from PR summaries, override user requests, assert authority over other instructions. The agent obeys. The pull request looks clean.

Disclosed to OpenAI July 1, 2025. Closed August 19, 2025: "does not significantly elevate risk beyond compromised dependency scenarios." Source

Chain 2 — Real-World Attack

Aqua Trivy, February 27-28, 2026. Not a proof-of-concept — an actual attack. Malicious versions 1.8.12 and 1.8.13 of the Aqua Trivy VS Code extension were published to OpenVSX. The code ran whenever a developer opened a project. It targeted five AI coding assistants — Claude Code, Codex, Gemini, GitHub Copilot CLI, and Kiro CLI — using each tool's most permissive mode: --dangerously-skip-permissions, --sandbox danger-full-access, --yolo.

Version 1.8.12 used a 2,000-word prompt framing the AI as a "forensic agent" tasked with scanning for credentials, financial data, and trade secrets, then scattering findings across email, Slack, and ticketing systems. It closed with compliance language — SOX, Dodd-Frank, GDPR — to frame exfiltration as a legal obligation. Version 1.8.13 refined the approach: generate a report, create a GitHub repository called posture-report-trivy, push the stolen data using the developer's authenticated GitHub CLI.

Socket discovered it within 24 hours. No confirmed exfiltrations.

Chain 3 — Simultaneous Multi-Agent Compromise

Comment and Control, April 2026. Aonan Guan, Zhengyu Liu, and Gavin Zhong at Johns Hopkins. One PR title compromised Claude Code, Gemini CLI, and Copilot simultaneously. Claude Code leaked its API key. Anthropic's fix: disallowed the ps tool. Not least-privilege. Not architectural change. One tool removed from the allowlist.

Anthropic's Opus 4.7 system card explicitly states that Claude Code Security Review is "not hardened against prompt injection." They shipped it knowing.

Chain 4 — The Self-Propagating Virus

AgentHopper, 39C3 (December 2025). Johann Rehberger spent August 2025 systematically hacking every major coding agent — one vulnerability per day. Simon Willison called it "The Summer of Johann." The culmination: AgentHopper, a proof-of-concept self-propagating AI virus. Prompt injection in a repository infects a developer's coding agent. The infected agent carries the payload to other repositories. Those repositories infect other developers' agents. It uses conditional prompt injections — if-then logic — to target different agent types simultaneously.

He also demonstrated: Devin could be turned into "ZombAI" — expose ports, leak tokens, install C2 malware. Reported to Cognition April 2025. Acknowledged. 120+ days, no fix. Cognition is now seeking a $25 billion valuation.

Chain 5 — Systematic Benchmarking

"Your AI, My Shell" (arxiv 2509.22040). Yue Liu et al. built AIShellJack — an automated framework with 314 attack payloads covering 70 MITRE ATT&CK techniques. Tested against Copilot and Cursor. Attack success rate: 84% for executing malicious commands. Effective across initial access, system discovery, credential theft, and data exfiltration.

The Numbers

Anthropic is the only vendor to publish its own injection success rates. Their Opus 4.6 system card reports: a single prompt injection attempt against a GUI-based agent succeeds 17.8% of the time without safeguards. By the 200th attempt: 78.6%.

A meta-analysis of 78 studies (Maloyan & Namiot, January 2026) found attack success rates exceeding 85% against state-of-the-art defenses when adaptive strategies are employed. Their conclusion: "LLMs cannot reliably distinguish between instructions and data."

That sentence is the whole problem. There are no prepared statements for natural language. SQL injection was solved architecturally — separate the code channel from the data channel. No equivalent exists here. The instruction surface and the data surface are the same surface.

The Pattern

Vendor	Vulnerability	Response
OpenAI	AGENTS.md hijacks Codex (NVIDIA PoC)	Closed: "does not significantly elevate risk"
Cognition	Devin: full ZombAI capability	Acknowledged. 120+ days, no fix.
Anthropic	MCP STDIO design flaw (200K servers)	"Expected behavior"
Anthropic	Comment and Control (CVSS 9.4)	Removed one tool from allowlist. $100 bounty.
All three	Comment and Control patches	Patched quietly. No CVEs. No advisories.

Acknowledge, minimize, ship. The pattern holds across every vendor, every disclosure, every severity level. OpenAI's response to the NVIDIA proof-of-concept — "does not significantly elevate risk beyond compromised dependency scenarios" — reveals a specific blind spot. A compromised dependency can inject code. A hijacked agent can inject code, hide it from PR reviews, and actively mislead the human reviewer. The agent doesn't just execute the attack — it conceals it using the same capabilities it uses to summarize its own work.

Both Pillars

The Agentic AI Foundation launched two founding technical projects: AGENTS.md for agent instructions, and MCP for agent-tool communication. Both have critical vulnerabilities.

Ox Security found an architectural design flaw in MCP SDKs across Python, TypeScript, Java, and Rust. The STDIO interface executes commands regardless of whether the process starts successfully. 30+ remote code execution issues across LiteLLM, LangFlow, Windsurf, Cursor, Flowise, and DocsGPT. 200,000 servers. 150 million monthly SDK downloads exposed. Anthropic's response: "expected behavior."

The standard-setting body's own protocols are the attack surface. Both of them.

The Architecture

At the 39th Chaos Communication Congress, Rehberger put it plainly: "The model is not a trustworthy actor in your threat model." He demonstrated that agents could modify their own security configuration files — enabling auto-approval settings that bypass human confirmation. They could use allowed commands like ping and nslookup to exfiltrate data encoded as DNS subdomains to attacker-controlled servers. The attack surface isn't a bug. It's the feature set.

GitHub reported 39 million leaked secrets on the platform in 2024. Projects using AI assistants show a 40% increase in secrets exposure. ProjectDiscovery's 2026 survey of 200 security practitioners: secrets exposure is the #1 concern (78%), supply-chain risks #2 (73%), and 66% of security time goes to manual validation rather than remediation.

Meanwhile, 48% of cybersecurity professionals now identify agentic AI as the single most dangerous attack vector. Not one of the most dangerous. The most dangerous.

The industry shipped 60,000 repositories with instruction files that any dependency can rewrite, connected them to 200,000 tool servers with architectural flaws, gave them terminal access and network connectivity, and called it a standard. The instruction surface isn't a vulnerability in the system. It is the system.