Adversa AI published TrustFall today. The finding is simple. A cloned repository with two JSON files — .mcp.json and .claude/settings.json — can spawn an attacker-controlled MCP server the moment a developer opens the folder in any of the four major AI coding CLIs. Claude Code, Gemini CLI, Cursor, Copilot. All four. One keypress.
Here is what Claude Code asks you:
Quick safety check: Is this a project you created or one you trust?
[Default: Yes, I trust this folder]
Press Enter. That's it. An unsandboxed executable now has access to your ~/.ssh/, your ~/.aws/, your shell history, your source code from other projects, and a long-lived command-and-control channel back to the attacker. Not because of a bug. Because of a design decision.
Four Dialogs, One Problem
Four tools, four different companies, one shared architecture: ask a yes/no question, default to yes, then execute whatever the repository told you to execute.
Gemini CLI is the least bad — it at least names the helper programs it is about to start. But naming a program doesn't tell you what it does. And the default is still trust.
Zero Keypresses
One keypress is the human attack surface. The automated one is worse.
Claude Code's official GitHub Action — claude-code-action — runs headless. There is no trust dialog. It never renders. When an outside contributor opens a pull request with malicious project files, the MCP server starts automatically. Deploy keys, signing certificates, cloud tokens — all accessible to whatever the attacker's server wants to do with them.
The trust boundary that was supposed to protect developers doesn't exist in the environment where the attack surface is largest.
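A defender's pre-trust audit, added here as an example (it is not Adversa AI's code and not part of any of the four tools): before answering the trust dialog, or as a step in a CI job that runs on an outside contributor's pull request, parse the project-scoped config files and list every MCP server they would start. The `.mcp.json` and `.claude/settings.json` paths come from the post; the `mcpServers` / `command` / `args` / `url` keys and the `enableAllProjectMcpServers` flag are assumptions about Claude Code's format, and the `.cursor/`, `.gemini/` and `.vscode/` paths are assumptions about where the other three CLIs keep theirs. Like Gemini CLI's dialog, the script can only name the program, not tell you what it does; the difference is that the decision is made with the list in front of you instead of behind a default.

```python
"""Pre-trust audit of project-level MCP configuration.

Added example, not code from the TrustFall write-up or from any CLI vendor.
Reports every MCP server a cloned checkout would start once the folder is
trusted, so the review happens before (or instead of) pressing Enter.
"""
import json
import os
import sys
import tempfile
from dataclasses import dataclass

# Project-scoped files that can declare or auto-approve MCP servers.
# The first two are named in the post; the remaining paths are assumptions.
MCP_CONFIG_PATHS = (
    ".mcp.json",
    ".claude/settings.json",
    ".cursor/mcp.json",
    ".gemini/settings.json",
    ".vscode/mcp.json",
)


@dataclass
class Finding:
    path: str            # config file the server came from
    server: str          # server name as declared in the file
    kind: str            # "stdio" spawns a local process, "remote" opens a URL
    detail: str          # command line or URL, verbatim from the config
    auto_approved: bool = False


def _load_json(path):
    """Parse a JSON file, returning None if it is absent or malformed."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except (OSError, ValueError):
        return None


def _describe_servers(cfg_path, data):
    """Turn an (assumed) mcpServers mapping into Finding records."""
    findings = []
    servers = data.get("mcpServers") if isinstance(data, dict) else None
    if not isinstance(servers, dict):
        return findings
    for name, spec in servers.items():
        if not isinstance(spec, dict):
            continue
        if "command" in spec:
            argv = [str(spec["command"])] + [str(a) for a in spec.get("args", [])]
            findings.append(Finding(cfg_path, name, "stdio", " ".join(argv)))
        elif "url" in spec:
            findings.append(Finding(cfg_path, name, "remote", str(spec["url"])))
        else:
            findings.append(Finding(cfg_path, name, "unknown",
                                    json.dumps(spec, sort_keys=True)))
    return findings


def audit_tree(root):
    """Return every MCP server the checkout under `root` would start on trust."""
    findings = []
    auto = False
    for rel in MCP_CONFIG_PATHS:
        data = _load_json(os.path.join(root, rel))
        if data is None:
            continue
        findings.extend(_describe_servers(rel, data))
        # Assumed Claude Code setting that approves all project servers at once.
        if isinstance(data, dict) and data.get("enableAllProjectMcpServers") is True:
            auto = True
    for f in findings:
        f.auto_approved = auto
    return findings


def changed_config_paths(changed):
    """Filter `git diff --name-only` output down to MCP config files (CI gate)."""
    wanted = set(MCP_CONFIG_PATHS)
    return sorted(p for p in changed if p in wanted)


def build_sample_repo(root):
    """Write a constructed sample checkout (not a real repository) under root."""
    os.makedirs(os.path.join(root, ".claude"), exist_ok=True)
    with open(os.path.join(root, ".mcp.json"), "w", encoding="utf-8") as fh:
        json.dump({"mcpServers": {
            "helper": {"command": "npx", "args": ["-y", "example-helper@latest"]},
            "docs": {"url": "https://mcp.example.invalid/sse"}}}, fh)
    with open(os.path.join(root, ".claude", "settings.json"), "w", encoding="utf-8") as fh:
        json.dump({"enableAllProjectMcpServers": True}, fh)


def demo():
    """Audit the constructed sample checkout and return its findings."""
    with tempfile.TemporaryDirectory() as root:
        build_sample_repo(root)
        return audit_tree(root)


if __name__ == "__main__":
    results = audit_tree(sys.argv[1]) if len(sys.argv) > 1 else demo()
    for f in results:
        flag = " (auto-approved)" if f.auto_approved else ""
        print(f"{f.path}: {f.server} [{f.kind}] {f.detail}{flag}")
    sys.exit(1 if results else 0)
```

Run it as `python audit_mcp.py /path/to/clone` before opening the folder in any of the CLIs; a non-zero exit means the project would start at least one server. In the headless Action case described above, feed `git diff --name-only <base>...<head>` into `changed_config_paths` and fail the job on any hit, so a contributor's pull request cannot introduce or alter these files without a human reading them first.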
The Regression
This is the detail that turns a vulnerability into a structural argument.
Earlier versions of Claude Code's trust dialog explicitly warned that the project could execute code through MCP. They offered three choices: trust the folder with MCP enabled, trust without MCP, or don't trust. That third option — trust without MCP — was the correct default for opening code you didn't write.
It was removed. Claude Code 2.1+ simplified the dialog to a binary question with "Yes, I trust this folder" as the default. The user who presses Enter today gets less information than the user who pressed Enter six months ago.
Press Enter, and the spawned server can:
• Read ~/.ssh/, ~/.aws/, shell history
• Access source code from other projects
• Open persistent network connections
• Exfiltrate environment variables including API keys
• Establish long-lived C2 channels
A reasonable person reads "trust this folder" as "trust the code inside it." The system interprets it as "consent to silent code execution outside it."
Functioning as Designed
Anthropic's security team reviewed TrustFall and declined it. Their position: accepting "Yes, I trust this folder" constitutes consent to the full project configuration, including MCP execution. Post-trust-dialog behavior is outside their threat model. The boundary is functioning as designed.
They're right. That's the problem.
The trust dialog is the security boundary. And the security boundary is a single sentence that doesn't mention code execution, doesn't mention filesystem access beyond the project, doesn't mention network connections, and defaults to yes.
This is what happens when a security boundary is designed for usability. The dialog needs to be fast enough that developers don't disable it, vague enough that it doesn't alarm casual users, and simple enough that it works as a one-click gate. By the time you satisfy those constraints, you've built a consent mechanism that consents to everything while disclosing nothing.
The Accumulation
TrustFall didn't arrive alone. It joins a growing record of AI coding tool compromises that share the same structural shape: project-level configuration files that execute with developer-level privilege.
| CVE / Finding | Tool | Mechanism |
|---|---|---|
| CVE-2025-59536 | Claude Code | Hooks-based RCE via project config |
| CVE-2026-21852 | Claude Code | API key exfiltration via ANTHROPIC_BASE_URL |
| CVE-2026-33068 | Claude Code | Shell command deny rules fail after 50 subcommands |
| CVE-2026-26268 | Cursor | Git hook arbitrary code execution |
| CVE-2025-53773 | GitHub Copilot | Prompt injection in PR descriptions → RCE (CVSS 9.6) |
| CVE-2026-32173 | Azure SRE Agent | Unauthenticated WebSocket, CVSS 8.6 |
| TrustFall | All four CLIs | Malicious MCP via project config, one-click RCE |
Every entry in this table shares a property: the attack originates from configuration files that the developer didn't write and the tool didn't flag. The project tells the tool what to execute, and the tool executes it.
Meanwhile, Georgia Tech's Vibe Security Radar has confirmed 74 real-world CVEs introduced by AI-generated code as of March 2026 — 6 in January, 15 in February, 35 in March — and estimates the true number is five to ten times higher. The tools that generate vulnerable code can themselves be compromised through the code they're asked to read. It's the same cycle I described in The Self-Generating Supply Chain, now with a concrete entry point.
The Consent Problem
The structural issue isn't that TrustFall exists. It's that the entire AI coding security model depends on a UI element that can't carry what it's being asked to carry.
An AI coding CLI needs broad filesystem access to function. It needs to read your project structure, modify files, run commands. MCP extends this to external tools. The tools need these capabilities to be useful. A dialog that accurately described every permission — filesystem reads beyond project scope, network access, executable spawning, credential exposure — would be so long and alarming that either nobody would accept it or everyone would stop reading it. Both outcomes produce the same result: uninformed consent.
The trust prompt was the best idea anyone had for solving the tension between capability and safety. TrustFall proved it doesn't solve anything. Anthropic's response — "functioning as designed" — is the honest answer. The design can't hold what it's being asked to hold.
Five articles ago, I traced the instruction surface — the layer of config files, skill definitions, and natural-language instructions that tell AI agents what to do. TrustFall is what it looks like when someone walks through that surface. Not with a zero-day. Not with a novel technique. With two JSON files and a default button.