On March 31, researchers from the Calif security team pointed Claude at GNU Emacs with a simple prompt: "I've heard a rumor that there are RCE 0-days when you open a txt file without any confirmation prompts."
Claude found the bug. Built a working proof-of-concept. The exploit executes arbitrary code when you open a text file in Emacs. No confirmation dialog. No warning.
The Emacs maintainers declined to fix it. They said the root cause was Git, not Emacs.
This is the security story of 2026 in miniature: an AI finds a critical vulnerability in minutes, and the human side of the pipeline can't — or won't — keep up.
The Month Everything Accelerated
The Emacs disclosure was part of MAD Bugs — Month of AI-Discovered Bugs — a research initiative running through April 2026 that's producing a new zero-day disclosure every few days. The initiative uses Claude Opus 4.6, the current production model, with no specialized tooling or custom frameworks. Just conversational prompts.
The results have been extraordinary.
Anthropic has identified and validated over 500 high-severity vulnerabilities in open-source software using Claude Opus 4.6.
Five hundred. In well-tested, heavily fuzzed codebases. In projects that have accumulated millions of CPU hours of fuzzing, Claude found bugs that automated tools missed, because it reasons about code patterns the way a human researcher would and finds semantic vulnerabilities that statistical approaches can't reach.
The specific disclosures tell the story:
| Target | CVE | Severity | Status |
|---|---|---|---|
| Vim 9.2 | CVE-2026-34714 | Critical (9.2) | Patched (v9.2.0272) |
| GNU Emacs | — | Critical (RCE) | Declined to fix |
| FreeBSD kernel | CVE-2026-4747 | Critical (kernel RCE) | Patched |
| Firefox | CVE-2026-2796 | Critical | Patched (Firefox 148) |
| GhostScript | — | High | Patched |
| OpenSC, CGIF, + 490 more | — | High | Various |
That Firefox line has a backstory. In early March, Anthropic partnered with Mozilla, and Claude Opus 4.6 found 22 CVEs in Firefox's C++ codebase in two weeks: 14 high-severity, one critical at 9.8 CVSS. The cost: $4,000 in API credits.
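As a back-of-the-envelope check on those figures, the sketch below divides the reported API spend by the reported counts. Only the $4,000, 22, and 14 come from the text. The function and constant names are illustrative, and the result ignores researcher time, triage, and patching costs.

```python
def cost_per_finding(total_cost_usd: float, findings: int) -> float:
    """Average API spend per finding, in US dollars."""
    if findings <= 0:
        raise ValueError("findings must be positive")
    return total_cost_usd / findings


# Figures quoted in the article; everything else here is illustrative.
API_SPEND_USD = 4_000
TOTAL_CVES = 22
HIGH_SEVERITY_CVES = 14

per_cve = cost_per_finding(API_SPEND_USD, TOTAL_CVES)
per_high = cost_per_finding(API_SPEND_USD, HIGH_SEVERITY_CVES)

print(f"~${per_cve:.0f} per CVE, ~${per_high:.0f} per high-severity CVE")
```

On those numbers, the API bill works out to roughly $182 per CVE, or about $286 per high-severity CVE.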
Four Hours to Root Shell
The most technically impressive result was the FreeBSD kernel exploit.
On March 29, Nicholas Carlini — a security researcher at Anthropic — pointed Claude at CVE-2026-4747, a stack buffer overflow in FreeBSD's RPCSEC_GSS NFS authentication module. He asked Claude to write a working exploit.
Claude solved six distinct problems autonomously: setting up the lab environment (recognizing FreeBSD needed 2+ CPUs because it spawns 8 NFS threads per CPU), devising a 15-round multi-packet shellcode delivery mechanism, cleanly terminating hijacked kernel threads via kthread_exit(), calibrating stack offsets with De Bruijn patterns when initial disassembly proved inaccurate, transitioning from kernel to userland execution, and clearing stale debug registers that crashed child processes.
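For readers unfamiliar with the De Bruijn trick mentioned above, here is a minimal sketch of the general idea, not the code Claude produced. A De Bruijn sequence contains every length-n substring exactly once, so any n-character chunk of it identifies its own position in a buffer. The function names and the small alphabet are assumptions chosen for illustration.

```python
def de_bruijn(alphabet: str, n: int) -> str:
    """Return a De Bruijn sequence B(k, n) over `alphabet`, where k = len(alphabet).

    Every window of length n in the returned string is unique.
    """
    k = len(alphabet)
    a = [0] * (k * n)
    sequence: list[int] = []

    def db(t: int, p: int) -> None:
        # Standard recursive construction via Lyndon words.
        if t > n:
            if n % p == 0:
                sequence.extend(a[1 : p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def find_offset(pattern: str, chunk: str) -> int:
    """Return the position of `chunk` in `pattern`, or -1 if it is absent."""
    return pattern.find(chunk)


pattern = de_bruijn("abcd", 3)   # 4**3 = 64 characters
observed = pattern[40:43]        # stand-in for a chunk read back during debugging
print(len(pattern), find_offset(pattern, observed))
```

Because each chunk is unique, a single lookup replaces guesswork about distances, which is why the technique helps when disassembly-derived offsets turn out to be wrong.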
It wrote two exploits using two different strategies. Both worked on the first try.
Four hours of compute time. A working remote root shell.
The caveat matters
FreeBSD 14.x has no KASLR — kernel addresses are fixed and predictable. No stack canaries for integer arrays. As Hacker News commenters correctly noted, a modern hardened Linux kernel would be significantly harder. The target had 1990s-era protections. This is real, but it's not "AI can exploit anything" — yet.
What it is: proof that the full exploit development pipeline, from vulnerability analysis through ROP gadget construction, shellcode generation, and privilege escalation, is now within AI competency for targets that lack modern mitigations such as KASLR. The floor just rose.
The Broken Pipeline
Here's what keeps me up at night. Not the offensive capabilities — the asymmetry.
| Finding Bugs (Offense) | Fixing Bugs (Defense) |
|---|---|
| 500+ zero-days in one initiative | 60+ day average remediation for critical CVEs |
| 22 Firefox CVEs in 2 weeks for $4K | 32% still unpatched after 180 days |
| Working kernel exploit in 4 hours | Emacs maintainers declined to fix |
| No specialized tooling required | 131 CVEs disclosed per day in 2025 |
| Scales with compute | Scales with human maintainers |
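The asymmetry is easy to see in a toy model. The sketch below tracks an open-bug backlog when reports arrive faster than they are fixed. The 131 per day arrival rate is the 2025 disclosure figure quoted above; the fix rate of 100 per day is a purely hypothetical number chosen to show the shape of the curve, and the function name is illustrative.

```python
def simulate_backlog(days: int, found_per_day: int, fixed_per_day: int) -> list[int]:
    """Return the open-bug backlog at the end of each day.

    The backlog never drops below zero: spare fixing capacity is not banked.
    """
    backlog = 0
    history = []
    for _ in range(days):
        backlog = max(0, backlog + found_per_day - fixed_per_day)
        history.append(backlog)
    return history


FOUND_PER_DAY = 131   # CVEs disclosed per day in 2025 (from the text)
FIXED_PER_DAY = 100   # hypothetical remediation capacity, not a measured value

history = simulate_backlog(180, FOUND_PER_DAY, FIXED_PER_DAY)
print(f"Open backlog after 180 days: {history[-1]}")
```

Under these assumptions the backlog grows linearly and never recovers. The only ways out are raising the fix rate or lowering the arrival rate, and compute only moves the latter upward.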
The responsible disclosure framework — the 90-day window that's been the industry standard since Google's Project Zero formalized it — was designed for a world where finding bugs was hard, slow, and expensive. Human researchers finding a handful of vulnerabilities per year per target.
MAD Bugs is producing a new disclosure every few days. Anthropic's own red team page acknowledges that traditional 90-day disclosure windows may not hold up against the speed and volume of AI-discovered bugs.
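A rough way to quantify that strain is to count how many reports sit inside their disclosure window at the same time. The sketch below assumes a steady disclosure rhythm. The 90-day window comes from the text, while the 3-day interval (one reading of "every few days") and the 73-day interval (about five findings per year, one reading of "a handful") are assumptions. The function name is illustrative.

```python
import math


def reports_in_window(window_days: int, days_between_disclosures: float) -> int:
    """Steady-state number of reports still inside their disclosure window."""
    if days_between_disclosures <= 0:
        raise ValueError("days_between_disclosures must be positive")
    return math.ceil(window_days / days_between_disclosures)


WINDOW_DAYS = 90  # the industry-standard window described in the text

ai_pace = reports_in_window(WINDOW_DAYS, 3)       # assumed: one disclosure every 3 days
human_pace = reports_in_window(WINDOW_DAYS, 73)   # assumed: about five findings per year

print(f"AI pace: {ai_pace} concurrent reports; human pace: {human_pace}")
```

With those assumed intervals, a 90-day window holds about 30 open reports at once at the AI pace, versus about 2 at the human pace the framework was built around.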
When one model can generate 500+ reports that each require human triage, human verification, and human patching, the pipeline breaks. Not because the bugs aren't real. Because the people who fix them can't move at the speed of the system that finds them.
The Shadow of What Comes Next
Everything above was done by Claude Opus 4.6 — the current production model available to anyone with an API key.
In late March, a CMS misconfiguration at Anthropic leaked draft documentation for their next model tier, codenamed Mythos (Capybara). The leaked blog post described it as "currently far ahead of any other AI model in cyber capabilities" and warned it "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders."
Cybersecurity stocks dropped 3-7% on the news alone. No benchmarks. No demo. Just the claim that something better was coming.
If Opus 4.6 can find 500+ zero-days and write a working kernel exploit in four hours, what does "far ahead" look like?
The Verification Gap, Again
I've spent weeks writing about the verification gap in AI-generated code — the chasm between how fast AI can produce code and how fast humans can verify it. The security version of this story is the same architecture, with higher stakes.
Generation is cheap. Verification is expensive. The gap is widening.
In coding, the cost is bugs and technical debt. In security, the cost is exploitable systems. The Emacs maintainers didn't decline to fix because they're negligent — they made a reasonable scoping decision about what their project is responsible for. But the vulnerability is real, and it's in every Emacs installation on every server running Git.
The 90-day disclosure window assumed one side of the pipeline — the discovery side — would stay slow. It didn't. Now the entire framework needs to be rethought, not just for AI-discovered bugs, but for a world where the offense-defense asymmetry is amplified by orders of magnitude.
Five hundred zero-days. A working kernel exploit in four hours. And a maintainer who said no.
The tools work. The pipeline doesn't.