
The Stack That Emerged

On March 15, Claude Code authored 326,731 public GitHub commits in a single day. SemiAnalysis projects that number will hit 20% of all daily commits by year-end.

Nobody has a plan for who reviews them.

This is the fact that explains everything else happening in AI coding right now. Three products (Claude Code, Cursor, and Codex) converged on the same three-layer stack without a single planning meeting, partnership agreement, or shared specification. They arrived at the same architecture because they were all responding to the same structural pressure: code generation became trivially easy, and everything downstream of generation did not.

The Convergence Nobody Coordinated

Watch what happened in April alone:

On April 2, Cursor shipped version 3 — rebuilt from scratch, not around editing code but around managing agents that edit code. The new Agents Window runs multiple AI agents in parallel across repos, hands off between local and cloud execution, and lets developers orchestrate rather than type. Anysphere’s team described it as the beginning of a “third era of software development, where fleets of agents work autonomously.”

On April 16, Codex shipped desktop background agents — multiple AI agents working in parallel on your Mac, autonomously controlling the browser, testing frontend changes, with memory across sessions. A direct move from cloud-async into Claude Code’s territory.

The same day, Anthropic released Opus 4.7: 87.6% on SWE-bench Verified and 64.3% on SWE-bench Pro, at the same price as its predecessor. The execution layer got materially better while costing exactly the same.

Three companies, three product decisions, the same two-week window. None of them planned this. All of them ended up building different parts of the same stack.

Layer 1 — Execution · COMMODITY (solved)

Claude Code · Codex · GLM-5.1 · Qwen Code · OpenCode

Models converging, prices falling, open-source competitive. The model is the commodity. The agent is interchangeable.

Layer 2 — Orchestration · EMERGING (competitive)

Cursor 3 · VS Code multi-provider · Claude Code Routines · Codex background agents

Managing parallel agents, routing between models, handing off cloud↔local. Where the product competition moved.

Layer 3 — Verification · UNSOLVED (open frontier)

CodeRabbit · Qodo · Codex Security · ???

43% of AI code needs manual debugging in production. 0% of engineering leaders are “very confident” it works. $256M raised, problem barely dented.

The stack nobody designed. Each layer emerged from a different failure mode.

Layer 1: The Part That Got Easy

Execution is the layer that settled first. In the commodity thesis I’ve tracked across six articles, the story has been the same: models converge, scaffolding creates the gap, and the gap itself gets smaller as scaffolding commoditizes.

The April 16 Opus 4.7 release proved the pattern is accelerating. A 6.8-point jump on SWE-bench Verified. A 10.9-point jump on Pro. Same price. If you’re an engineering team evaluating AI coding in April 2026, the execution layer just got materially better at zero additional cost.

Meanwhile, in the accessible tier, six models sit within a 10-point band on SWE-bench Pro. Three are proprietary, three are open-weight. One costs $0.33 per million tokens. The execution layer didn’t just commoditize; it commoditized in a way that makes the layers above it legible. When every model can generate code that passes benchmarks, the question shifts: what happens to that code after it’s generated?

Layer 2: Managing the Flood

Cursor’s pivot tells the story of Layer 2 better than any analysis could.

Cursor was an IDE. A VS Code fork with AI completion built in. For most of 2025, the product was about helping you write code faster inside a familiar editor. Then in April 2026, Anysphere rebuilt the entire interface from scratch — not to help you write code, but to help you manage agents that write code for you. The Agents Window is a standalone workspace for running parallel coding agents across repos. Design Mode lets you point at UI elements and direct agents visually. Cloud-to-local handoff lets you start a task remotely and finish it on your machine.

This isn’t a feature update. It’s an admission about where value lives. If code generation is cheap, the product isn’t the code — it’s the control plane that decides what gets generated, in what order, and where the results go.

Codex arrived at the same conclusion from the opposite direction. It started as cloud-async fire-and-forget: upload your repo, describe the task, get a PR back. The April 16 desktop update added background agents on Mac, multiple agents running simultaneously without interfering with one another. Where Cursor went from local-first to cloud-capable, Codex went from cloud-first to local-capable. They met in the middle: orchestration.

Claude Code Routines, shipped April 14, completed the pattern. Scheduled cloud-executed agents that run recurring tasks — up to 25 per day on Enterprise. The terminal tool added a management layer.

Three products. Three architectures. All building orchestration in the same two-week window. Nobody coordinated.

Layer 3: The Part Nobody Solved

Here’s where the stack breaks.

On April 14, Lightrun published its 2026 State of AI-Powered Engineering Report. The headline: 43% of AI-generated code requires manual debugging in production, even after passing QA and staging. And of the 200 SRE and DevOps leaders surveyed, not one described themselves as “very confident” that AI-generated code will behave correctly once deployed.

The same week, I found a JetBrains/UC Irvine study presented at ICSE 2026 in Rio de Janeiro that quantified something I’d been circling since The Output Trap. Researchers tracked 800 developers over two years through 151.9 million IDE interaction events. AI users typed 587 more characters per month than non-users. They also deleted 102 more lines per month. They context-switched 6.4 more times per month. Non-users were doing the opposite — becoming more focused while AI users were becoming more scattered.

“AI redistributes and reshapes developers’ workflows in ways that often elude their own perceptions.”

— Sergeyuk et al., ICSE 2026

In the same study, 82.3% of AI users reported perceived productivity gains. The telemetry showed a volume increase, not an efficiency gain.

Prior work the paper cited: 18.16% of accepted AI suggestions were later deleted entirely. Another 6.62% were heavily rewritten. One in four AI-generated code blocks doesn’t survive first contact with the developer who accepted it.

The Stack Overflow 2025 Developer Survey measured the same thing from the demand side. 84% of developers use AI tools. Only 29% trust their output — down 11 points from the year before. 46% actively distrust them. Experienced developers were the most skeptical, with a 20% “highly distrust” rate.

This is the gap the $256 million I tracked in #38 was supposed to close. CodeRabbit, Qodo, Codex Security, and others all raised money to build verification infrastructure. But the gap widened. In March 2026, Amazon suffered outages traced to AI-assisted code deployed without proper approval. Satya Nadella and Sundar Pichai both claim a quarter of their companies’ code is now AI-generated. The verification layer for that code is — optimistically — half-built.

Why This Stack and Not Another

The three layers didn’t emerge randomly. Each one formed in response to a specific bottleneck that the previous layer created.

Execution came first because models got good at generating code before anyone figured out what to do with it. The arms race was on the input side: better models, longer context windows, more agentic loops, deeper codebase understanding. This competition drove SWE-bench scores from 33% to 87% in eighteen months and pushed prices from $60 per million output tokens to $1.95. The layer that generates code is, by any measure, oversupplied.
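To put that price collapse in concrete terms, here’s a back-of-envelope sketch in Python. Only the two prices come from the paragraph above; the output-token count per task is a hypothetical figure assumed purely for illustration.

```python
# Back-of-envelope: what the output-price drop means per agent task.
# Only the two prices come from the article; the token count is an
# assumed, illustrative figure -- real agent tasks vary wildly.

OLD_PRICE = 60.00  # $ per million output tokens (eighteen months ago)
NEW_PRICE = 1.95   # $ per million output tokens (today)

ASSUMED_OUTPUT_TOKENS_PER_TASK = 50_000  # hypothetical mid-sized task

def task_cost(price_per_million: float, tokens: int) -> float:
    """Output-token cost of a single task at a given per-million price."""
    return price_per_million * tokens / 1_000_000

old = task_cost(OLD_PRICE, ASSUMED_OUTPUT_TOKENS_PER_TASK)
new = task_cost(NEW_PRICE, ASSUMED_OUTPUT_TOKENS_PER_TASK)

print(f"per task: ${old:.2f} -> ${new:.4f} ({OLD_PRICE / NEW_PRICE:.0f}x cheaper)")
# per task: $3.00 -> $0.0975 (31x cheaper)
```

At the old pricing, a mid-sized agent task under these assumptions cost real money; at $1.95 per million output tokens it rounds to a dime. That is what “oversupplied” means in practice.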

Orchestration emerged because execution produced a management problem. When a single developer can run eight parallel coding agents across multiple repos, someone has to route the tasks, resolve conflicts, merge the outputs, and handle the failures. Cursor 3 exists because Cursor 2’s AI coding was too good to keep managing through a single chat panel. The solution to “too much output” was a control plane.
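To make the control-plane idea concrete, here is a minimal sketch of what Layer 2 has to do: route each task to whichever commodity model fits, fan the agents out in parallel, then collect the successes and failures. Every name in it (Task, run_agent, the routing table) is hypothetical; this shows the shape of the problem, not Cursor’s, Codex’s, or Claude Code’s actual internals.

```python
# Minimal sketch of a Layer 2 control plane: route tasks to interchangeable
# execution agents, run them in parallel, collect successes and failures.
# All names here are hypothetical illustrations, not any vendor's API.

import asyncio
from dataclasses import dataclass

@dataclass
class Task:
    repo: str
    description: str
    kind: str  # e.g. "refactor", "test", "docs"

# Layer 1 is a commodity, so routing is a lookup: pick whichever model
# is cheapest or strongest for the task type.
ROUTING = {"refactor": "model-a", "test": "model-b", "docs": "cheap-open-model"}

async def run_agent(task: Task, model: str) -> str:
    """Stand-in for dispatching a coding agent and awaiting its result."""
    await asyncio.sleep(0.1)  # placeholder for minutes of real agent work
    return f"'{task.description}' done by {model}"

async def orchestrate(tasks: list[Task]) -> None:
    jobs = [run_agent(t, ROUTING.get(t.kind, "model-a")) for t in tasks]
    # The parallel fan-out is the easy half. The hard half is Layer 3:
    # every successful result below still needs a human to verify it.
    results = await asyncio.gather(*jobs, return_exceptions=True)
    for task, result in zip(tasks, results):
        status = "FAILED" if isinstance(result, Exception) else "needs review"
        print(f"[{status}] {task.repo}: {result}")

asyncio.run(orchestrate([
    Task("api", "extract auth middleware", "refactor"),
    Task("web", "add tests for checkout flow", "test"),
]))
```

Note what the sketch can’t do: every successful result comes back marked “needs review.” The fan-out is trivial; the verification is not.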

Verification didn’t emerge as a product layer — it emerged as a deficit. The verification gap I wrote about in March hasn’t closed. It’s grown. 4% of GitHub commits in February, 326K commits per day by March, projected to 20% by December. Each percentage point is code that needs review by people who are already overloaded.
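A rough sizing exercise shows why each percentage point hurts. The commit figures below come from this article; the minutes-of-review-per-commit input is an assumption labeled as such, and the two baselines are from adjacent months, so treat the output as order-of-magnitude only.

```python
# Rough sizing of the Layer 3 deficit. The commit volume and shares come
# from this article; the review-minutes-per-commit figure is an assumption.
# The baselines are from adjacent months, so this is order-of-magnitude only.

AI_COMMITS_PER_DAY = 326_731   # Claude Code commits on March 15
AI_SHARE_NOW = 0.04            # ~4% of public GitHub commits (February)
AI_SHARE_PROJECTED = 0.20      # SemiAnalysis year-end projection

ASSUMED_REVIEW_MINUTES = 10    # per AI commit -- assumption, not sourced

def review_hours_per_day(commits: float) -> float:
    """Daily human review hours implied by a given commit volume."""
    return commits * ASSUMED_REVIEW_MINUTES / 60

implied_total = AI_COMMITS_PER_DAY / AI_SHARE_NOW   # ~8.2M commits/day total
projected_ai = implied_total * AI_SHARE_PROJECTED   # ~1.63M AI commits/day

print(f"today:  {review_hours_per_day(AI_COMMITS_PER_DAY):>9,.0f} review hours/day")
print(f"at 20%: {review_hours_per_day(projected_ai):>9,.0f} review hours/day")
# today:     54,455 review hours/day
# at 20%:   272,276 review hours/day
```

Even at ten assumed minutes per commit, the projected load is roughly 272,000 review hours per day, the equivalent of about 34,000 full-time reviewers doing nothing else.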

The JetBrains AI Pulse survey adds the demand-side data: 22% of developers already use AI coding agents, not just completion tools. 66% of companies plan agent adoption within 12 months. Only 13% use AI across their full software development lifecycle. The flood has barely started.

What the Stack Can’t Do

The stack has a clean answer for Layer 1 (use whatever model is cheapest for your task) and a competitive answer for Layer 2 (Cursor, Codex, or Claude Code, depending on your workflow). Layer 3 has no answer.

Not “no good answer.” No answer. The tools that exist — CodeRabbit, Qodo, Codex Security — are real products with real traction. But they’re patching specific failure modes, not solving the structural problem. The structural problem is that generation scales with compute and verification scales with human attention. I named this in The Output Trap. The ICSE data now quantifies it: developers are producing more but reviewing more poorly, while perceiving themselves as more productive.

The stack emerged because the industry needed it. But the stack can’t complete itself. The bottom two layers are engineering problems. The top layer is a human-system problem that no amount of additional AI makes easier — because adding AI to review AI-generated code just pushes the trust question up one level.

326,731 commits per day. Projected to be 20% of all public GitHub commits by December. The stack that emerged to handle this flood has two working layers and a hole at the top where the review infrastructure should be.

The tools are real. The convergence is real. The architecture is elegant. And the question it can’t answer — who verifies the output? — is the same question it was built to solve.

Sources: SemiAnalysis: Claude Code is the Inflection Point · The New Stack: AI Coding Stack · Cursor 3 announcement · VentureBeat: Lightrun 2026 Report · ICSE 2026: Evolving with AI (arXiv 2601.10258) · Stack Overflow 2025 Developer Survey · Anthropic Opus 4.7 · JetBrains Research: AI impact on developer workflows · Lightrun 2026 State of AI-Powered Engineering · GIGAZINE: Claude Code GitHub commits