A financial services company adopted Cursor last year. Monthly code output went from 25,000 lines to 250,000. Ten months later, a million lines sit in the review queue, untouched. The developers who could review the code are now spending their time generating more of it.
That company isn't named. It appears in a New York Times investigation by Mike Isaac, published April 6, under the headline "The Big Bang: A.I. Has Created a Code Overload." Joni Klippert, CEO of security testing firm StackHawk, told Isaac: "The sheer amount of code being delivered, and the increase in vulnerabilities, is something they can't keep up with."
This isn't a story about one company. Four independent research efforts, using different methodologies on different datasets, arrived at the same conclusion in the first quarter of 2026.
The evidence
Start with the code itself.
GitClear analyzed 211 million lines across Google, Microsoft, Meta, and enterprise repositories from 2020 to 2024. The pattern is unambiguous.
The Pearson correlation between Copilot adoption rate and code churn was 0.98. GitClear CEO Bill Harding's caveat: "I wouldn't say that the report proves that AI assistants are reducing code quality since our data is correlational." Fair. But 0.98 is the kind of correlation that makes you stop looking for confounders.
2024 was the first year in GitClear's history that copy/pasted lines exceeded refactored lines. The code isn't just growing. It's getting worse as it grows.
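For context on the statistic itself, Pearson's r measures how tightly two series move together linearly. A minimal sketch with hypothetical adoption-versus-churn numbers (not GitClear's dataset) shows how a near-1 value arises:

```python
# Sketch of a Pearson correlation. The data below is hypothetical,
# invented for illustration -- it is not GitClear's dataset.
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical quarterly series: Copilot adoption (%) vs. code churn (%).
adoption = [5, 12, 20, 31, 44, 58, 70, 81]
churn = [3.1, 3.4, 4.0, 4.8, 5.9, 7.0, 7.9, 9.1]

print(f"r = {pearson(adoption, churn):.3f}")
```

With series that rise nearly in lockstep, as in this invented example, r lands close to 1, which is what GitClear reported for adoption against churn.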
The problems that never get fixed
In March 2026, a team led by Liu et al. published "Debt Behind the AI Boom" on arXiv. They analyzed 304,362 verified AI-authored commits from 6,275 GitHub repositories across five AI coding assistants, running static analysis before and after each commit and tracking issue lifecycles.
They found 484,606 distinct issues, 89.1% of them code smells. Over 15% of AI commits introduced at least one new issue.
The number that matters most: 24.2% of AI-introduced issues still survive at the latest revision. Nearly one in four problems introduced by AI code is never fixed. Not eventually resolved. Not pending review. Never touched again.
This is compound interest working against you. Each sprint adds issues. A quarter of them become permanent residents. The codebase accumulates debt faster than it retires it.
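The compounding is easy to quantify from the paper's two rates. The commit volume below is a hypothetical team, not data from the study:

```python
# Lower-bound sketch of permanent-debt accumulation.
# The two rates come from Liu et al.; the commit volume is hypothetical.
INTRODUCE_RATE = 0.15  # share of AI commits introducing at least one new issue
NEVER_FIXED = 0.242    # share of introduced issues surviving at latest revision

def permanent_debt(commits_per_sprint, sprints):
    """Issues introduced over `sprints` sprints that are never fixed."""
    introduced = commits_per_sprint * INTRODUCE_RATE * sprints
    return introduced * NEVER_FIXED

# Hypothetical team: 400 AI-assisted commits per sprint, 26 sprints (~1 year).
print(round(permanent_debt(400, 26)))  # prints 378
```

Even at this modest hypothetical volume, a year of work leaves several hundred issues that, per the paper's survival rate, will never be touched again.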
The amplifier
The DORA 2025 report — Google's annual State of DevOps study, roughly 5,000 respondents, 100+ hours of qualitative interviews — found that 90% of developers now use AI tools. The expected outcome was better software delivery. The actual outcome:
"AI's primary role in software development is that of an amplifier. It magnifies the strengths of high-performing organisations and the dysfunctions of struggling ones."
— DORA 2025
Developers using AI interact with 47% more pull requests daily, complete 21% more tasks, and merge 98% more PRs. Organizational delivery stability? Flat. The negative relationship between AI adoption and delivery stability persisted from 2024 into 2025.
Separately, Faros AI's telemetry study of 10,000+ developers found review time up 91%, PR size up 154%, and bug rates up 9%. DORA's qualitative interviews corroborate: "Reviewing [another's] code is so much harder than writing it. AI tools are increasing the rate at which people can churn out code that needs to be reviewed..."
The output increased. The absorption capacity didn't.
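A toy queue model makes that mechanism concrete. Every parameter here is illustrative, not DORA's or Faros AI's data: output compounds a few percent per sprint, review capacity stays flat, and the unreviewed backlog diverges.

```python
# Toy model (illustrative parameters only): generated PRs compound per
# sprint while review capacity stays flat, so the backlog runs away.
def backlog(sprints, prs_per_sprint=100.0, growth=1.05, capacity=105.0):
    """Unreviewed PRs left in the queue after `sprints` sprints."""
    queue = 0.0
    for sprint in range(sprints):
        queue += prs_per_sprint * growth ** sprint  # output compounds
        queue = max(0.0, queue - capacity)          # reviews drain a flat amount
    return queue

for horizon in (4, 12, 26):
    print(f"after {horizon} sprints: {backlog(horizon):.0f} PRs unreviewed")
```

The design choice is the point: as long as `capacity` is a constant and output has any compounding term, the queue eventually grows without bound, no matter how generous the constant is.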
The math that doesn't close
Google's own data, analyzed by Sonar, puts it starkly: AI now generates over 30% of Google's code, yet velocity improved only 10%. When output grows by nearly a third and delivery gets only 10% faster, the gap isn't a rounding error. It's structural. Per-unit quality improvements cannot offset the decay that comes from sheer volume.
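Under the simplifying assumption that the 30% AI share translates into roughly 30% more volume (the Sonar analysis doesn't break this down), the gap compounds quickly. The 100,000-line baseline is purely hypothetical:

```python
# Back-of-envelope arithmetic for the 30%-output vs. 10%-velocity gap.
# The baseline is hypothetical; the 30% and 10% figures are from the article.
baseline = 100_000                 # lines shipped per quarter pre-AI (hypothetical)
produced = round(baseline * 1.30)  # output up ~30% with AI in the mix
absorbed = round(baseline * 1.10)  # delivery velocity up only 10%

gap_per_quarter = produced - absorbed
print(gap_per_quarter)        # prints 20000: lines per quarter nobody absorbs
print(gap_per_quarter * 4)    # prints 80000: 80% of a pre-AI quarter, per year
```

At these assumed numbers, a year of the mismatch leaves unabsorbed code equal to most of what the whole organization used to ship in a quarter.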
This is the output trap. The productivity gain is real at the keystroke. It's real at the task. It vanishes at the organization because every line generated creates a review obligation, a maintenance liability, and a security surface — and those costs don't scale down the way generation scales up.
The cleanup industry as proof
You know a problem is real when people start getting paid to fix it. "Vibe coding cleanup specialists" is now a job category. SoftTeco, Zethic, Belitsoft, CodeConductor, and others are advertising services specifically for rewriting AI-generated codebases. Rates start at $200/hour. Demand is up 300%.
25% of Y Combinator's Winter 2025 batch had codebases that were 95% or more AI-generated. An estimated 8,000+ startups need full or partial rebuilds, creating a cleanup market valued between $400 million and $4 billion.
Collins Dictionary named "vibe coding" its Word of the Year for 2025. Apple processed 235,800 new App Store submissions in Q1 2026 alone — 84% year-over-year growth — and is now actively cracking down on quality.
The same asymmetry, everywhere
I've been tracking this dynamic for weeks without naming it. In security: Claude Opus 4.6 found 500+ zero-days because offense scales with compute while defense scales with human maintainers. In code review: $256 million flowed into verification startups because generation outpaced the ability to check it. In the productivity paradox: 14-55% task-level gains, zero firm-level impact, because the organization can't absorb what the individual produces.
It's all the same structural asymmetry. Generation is cheap. Verification is expensive. Offense scales. Defense doesn't. Output grows exponentially. Absorption grows linearly, if at all.
The output trap isn't a bug in AI coding tools. It's an emergent property of any system where production costs collapse faster than quality assurance costs. The tools work. The problem is that "working" means generating more code than the surrounding human and organizational infrastructure can metabolize.
Every company adopting AI coding tools faces the same question, whether they know it or not: are you building the absorption capacity as fast as you're building the output capacity? DORA's data says almost no one is. The organizations that do succeed — Stripe, Intercom, the top quartile — built the review infrastructure, the testing pipelines, and the quality gates before they turned on the AI.
The rest are generating their way into a backlog they'll be paying to clean up for years.