On March 5, 2026, Amazon.com went dark across North America for six hours. Order volume dropped 99%. An estimated 6.3 million orders were lost. Three days earlier, another outage had already cost 120,000 orders and produced 1.6 million errors. Internal documents obtained by Business Insider described "a trend of incidents with high blast radius caused by Gen-AI-assisted changes for which best practices and safeguards are not yet fully established."
Amazon's response was a 90-day code safety reset — mandatory dual-reviewer sign-off, stricter automated checks, tighter documentation requirements. It applied to 335 Tier-1 systems. The company publicly disputed that AI was the root cause. The internal memo told a different story.
This isn't an Amazon story. It's an arithmetic story. Three trends are converging simultaneously across the industry, and the math is simple enough that you can do it on a napkin.
The Three Trends
Each trend individually is manageable. Together they form a mechanism.
More Code, Fewer Eyes
The volume expansion is real and documented. GitHub's Octoverse reports that 41% of new code is AI-assisted. Output is up roughly 60% year-over-year. Engineers who once produced two or three pull requests per day now produce five or six. This is the number organizations celebrate.
What they don't celebrate — and mostly don't measure — is what happens downstream. Reviewer capacity hasn't changed. The same senior engineers who reviewed three PRs a day are now looking at six. They don't review twice as fast. They review half as carefully, or they let things through, or they procrastinate. Engineers "dread" reviewing AI-generated PRs. The code is correct-looking, consistently structured, and subtly wrong in ways that take longer to catch than sloppy human code.
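The napkin version of that squeeze can be sketched in a few lines of Python. This is an illustration only: the 120-minute daily review budget is an assumption made up for the example, not a figure from any source cited here, and `minutes_per_pr` is a hypothetical helper.

```python
# Napkin model: a reviewer has a fixed daily review budget,
# and it gets split evenly across however many PRs arrive.

def minutes_per_pr(daily_review_minutes: float, prs_per_day: int) -> float:
    """Average attention each PR receives under a fixed daily budget."""
    if prs_per_day <= 0:
        raise ValueError("prs_per_day must be positive")
    return daily_review_minutes / prs_per_day

REVIEW_BUDGET_MINUTES = 120  # assumption: two hours of review time per day

before = minutes_per_pr(REVIEW_BUDGET_MINUTES, 3)  # three PRs a day
after = minutes_per_pr(REVIEW_BUDGET_MINUTES, 6)   # six PRs a day

print(f"{before:.0f} min per PR before, {after:.0f} min per PR after")
```

Whatever budget you plug in, doubling the PR count halves the attention per PR unless review time itself grows.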
Meanwhile, the checker layer is being actively dismantled. 113,000 tech jobs were cut in 2026, with 47.9% attributed to AI automation. The roles disappearing are not random: they are disproportionately the roles that catch mistakes. Atlassian's layoffs specifically targeted QA, stating that AI tools had "reduced the need for manual testing by approximately 60%." Entry-level hiring at the top fifteen tech firms fell 25% from 2023 to 2024. The junior developers who review senior work, ask naive-but-essential questions, and learn the system by reading every PR are being eliminated as a cost center.
The logic is seductive: if AI writes the code, you need fewer people to write it. But the logic quietly assumes that AI-written code doesn't need checking. Every piece of evidence says it does.
The Debt That Stays
The largest empirical study of AI code in the wild ("Debt Behind the AI Boom" by Liu et al., covering 302,600 commits across 6,299 repositories) found that 15% or more of commits from every AI assistant introduce at least one issue. That's expected; human commits introduce issues too. The devastating number is what comes next: 22.7% of AI-introduced issues are still present at the latest version of the repository. As of the study's snapshot, nobody had fixed them. Not quickly. Not slowly. Not at all.
This is not a cleanup problem. This is compound interest.
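To put rough numbers on it, here is a minimal Python sketch that applies the two rates quoted above to a hypothetical batch of AI-assisted commits. The batch size and the one-issue-per-flagged-commit default are assumptions chosen for illustration. Since the study's figures are "15% or more" and "at least one issue," the result reads as a floor rather than a forecast.

```python
# Rates as quoted in this article from Liu et al.
ISSUE_COMMIT_RATE = 0.15   # share of AI commits introducing at least one issue
PERSISTENCE_RATE = 0.227   # share of those issues still present at latest version

def persisting_issues(ai_commits: int, issues_per_bad_commit: float = 1.0) -> float:
    """Estimate how many introduced issues are still in the codebase.

    issues_per_bad_commit is an assumption; 1.0 is the minimum consistent
    with "at least one issue" per flagged commit.
    """
    introduced = ai_commits * ISSUE_COMMIT_RATE * issues_per_bad_commit
    return introduced * PERSISTENCE_RATE

# Hypothetical batch: 1,000 AI-assisted commits.
estimate = persisting_issues(1_000)
print(f"About {estimate:.0f} issues still in the codebase")
```

Under these assumptions, every thousand AI-assisted commits leave roughly 34 issues sitting in the codebase, and the next thousand are written on top of them.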
The Lightrun survey of 200 senior SRE and DevOps leaders makes the same point from the operations side. 43% of AI-generated code changes need debugging in production — after passing QA and staging. Zero percent of engineering leaders reported being "very confident" that AI code would behave correctly in production. Zero percent could verify an AI-suggested fix in a single redeploy cycle; 88% need two or three cycles, and 11% need four to six.
"0% of engineering leaders very confident AI code will behave correctly in production. 0% can verify an AI-suggested fix in one redeploy cycle."
— Lightrun State of AI-Powered Engineering, 2026
Not a small minority reporting high confidence. None. The people responsible for keeping systems running do not fully trust what's being deployed, and they cannot verify it efficiently.
The Arithmetic
Put the three trends together and the mechanism becomes transparent. Organizations are increasing verification demand (more code, more PRs, more AI-generated changes that look right but might not be) while simultaneously cutting verification capacity (fewer QA engineers, fewer juniors, fewer reviewers per PR). The gap between what needs checking and what gets checked widens every quarter.
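The widening gap is easy to model. The sketch below is a toy, not a measurement: the starting volumes and the 5% quarterly decline in review capacity are assumptions invented for the example, and only the roughly 60% year-over-year output growth comes from the figures cited earlier. `review_gap_by_quarter` is a hypothetical helper.

```python
def review_gap_by_quarter(
    changes_q0: float,
    capacity_q0: float,
    annual_output_growth: float,
    quarterly_capacity_change: float,
    quarters: int,
) -> list[float]:
    """Changes per quarter that exceed what reviewers can carefully check."""
    # Convert annual growth to an equivalent compounding quarterly factor.
    quarterly_growth = (1 + annual_output_growth) ** 0.25
    demand, capacity = changes_q0, capacity_q0
    gaps = []
    for _ in range(quarters):
        gaps.append(max(0.0, demand - capacity))
        demand *= quarterly_growth
        capacity *= 1 + quarterly_capacity_change
    return gaps

gaps = review_gap_by_quarter(
    changes_q0=1_000,                # assumption: 1,000 changes per quarter today
    capacity_q0=1_000,               # assumption: reviewers currently keep up
    annual_output_growth=0.60,       # roughly 60% year-over-year, as cited above
    quarterly_capacity_change=-0.05, # assumption: capacity shrinks 5% a quarter
    quarters=5,
)
for quarter, gap in enumerate(gaps):
    print(f"Q{quarter}: {gap:.0f} changes beyond review capacity")
```

Starting from a team that keeps up, the model leaves several hundred changes per quarter beyond careful review within a year. The exact figures depend entirely on the assumed inputs; the direction does not.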
The debt accumulates because it's invisible until it isn't. A code smell that stays in the codebase for six months costs nothing — until it interacts with another code smell that also stayed for six months, and together they produce a failure mode that nobody anticipated because nobody read either change carefully enough when it landed.
That is the pattern Amazon's internal memo describes. Not a single catastrophic mistake, but a trend of "incidents with high blast radius caused by Gen-AI-assisted changes" that built up until two outages hit within three days.
The Counter-Bet
One major technology company looked at these three trends and drew the opposite conclusion from the rest of the industry.
In February 2026, IBM announced it was tripling its entry-level hiring — including for software developers and other roles that the industry consensus says AI can replace. CHRO Nickle LaMoreaux's reasoning was structural, not sentimental: "The companies three to five years from now that are going to be the most successful are those companies that doubled down on entry-level hiring in this environment."
IBM isn't arguing that AI tools don't work. They're arguing that AI tools require human oversight, and oversight requires people, and people require a pipeline, and the pipeline takes years to build and months to destroy. They're redesigning junior roles — less routine coding, more customer interaction, more AI oversight — rather than eliminating them.
Whether IBM is right depends on which timescale you optimize for. On a quarterly basis, cutting QA and junior headcount while boosting output with AI is the rational move. The cost savings are immediate. The debt is invisible. On a three-to-five-year basis, you've dismantled the verification layer that prevents your systems from failing in ways that cost you 6.3 million orders in a day.
The industry chose the quarter. Amazon found out what that costs. The rest are still running the experiment.