The Leaderboard Worked

On April 14, Praveen Neppalli Naga, Uber's chief technology officer, told Laura Bratton at The Information: "I'm back to the drawing board, because the budget I thought I would need is blown away already."

It is mid-April. Uber's annual AI budget is gone. The company spent $3.4 billion on R&D in 2025 and expects that number to keep climbing. Most of the overrun went to one tool: Anthropic's Claude Code.

It didn't happen because the tool failed. It happened because two systems, one inside Uber and one inside Anthropic, were each tuned to maximize adoption. Both worked.

Inside Uber: usage on the leaderboard

Naga has been talking about Uber's AI push publicly since March. The numbers he has reported, in his own words: 95% of Uber engineers use AI tools monthly. About 70% of committed code is touched by them. An internal AI agent writes roughly 1,800 code changes per week with "zero human authoring"; engineers review and approve, but no human types the code. Approximately 11% of Uber's live backend code, the systems powering ride-matching, pricing, and navigation, is now AI-written.

The adoption curve in Naga's own telling:

| Date | Milestone | Engineers using AI tools |
|------|-----------|--------------------------|
| Dec 2025 | ~5,000 engineers given Claude Code access | 32% |
| Feb 2026 | Internal leaderboards rank usage | 63% |
| Mar 2026 | Naga: "a real reset moment for engineering" | 95% monthly |
| Apr 2026 | Annual AI budget exhausted | ~70% of commits AI-touched |

Naga himself described the mechanism: engineers were encouraged to use Claude Code and Cursor, with usage ranked on internal leaderboards. He noted that Claude Code had become dominant while Cursor plateaued. The leaderboard didn't measure whether the AI-written code shipped, whether it shipped without bugs, or whether reviewers spent more time on it than an engineer would have spent writing it from scratch. It measured usage. So the org optimized for usage.

This is Goodhart's Law in its purest form: when a measure becomes a target, it ceases to be a good measure. Uber didn't choose poorly. Uber chose what every large engineering organization is choosing right now — make AI adoption legible, reward it, watch the curve go up. The curve went up. So did the bill.

Inside Anthropic: subsidy on the other side

Now flip the camera. The same months that Uber's adoption curve was bending up, Anthropic was running its own optimization on the supply side. The $200/month Max tier is priced at a small fraction of what its compute actually costs. Independent analyses converge on a similar number:

Subsidy math (third-party estimates):

- Max subscription: $200/mo
- Equivalent API spend: ~$5,000/mo, a roughly 25x gap

One developer's 8-month tracking: 10B tokens, ~$15,000 at API rates, ~$800 on Max. Extreme power users have generated up to $300,000/year in compute against $2,400 in subscription revenue.
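
The arithmetic is simple enough to run as a sanity check. Every input in this sketch is one of the third-party figures quoted above; nothing here is Anthropic's own accounting:

```python
# Back-of-envelope check on the third-party subsidy estimates quoted above.

max_tier = 200         # $/month, Max subscription
api_equivalent = 5_000 # $/month, estimated API-rate spend
print(f"Subsidy gap: {api_equivalent / max_tier:.0f}x")  # 25x

# One developer's 8-month tracking: 10B tokens, ~$15,000 at API rates, ~$800 on Max.
tokens, api_cost, max_cost = 10e9, 15_000, 800
print(f"Blended API rate: ${api_cost / (tokens / 1e6):.2f} per 1M tokens")  # $1.50
print(f"Developer's effective discount: {api_cost / max_cost:.0f}x")        # ~19x

# Extreme power user: $300,000/year of compute on a $2,400/year subscription.
print(f"Power-user gap: {300_000 / 2_400:.0f}x")  # 125x
```

The three ratios don't agree because usage varies wildly, but they all land on the same side of the ledger: the heavier the user, the bigger the transfer.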

Anthropic isn't subsidizing because they miscalculated. They're subsidizing because they're running the same playbook Uber ran a decade ago: buy the market, lock in the workflow, monetize later. Claude Code has gone from $1B to $2.5B in annualized revenue in roughly a quarter. Most of that revenue comes from organizations like Uber. It is, structurally, a transfer of venture capital from Anthropic's Series G to Uber's R&D budget.

And this is the trick: both sides of the market were optimized for the same thing — adoption. Uber's leaderboard pulled it up from the consumer side. Anthropic's pricing pulled it up from the producer side. Two arrows pushing the same curve.
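
To see how fast two arrows on one curve compound, here is a purely illustrative sketch. The adoption shares and the $200-versus-$5,000 per-seat figures are the numbers reported above; the 5,000-engineer cohort size is from the adoption table, and the assumption that every adopter consumes at the full subsidized rate is a simplification, not Uber's actual spend:

```python
# Illustrative only: reported adoption shares x assumed per-seat economics.
# The 32/63/95% shares, $200 sticker, and ~$5,000 API-equivalent figures come
# from the estimates above; treating every adopter as a full-rate consumer does not.

ENGINEERS = 5_000  # cohort given Claude Code access in Dec 2025
STICKER = 200      # $/seat/month, Max tier
TRUE_COST = 5_000  # $/seat/month, estimated API-rate equivalent

adoption = {"Dec 2025": 0.32, "Feb 2026": 0.63, "Mar 2026": 0.95}
for month, share in adoption.items():
    seats = int(ENGINEERS * share)
    print(f"{month}: sticker ${seats * STICKER:>9,} vs true ${seats * TRUE_COST:>12,}")
```

The sticker-price column is what a budget owner sees; the true-cost column is what someone, somewhere, is paying. The leaderboard grows the seat count while the subsidy hides the second column. Both columns scale together.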

The collision

The arrows kept pushing until the economics of one side ran out. Anthropic moved first. On April 4, the company cut off third-party agentic tools from Pro and Max subscriptions, pivoting them to pay-as-you-go billing. Boris Cherny, head of Claude Code, said the subscription model "was not built for the usage patterns of these third-party tools." Heavy users reported cost increases of 10x to 50x. The 135,000+ OpenClaw instances that had been running on subsidized subscriptions were the most visible casualties, but the policy signal was clear: the price of a token is reverting to the cost of a token.

Ten days later, Naga went on record. Nothing in the reporting connects the two events, but it doesn't need to. They're the same event seen from opposite sides. Anthropic's subsidy buys adoption on the consumer side until either the subsidy stops or the adoption hits a real budget. In Uber's case, both happened the same week.

"I'm back to the drawing board, because the budget I thought I would need is blown away already."
— Praveen Neppalli Naga, Uber CTO, April 14, 2026

What Naga is doing next

Read the rest of Naga's interview and a pattern emerges. His longer-term vision is what he calls "agent engineers" — AI systems that don't just assist humans but fully handle coding, testing, and deployment, supervised by other AI tools. Uber's response to having spent its budget on assistive AI is to commit harder to autonomous AI.

From a benchmark perspective this makes sense. Sub-agents that spawn sub-agents and run overnight do produce more output per dollar of human time. From a budget perspective it is the same lever pulled harder. An autonomous agent that can write 1,800 changes per week with no human in the loop also burns tokens with no human in the loop. One engineering team running Claude Code in automated CI/CD loops, by The Information's accounting, can drain a monthly budget in days.
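
That claim is easy to reproduce on the back of an envelope. In the sketch below, only the 1,800 changes per week is a reported number; the tokens-per-change, blended token price, and team budget are hypothetical round numbers chosen to show the shape of the problem, not anyone's actual rates:

```python
# Hypothetical drain-rate sketch for an unattended agent pipeline.
# Only CHANGES_PER_WEEK is a reported figure; everything else is an assumption.

CHANGES_PER_WEEK = 1_800       # reported: agent-authored changes per week
TOKENS_PER_CHANGE = 5_000_000  # assumption: retries, sub-agents, re-read context
PRICE_PER_M_TOKENS = 15.00     # assumption: blended $/1M tokens, pay-as-you-go
MONTHLY_BUDGET = 100_000       # assumption: one team's monthly AI allocation, $

weekly_burn = CHANGES_PER_WEEK * TOKENS_PER_CHANGE / 1e6 * PRICE_PER_M_TOKENS
days_to_drain = MONTHLY_BUDGET / (weekly_burn / 7)
print(f"Weekly burn: ${weekly_burn:,.0f}")         # $135,000
print(f"Budget gone in {days_to_drain:.1f} days")  # ~5.2 days
```

The specific numbers don't matter; what matters is that every term in that product scales with no human in the loop, and at pay-as-you-go prices the product is no longer somebody else's problem.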

The leaderboard worked. It will continue to work. The question Naga's actual response raises is whether anyone is measuring the thing the leaderboard isn't measuring — whether the 1,800 changes a week are net useful, whether the 11% of backend code that's now AI-written is the 11% you wanted automated, whether $3.4 billion of R&D is buying more software or just more code.

The same asymmetry, with a price tag

I wrote about the output trap last week: generation scales cheaply, verification doesn't. Uber's situation is the cost-side companion. Generation scales cheaply for the user when the producer is subsidizing. The producer can subsidize until they need to monetize for an IPO. The user can keep adopting until they need to balance a budget. When those two timelines intersect, you get a blog post like this one and a CTO going back to the drawing board.

What is genuinely new in the Uber story isn't the cost overrun. Cost overruns happen. What is new is the precise mechanism: an internal metric (usage on a leaderboard) and an external pricing distortion (a 25x subsidy) compounding for a quarter, then landing in the same bill at the same time. It is, in microcosm, the equilibrium the entire AI coding industry is converging on.

The leaderboard worked. It always works. That's the problem.