The Oldest Paradox in Economics Just Explained Your AI Bill
In 1865, a 29-year-old English economist named William Stanley Jevons published The Coal Question. His observation was counterintuitive: James Watt's steam engine had made coal usage vastly more efficient — and coal consumption had increased, not decreased. Efficiency made coal cheaper per unit of work, which made it economically viable for more applications, which drove total consumption up.
One hundred and sixty-one years later, the same dynamic is playing out with AI inference tokens. And the numbers are not subtle.
Jevons was right about coal. He's right about tokens. The question is whether you've noticed.
The Price Collapse
The numbers bear repeating because they're almost absurd. In late 2022, GPT-4 class inference cost $37.50 per million tokens. By August 2025, equivalent capability — via DeepSeek V3, Qwen 3.5, and competitive pressure across the board — had fallen to $0.14 per million tokens. That's roughly a 270-fold reduction in under three years. According to Epoch AI, inference costs are falling 5-10× per year, driven by both hardware improvements and algorithmic efficiency gains (quantization, speculative decoding, distillation).
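The decline rate is worth checking against Epoch AI's band. A quick back-of-the-envelope calculation, using only the figures above (the 2.75-year span is my approximation for "late 2022 to August 2025"):

```python
# Sanity check on the price collapse, using the figures in the text.
start_price = 37.50   # $/M tokens, late 2022, GPT-4 class
end_price = 0.14      # $/M tokens, August 2025
years = 2.75          # late 2022 -> August 2025, approximately

fold = start_price / end_price          # overall reduction, ~268x
annual = fold ** (1 / years)            # implied per-year decline, ~7.6x

print(f"~{fold:.0f}x cheaper overall")
print(f"~{annual:.1f}x cheaper per year")
```

The implied ~7.6×/year sits comfortably inside the 5-10×/year range Epoch AI reports.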
If you stopped reading here, you'd expect enterprise AI spending to crater. It didn't.
Enterprise AI cloud infrastructure spending: $11.5 billion (2024) → $18.3 billion (2025) → $37.5 billion (2026). Inference now accounts for 55% — the first time it has exceeded training.
— Deloitte Tech Trends 2026
Inference — the cost of running models, not training them — was a third of the AI compute pie in 2023. It's now the majority. The projection is 70-80% by year-end 2026. This is Jevons' mechanism playing out in real time: cheaper inference made agentic workflows viable, agentic workflows multiplied inference calls, and total spending soared.
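The mechanism can be made concrete with the spending figures above. If total spend roughly doubles in a year while unit prices fall several-fold, token volume must be growing by an order of magnitude. A toy calculation (the 7×/year price decline is an assumed midpoint of Epoch AI's 5-10× range, and the spend figures include more than inference, so treat this as illustrative):

```python
# Toy Jevons arithmetic: spend growth x price decline = implied volume growth.
spend = {2024: 11.5e9, 2025: 18.3e9, 2026: 37.5e9}  # enterprise AI cloud infra, USD
price_decline_per_year = 7  # assumed midpoint of the 5-10x/yr range

for year in (2025, 2026):
    spend_growth = spend[year] / spend[year - 1]
    implied_volume_growth = spend_growth * price_decline_per_year
    print(year, f"spend x{spend_growth:.2f}",
          f"implied token volume x{implied_volume_growth:.1f}")
```

Spend merely doubling while prices fall 7× implies token consumption growing roughly 14× in a single year. That is the paradox in one multiplication.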
The Three Multipliers
Tokens got cheaper. So why did the bill go up? Because three architectural patterns are multiplying consumption faster than prices can fall.
1. Agentic Loops
A single prompt-and-response costs one LLM call. An agentic workflow — where the model plans, executes, observes, adjusts, and iterates — hits the model 10 to 20 times per task. OpenAI's o3 reasoning uses 83× more compute per task than GPT-4o on the same problem. The token cost per call dropped. The number of calls per task exploded.
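A sketch of why the per-task bill can rise even as per-token prices collapse. The call counts and token sizes below are illustrative assumptions, not vendor figures; only the two prices come from the text:

```python
# Illustrative: a 2025 agentic task vs. a 2022 chat turn.
def task_cost(calls_per_task, tokens_per_call, price_per_m_tokens):
    """Dollar cost of one task: calls x tokens x unit price."""
    return calls_per_task * tokens_per_call * price_per_m_tokens / 1e6

chat_2022 = task_cost(calls_per_task=1, tokens_per_call=2_000,
                      price_per_m_tokens=37.50)
agent_2025 = task_cost(calls_per_task=20, tokens_per_call=50_000,
                       price_per_m_tokens=0.14)

print(f"single chat turn, 2022: ${chat_2022:.3f}")   # $0.075
print(f"agentic task, 2025:     ${agent_2025:.3f}")  # $0.140
```

Under these assumptions the agentic task costs nearly twice the old chat turn, despite tokens being ~268× cheaper. The multipliers outran the discount.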
2. Context Inflation
RAG (retrieval-augmented generation) is the industry standard for grounding model outputs. But every RAG call stuffs context windows with retrieved documents — a "context tax" that scales with the complexity of the question. Context windows grew from 8K to 1M tokens precisely because workloads demanded it. Bigger windows, more tokens per call, more money.
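The context tax is easy to model. Chunk sizes and counts below are assumptions for illustration; the point is the linear blowup in input tokens per call as retrieval gets more aggressive:

```python
# Illustrative context tax: input tokens per RAG call grow linearly
# with the number of retrieved chunks. All sizes are assumptions.
def rag_call_tokens(question_tokens, chunks, tokens_per_chunk=800,
                    answer_tokens=500):
    return question_tokens + chunks * tokens_per_chunk + answer_tokens

for chunks in (0, 5, 20, 100):
    print(f"{chunks:>3} chunks -> {rag_call_tokens(50, chunks):>6} tokens/call")
```

A 50-token question becomes an 80,000-token call once a hundred chunks are stuffed into the window. Million-token windows exist because workloads like this fill them.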
3. Always-On Agents
The shift from tools you use to agents that run continuously. Cursor's Automations trigger from codebase changes, Slack messages, and timers. CI/CD pipelines now include inference calls at every stage. Agents monitor, respond, and iterate 24/7. What was once a few dozen API calls per developer per day is becoming hundreds or thousands — without anyone actively typing.
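The scaling from "a few dozen" to "thousands" of daily calls is linear in cost, which is exactly the problem: nothing architectural caps it. A rough sketch, using the $0.14/M-token price from above and an assumed per-call token size:

```python
# Rough always-on cost scaling per developer. Token size is an assumption.
price_per_token = 0.14 / 1e6   # Aug-2025 figure from above
tokens_per_call = 8_000        # assumed: agent calls carry large contexts

for calls_per_day in (30, 300, 3_000):
    daily = calls_per_day * tokens_per_call * price_per_token
    print(f"{calls_per_day:>5} calls/day -> ${daily:6.2f}/day, "
          f"${daily * 22:7.2f}/mo")
```

A 100× jump in call volume is a 100× jump in spend, and always-on agents make that jump without anyone touching a keyboard.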
The Revenue Scorecard
Somebody is getting paid. Here's who, as of March 2026:
| Company | Run-Rate Revenue | Growth | Signal |
|---|---|---|---|
| OpenAI | $25B | 3.4×/yr | 910M weekly users. Losing API share to Anthropic. |
| Anthropic | $19B | 10×/yr | $1B → $19B in 15 months. 80% enterprise. Not profitable. |
| ↳ Claude Code | $2.5B | 2× since Jan | Over half of Anthropic enterprise spend. 4× enterprise sub growth. |
| Cursor | $2B | 2× in 3 mo | $50B valuation talks. 60% enterprise. NVIDIA + Alphabet backed. |
| GitHub Copilot | ~$600M | — | 4.7M paid subs. 42% market share. 90% of Fortune 100. |
The detail that tells the whole story: Ramp's March 2026 data shows that in July 2025, OpenAI held roughly 95% of all API spending among their tracked companies. Eight months later, Anthropic holds approximately 80%. A near-complete market share reversal — driven almost entirely by coding agents and enterprise API consumption. Anthropic now captures 73% of spending among companies buying AI tools for the first time.
And 79% of Anthropic customers also pay for OpenAI. It's additive, not substitutive. Total spending grows.
What Developers Actually Pay
Enterprise revenue figures are abstract. Here's what the bill looks like at the individual level:
The average developer now uses 2.3 AI coding tools simultaneously. At the team level (10 seats): Copilot Business runs $190/month, Cursor Business $400/month, Claude Code mid-tier around $1,000/month. None of these prices are stable — consumption-based billing means budgets are guesses.
But context matters: AI tooling is 1-3% of total developer cost. A fully loaded developer costs $12,500-$20,800/month. The tools cost $200-500. DX research shows an average 3.6 hours/week saved. At $75/hour, that's $1,080/month in recovered time. The ROI is 2-15× depending on usage intensity. The market is rational — the tools pay for themselves. The problem isn't ROI. It's predictability.
The FinOps Response
The FinOps Foundation's 2026 report documents the organizational panic. AI cost management is now active at 98% of surveyed enterprises, up from 63% the prior year. It's the #1 global FinOps priority. A new discipline — "FinOps for AI" — has emerged to bridge data science teams and CFOs.
The playbook is still taking shape, and it comes with a dose of skepticism: Deloitte flags widespread "agent washing" — many claimed agentic initiatives are just automation rebranded. Of thousands of vendors claiming agentic capabilities, only about 130 are genuinely agentic. The hype tax is real.
What Jevons Would Say
Three transitions are happening simultaneously, and they all point in the same direction — more tokens, not fewer:
Subscription → Consumption. Flat-rate plans are losing to usage-based billing. GitHub Copilot's premium requests, Cursor's credit system, Claude Code's tiered pricing — all moving toward "pay for what you burn." Predictable budgets are becoming unpredictable ones.
Completion → Agent. Tab completions cost fractions of a cent. Agentic sessions cost dollars. The industry is pushing hard toward agents. GitHub's 2025 data: 43 million PRs merged (up 23% YoY), roughly 1 billion commits (up 25%), 557,000 new App Store apps (up 24%). More efficient tools producing more output, not less. Jevons, exactly.
Tool → Platform. Cursor at a $50B valuation isn't a text editor anymore. Claude Code at $2.5B run-rate isn't an autocomplete feature. These are platforms — and platforms expand scope. More scope, more inference, more spending.
The top five hyperscalers have committed nearly $700 billion in AI infrastructure spending for 2026, close to double what they spent in 2025. Goldman Sachs projects $1.15 trillion in cumulative AI infrastructure investment from 2025-2027. Gartner pegs global AI spending above $2.5 trillion in 2026.
None of this is irrational. The tools demonstrably pay for themselves. Developers save hours per week. Code ships faster. The economics work at the unit level — $200-500/month in tools against $12,500+/month in loaded developer cost is a no-brainer.
But Jevons' point was never that efficiency is bad. His point was that efficiency doesn't reduce total consumption. It increases it. Every efficiency gain unlocks new use cases, new workflows, new demands. The bill doesn't shrink. It finds new places to grow.
In 1865, the answer was: burn more coal. In 2026, the answer is: burn more tokens. The paradox isn't a bug. It's how markets work.