Anthropic sells tokens. Claude Code burns them.
Before you type a single character, Claude Code has already consumed 30,000 to 50,000 tokens loading system prompts, tool definitions, MCP server schemas, memory files, and project instructions. On a 200,000-token context window, that's up to a quarter of your capacity gone before the conversation begins.
This is not a bug. It's the central design bet of the tool that 46% of developers call their most loved — the highest ranking of any AI coding tool in the Pragmatic Engineer's 2025 developer survey. Every piece of that overhead exists because someone decided that spending tokens is cheaper than getting things wrong.
I've spent two sessions tracing Claude Code's architecture through official documentation, the system prompt archive tracking 131 versions, reverse-engineering analyses, and structural deep dives. What emerges isn't just a technical spec. It's a theory of how AI tools should work — expressed in engineering tradeoffs that consistently choose the same thing.
The thesis: Claude Code won by taxing itself for reliability. And the architecture proves it.
Where Your Tokens Go
Every Claude Code session starts by building a context payload. Here's what fills your window before you've said a word:
The system prompt alone is modest — roughly 3,000 tokens, or 1.5% of the standard 200K context window. But it's the tip of the iceberg. Eighteen built-in tool definitions add ~1,400 tokens. Each MCP server you've configured loads its full tool schema — another 2,000+ tokens per server. Memory files (CLAUDE.md, auto-memories, project context) can consume up to 24,000 tokens. Custom agent configurations add up to 20,000 more.
Why tolerate this? Because every token of overhead prevents a category of failure. The system prompt prevents unsafe behavior. Tool definitions prevent malformed calls. Memory files prevent Claude from forgetting your project's conventions. MCP schemas prevent protocol errors. Each is a tax paid upfront to avoid a costlier recovery later.
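As a sanity check, the figures above can be tallied in a few lines. All numbers are this article's approximate upper bounds, and the two-server MCP count is an assumption for illustration:

```python
# Back-of-envelope tally of Claude Code's startup context overhead,
# using the approximate figures cited above (upper bounds, not measurements).
OVERHEAD = {
    "system_prompt": 3_000,
    "builtin_tool_definitions": 1_400,
    "mcp_servers": 2 * 2_000,    # assumes two configured servers at ~2K each
    "memory_files": 24_000,      # CLAUDE.md + auto-memories, upper bound
    "custom_agents": 20_000,     # upper bound
}

CONTEXT_WINDOW = 200_000

total = sum(OVERHEAD.values())
share = total / CONTEXT_WINDOW
print(f"overhead: {total:,} tokens ({share:.1%} of the window)")
```

At the upper bounds this lands at roughly 52K tokens, about 26% of the window, which is where the "up to a quarter" figure comes from.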
Seven Bets
Claude Code's architecture is a series of wagers. Each one trades something measurable — tokens, speed, capability headroom — for something harder to quantify: correctness, reliability, user trust. Here's what the engineering actually chose.
1. The Instruction Tax
CLAUDE.md — the project instruction file — doesn't load once per session. It reloads on every single message. When context compaction triggers, CLAUDE.md is re-read from disk and re-injected fresh, overriding any compacted summaries that might have drifted.
This is expensive. A well-written CLAUDE.md adds up to 2,000 tokens per turn. Over a 50-turn session, that's 100,000 tokens spent repeating the same instructions. But the alternative is a compacted summary that subtly diverges from actual project rules, producing code that violates conventions the user explicitly wrote down.
The hierarchy matters too. Enterprise managed rules override user-level, which override project-level, which override local. This isn't just configuration — it's a trust model encoded in load order. An enterprise admin's rules cannot be overridden by a project's CLAUDE.md, no matter what it says.
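The load-order trust model reduces to a simple merge where higher tiers are applied last, so they win. The tier names follow the article; the merge function and rule keys below are illustrative, not Claude Code's implementation:

```python
# Sketch of settings precedence as described above: higher tiers override
# lower ones. Tier names are the article's; rule keys are hypothetical.
PRECEDENCE = ["enterprise", "user", "project", "local"]  # highest first

def effective_rules(layers: dict[str, dict]) -> dict:
    """Merge rule dicts so higher-precedence tiers overwrite lower ones."""
    merged: dict = {}
    # Apply lowest precedence first so later (higher) tiers overwrite.
    for tier in reversed(PRECEDENCE):
        merged.update(layers.get(tier, {}))
    return merged

rules = effective_rules({
    "enterprise": {"allow_network": False},
    "project":    {"allow_network": True, "style": "black"},
})
print(rules)  # enterprise wins on allow_network; project still fills in style
```

The design point is that the project file can add rules but never flip a decision made above it.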
2. Three Memory Layers
Claude Code maintains three distinct memory systems, each optimized for a different information lifetime:
Session memory — the current context window. Rich, detailed, temporary. Lost when the session ends. This is where active reasoning happens.
CLAUDE.md — manually maintained project instructions. Persistent across sessions, human-edited, reloaded every message. The source of truth for "how things work here."
Auto-memory — notes Claude saves for itself at ~/.claude/projects/<project>/memory/. The first 200 lines auto-load into context; topic-specific files (debugging conventions, API patterns) are consulted on-demand. The /remember command reviews session learnings and proposes updates.
Three layers is more complex than one. But each fails differently. Session memory is precise but ephemeral. CLAUDE.md is durable but requires human maintenance. Auto-memory bridges the gap by letting the tool learn from its own experience. The bet: different kinds of knowledge need different storage characteristics — and no single layer can serve all three purposes without compromising at least one.
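A minimal sketch of the three layers as a read-through lookup, assuming session memory is consulted first. The class and field names are illustrative, not Claude Code's internals:

```python
# Illustrative three-layer lookup mirroring the memory model above:
# session (ephemeral) -> CLAUDE.md (human-edited) -> auto-memory (self-saved).
from dataclasses import dataclass, field

@dataclass
class Memory:
    session: dict = field(default_factory=dict)    # lost at session end
    claude_md: dict = field(default_factory=dict)  # persistent, human-edited
    auto: dict = field(default_factory=dict)       # persistent, self-saved

    def recall(self, key: str):
        # Check the most precise, shortest-lived layer first.
        for layer in (self.session, self.claude_md, self.auto):
            if key in layer:
                return layer[key]
        return None

m = Memory(claude_md={"test_cmd": "pytest -q"}, auto={"flaky_test": "test_io"})
print(m.recall("test_cmd"))    # from CLAUDE.md
print(m.recall("flaky_test"))  # from auto-memory
```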
3. The Isolation Principle
Claude Code can spawn subagents — separate Claude instances with their own context windows. There are three types, each routed to a model matched to its task.
Two hard constraints define the system. Subagents cannot spawn their own subagents. No nesting. This prevents exponential token consumption — an unconstrained chain of agent-spawning-agent could burn millions of tokens with no ceiling. Each subagent gets a fresh context window, and only the final message returns to the parent. This is a deliberate information bottleneck: it forces summarization, keeping the parent's context clean.
The model routing is the most revealing decision. The Explore agent runs on Haiku — the cheapest, fastest model — because context isolation means you don't need Opus-level reasoning to find files and read code. The search results matter, not the search process. Multi-agent workflows carry a 4-7x token multiplier, but the alternative — cramming everything into one context — leads to pollution that degrades the main agent's reasoning on the work that actually requires it.
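The no-nesting constraint and the summary-only return channel can be sketched roughly like this. The Agent class and its methods are hypothetical stand-ins, not Claude Code's internals:

```python
# Sketch of the isolation principle: a parent may spawn subagents, but a
# subagent may not, and only a final summary crosses back to the parent.
class Agent:
    def __init__(self, is_subagent: bool = False):
        self.is_subagent = is_subagent
        self.context: list[str] = []  # each agent gets a fresh context window

    def spawn(self) -> "Agent":
        if self.is_subagent:
            # The hard ceiling: no nesting, so no exponential fan-out.
            raise RuntimeError("subagents cannot spawn subagents")
        return Agent(is_subagent=True)

    def run(self, task: str) -> str:
        self.context.append(task)
        # Only this return value reaches the parent; the subagent's full
        # context (search transcripts, file reads) is discarded.
        return f"summary of {task!r}"

parent = Agent()
child = parent.spawn()
result = child.run("explore the repo")  # parent sees only this summary
```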
4. The Compiler in the Loop
Since v2.0.74 (December 2025), Claude Code runs a Language Server Protocol integration covering 11 programming languages with five operations: goToDefinition, findReferences, hover, documentSymbol, getDiagnostics.
The real feature isn't navigation speed — though 50ms LSP lookup vs. 45 seconds of text search matters. It's the self-correction loop. After every file edit, the language server analyzes changes and reports errors back. Type mismatches, missing imports, syntax errors — Claude sees them immediately and fixes them in the same turn, before the user knows there was a problem.
This turns Claude Code from a text generator with tools into something closer to an IDE-aware agent. It catches the class of errors that erode trust in AI-generated code: the imports that don't exist, the types that don't match, the methods that were renamed three commits ago. Third-party LSP plugins now extend coverage to 20+ languages.
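The self-correction loop reduces to a small pattern: edit, request diagnostics, fix, repeat with a bound. The toy diagnostics function below stands in for an LSP getDiagnostics call; nothing here reflects the real protocol wiring:

```python
# Sketch of the edit -> diagnose -> fix loop described above, with a toy
# stand-in for LSP diagnostics (flags a call to an undefined name).
def get_diagnostics(source: str) -> list[str]:
    errors = []
    if "helper(" in source and "def helper" not in source:
        errors.append("undefined name 'helper'")
    return errors

def apply_fix(source: str, error: str) -> str:
    # A real agent would re-prompt the model; here we just add the missing def.
    if "helper" in error:
        return "def helper(x):\n    return x\n\n" + source
    return source

source = "print(helper(1))"
for _ in range(3):  # bounded loop: fix within the same turn, never forever
    diags = get_diagnostics(source)
    if not diags:
        break
    source = apply_fix(source, diags[0])

print(get_diagnostics(source))  # no remaining diagnostics
```

The bound matters: a real loop must give up after a few iterations rather than chase an unfixable error indefinitely.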
5. The Graceful Cliff
Context windows have hard limits. Claude Code's compaction system triggers at 83.5% capacity — roughly 167,000 tokens on a 200K window. It compresses conversation history while preserving CLAUDE.md (re-read from disk), active tool states, and recent decisions.
Compaction loses precision. Variable names blur. Exact error messages fade. Early reasoning trails disappear. But the alternative is worse: hit 100% and the session fails entirely.
The buffer was tuned down from 45,000 tokens to ~33,000 — a bet that more usable space matters more than a larger safety margin. With the 1M context window now available for Opus 4.6 on Max/Team/Enterprise plans, usable space extends to ~830,000 tokens before compaction. You can override the threshold with CLAUDE_AUTOCOMPACT_PCT_OVERRIDE. That this is an environment variable rather than a settings toggle tells you who it's for: operators who understand the tradeoff, not users who should never need to think about it.
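Using the article's figures, the threshold arithmetic is straightforward. The function below assumes the override variable holds a percentage, which is an inference from its name rather than documented behavior:

```python
# The compaction arithmetic from above: trigger at 83.5% of the window,
# overridable via CLAUDE_AUTOCOMPACT_PCT_OVERRIDE (figures per the article;
# the variable's exact semantics are assumed here).
import os

def compaction_threshold(window: int, default_pct: float = 83.5) -> int:
    pct = float(os.environ.get("CLAUDE_AUTOCOMPACT_PCT_OVERRIDE", default_pct))
    return int(window * pct / 100)

print(compaction_threshold(200_000))    # 167,000 usable tokens
print(compaction_threshold(1_000_000))  # 835,000 on the 1M window
```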
6. The Affordability Hack
If every request carries 30-50K tokens of context overhead, shouldn't Claude Code be absurdly expensive? It would be — without prompt caching.
Claude Code places cache breakpoints on the system prompt, tool definitions, CLAUDE.md, and conversation history. Cached token reads cost 10% of the base price — a 90% discount. Cache writes cost 125%. The math comes out ahead by the second request, then compounds in your favor for every subsequent turn.
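The break-even arithmetic works out like this, in relative price units rather than real dollar figures:

```python
# Cache arithmetic as described above: writes cost 125% of the base price,
# reads cost 10%. Units are relative, not actual per-token prices.
WRITE = 1.25   # first request: cache write premium
READ = 0.10    # subsequent requests: cache read discount
BASE = 1.00    # uncached cost per request

def cached_cost(requests: int) -> float:
    return WRITE + READ * (requests - 1) if requests else 0.0

def uncached_cost(requests: int) -> float:
    return BASE * requests

# By the second request the cached path is already cheaper, and the gap
# widens every turn after that.
print(cached_cost(2), uncached_cost(2))    # 1.35 vs 2.0
print(cached_cost(50), uncached_cost(50))  # 6.15 vs 50.0
```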
The system prompt (~2,700 tokens) is cached once and reused across turns. CLAUDE.md caches per-project. Conversation history caches incrementally. The overhead that looks like a tax on paper becomes nearly free in practice — as long as you're past your second message.
This is the engineering that makes the reliability thesis viable. Without caching, the token-burning architecture would be economically irrational. With it, reliability and affordability aren't in conflict — they're co-optimized through infrastructure. The expensive thing isn't the overhead. It's the first overhead. Everything after that is a cache hit.
7. The Governance Layer
Claude Code's hooks system exposes 12+ lifecycle events: SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PermissionRequest, Notification, Stop, SubagentStart, SubagentStop, PreCompact, PostCompact, and more.
Three handler types: command (shell script), prompt (Haiku classifier for lightweight decisions), and agent (multi-turn verification). PreToolUse hooks can modify tool inputs — not just block them. Exit code 0 means proceed; exit code 2 means block, with stderr fed back as the reason.
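A command hook along the lines described might look like the following sketch. The event's JSON field names are assumptions for illustration; check the hooks documentation for the actual payload shape:

```python
# Sketch of a PreToolUse command hook per the convention above: exit 0 to
# proceed, exit 2 to block with stderr as the reason. Field names like
# "tool_name" and "tool_input" are assumed, not verified against the docs.
def check(event: dict) -> tuple[int, str]:
    """Return (exit_code, reason): 0 = proceed, 2 = block."""
    tool = event.get("tool_name", "")
    command = event.get("tool_input", {}).get("command", "")
    if tool == "Bash" and "rm -rf" in command:
        return 2, "destructive command blocked by policy"
    return 0, ""

# A real hook script would read the event JSON from stdin, print the reason
# to stderr (so Claude sees why it was blocked), and sys.exit(code).
code, reason = check({"tool_name": "Bash",
                      "tool_input": {"command": "rm -rf /tmp/build"}})
print(code, reason)
```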
This is governance built into the runtime, not bolted on after deployment. The hooks system treats "what can the agent do?" as a first-class architectural concern. In an industry where 90% of companies use AI but only 20% govern it, building the governance hooks before the governance policies is a bet that the infrastructure has to precede the process.
The permission system reinforces this with four modes and a strict deny → ask → allow evaluation order. The auto mode — where a classifier decides safety — reads only from user settings, not shared project settings. This prevents a compromised project from escalating its own permissions. Security encoded in architecture, not policy.
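The evaluation order reduces to a short-circuit check, sketched here with hypothetical rule strings:

```python
# Sketch of the deny -> ask -> allow evaluation order described above:
# deny rules are checked first and short-circuit everything else.
def evaluate(tool: str, deny: set, allow: set) -> str:
    if tool in deny:
        return "deny"   # deny always wins, checked first
    if tool in allow:
        return "allow"
    return "ask"        # the safe default: ask the user

deny, allow = {"Bash(rm*)"}, {"Read", "Bash(git*)"}
print(evaluate("Bash(rm*)", deny, allow))  # deny, even if also allowed
print(evaluate("Read", deny, allow))       # allow
print(evaluate("WebFetch", deny, allow))   # ask
```

The order is the point: a rule that appears in both sets is denied, and anything unmatched falls through to a human decision rather than silently proceeding.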
What the Tax Buys
| Decision | What It Costs | What It Prevents |
|---|---|---|
| CLAUDE.md reloads every turn | ~2K tokens/turn | Instruction drift after compaction |
| Three memory layers | System complexity | Forgotten conventions, lost learnings |
| Subagents can't nest | Capability ceiling | Exponential token consumption |
| LSP after every edit | Processing time per edit | Broken imports, type errors, stale references |
| Compaction at 83.5% | ~33K reserved tokens | Hard session failures at context limit |
| Aggressive prompt caching | 125% on first turn | Overhead becoming cost-prohibitive |
| 12+ hook lifecycle events | Runtime overhead | Ungovernable agent behavior |
The through-line: Claude Code consistently spends resources now to prevent failures later. Tokens for instruction reliability. Memory layers for knowledge persistence. Context isolation for reasoning clarity. LSP for code correctness. Caching for economic viability. Hooks for organizational control.
This philosophy has a name in engineering: front-loading failure prevention. It's the same principle behind type systems (catch errors at compile time, not runtime), code review (catch issues before merge, not after deployment), and infrastructure-as-code (declare desired state, don't debug drift). The cost is visible and constant. The benefit is invisible — failures that never happen.
It also explains a benchmark gap. Claude Code scores 8 percentage points higher than OpenCode running the same underlying model on complex tasks. The model is identical. The scaffolding is different. The scaffolding is the product.
The Market Logic
There's a deeper game at work. Anthropic's $200/month Max plan costs the company roughly $5,000/month in actual compute — a 25x per-user subsidy. They're not optimizing for per-token revenue on power users. They're optimizing for adoption and the network effects of developers building workflows around Claude Code's specific architecture: CLAUDE.md files, hook configurations, MCP integrations, plugin ecosystems.
The plugin marketplace now lists 101 entries — 33 built by Anthropic, 68 by partners including GitHub, Playwright, Supabase, Figma, Vercel, Linear, Sentry, and Stripe. Each integration raises switching costs. Each CLAUDE.md file represents institutional knowledge encoded in Anthropic's format. The reliability tax isn't just technical architecture — it's the product moat.
For the AI coding tools market — $12.8 billion and growing — Claude Code's architecture makes a clear argument. The model wars are over: the top five SWE-bench scores sit within 0.9% of each other. The engineering wars have begun. The tool that wins won't have the best model. It'll be the one that made the best bets about how humans and AI actually work together.
Claude Code bet on reliability. So far, it's paying off.
For a broader walkthrough of Claude Code's subsystems, see my earlier piece: How Claude Code Actually Works: A Systems-Level Deep Dive.