How Claude Code Actually Works: A Systems-Level Deep Dive

Claude Code has 82,000+ GitHub stars and handles millions of coding sessions. But most developers treat it as a black box — type a prompt, get code back. What's actually happening between your prompt and the response?

I spent the last few hours pulling apart the architecture. The source code is distributed as obfuscated JavaScript via npm (the GitHub repo contains only issues and releases), but multiple researchers have reverse-engineered the internals through mitmproxy interception, npm tarball analysis, and systematic prompt extraction. Here's what the machine actually does.

The Stack

Four layers matter most: context assembly (how your project knowledge reaches the model), the agentic loop (how Claude decides what to do next), MCP (how external tools connect), and plugins (how all of these get packaged and distributed). Let's take each one apart.

CLAUDE.md: Not Where You Think It Lives

The most common misconception: CLAUDE.md goes into the system prompt. It doesn't.

CLAUDE.md content is injected into user messages, wrapped in <system-reminder> XML tags. Every turn. Not once at session start — every single API call re-sends it. You can verify this yourself: look at the top of any Claude Code conversation's API payload and you'll see your CLAUDE.md wrapped in exactly this format.

Why user messages, not the system prompt?

The system prompt already contains ~50 core instructions across 110+ conditional strings. Adding per-project customization on top would make it unwieldy and break prompt caching. By placing CLAUDE.md in user messages, Claude Code keeps the system prompt stable (and cacheable — 92% prefix reuse) while allowing project-specific context. The trade-off: CLAUDE.md consumes context tokens every turn. One forensic analysis found hidden system-reminder injections consuming 15-50% of the context window in long sessions.

The Loading Hierarchy

CLAUDE.md isn't a single file. It's a layered system with additive precedence:

Location	Scope	Loaded When
~/.claude/CLAUDE.md	All projects	Session start
./CLAUDE.md	This project (shared)	Session start
./CLAUDE.local.md	This project (personal)	Session start
parent dirs CLAUDE.md	Monorepo-level	Session start
subdir/CLAUDE.md	Subdirectory-scoped	On-demand
.claude/rules/*.md	Conditional (glob-matched)	When matching files active

All tiers are additive — they stack, they don't replace. When instructions conflict, more specific (local) wins. Since v1.0.16, files in .claude/rules/ support YAML frontmatter for conditional loading — a rule with match: "*.test.ts" only loads when Claude works on test files. In practice, 5 conditional rule files of 30 lines each achieve 96% compliance, compared to 92% for a single 150-line CLAUDE.md.

Memory: Three Systems, Not One

Developers often confuse CLAUDE.md with "Claude's memory." There are actually three distinct persistence mechanisms, each with different durability guarantees:

1. CLAUDE.md

You write it. Loaded in full at session start. Survives compaction (re-read from disk). Advisory — Claude tries to follow it but may deprioritize instructions it deems irrelevant.

Files <200 lines: 92% compliance
Files >400 lines: 71% compliance

2. Auto Memory

Claude writes it itself. Build commands, debugging insights, architecture notes. Stored at ~/.claude/projects/<project>/memory/. Browsable via /memory.

First 200 lines of MEMORY.md loaded at start.
Topic files read on-demand only.

3. Session Memory

Background summaries written automatically. "Recalled X memories" at startup. Each session gets a session-memory/summary.md.

Lossy by nature.
Requires --continue/--resume for full history.

The critical insight: CLAUDE.md is the only memory that's truly reliable. Auto memory truncates after 200 lines. Session memory is lossy. Conversation history gets compressed. If something must persist across sessions, write it to CLAUDE.md or a file on disk.

The Agentic Loop

The core architecture that makes Claude Code an "agent" rather than a chatbot is a single-threaded loop. It is, as the official documentation puts it, "deceptively simple":

// The entire agentic architecture, simplified
while (true) {
  response = claude.send(systemPrompt + tools + history + userInput)
  
  if (response.stop_reason === "end_turn") break
  // stop_reason === "tool_use" — execute and loop
  for (toolCall of response.tool_calls) {
    result = executeTool(toolCall)  // Read, Edit, Bash, etc.
    history.push({ role: "tool_result", content: result })
  }
}

The model receives context, produces a response. If the response contains tool calls, execute them, append results to history, call the model again. If stop_reason is "end_turn", stop. This is the only reliable stopping signal — the docs explicitly warn against text parsing or iteration caps as primary mechanisms.

But between each iteration, the CLI does significant work:

Permission enforcement: Hooks fire first (PreToolUse can block with exit code 2), then deny rules, allow rules, ask rules, then the permission mode applies
Context monitoring: Auto-compaction triggers at ~83.5% of the 200K window
State re-injection: CLAUDE.md and system reminders re-sent every turn
Mid-task steering: Users can inject new instructions during execution via an asynchronous dual-buffer queue
Progress tracking: TodoWrite state injected via system message reminders

Context Compression: The Lossy Bargain

When the context window hits ~167K tokens (83.5% of 200K), Claude Code triggers auto-compaction. It generates a summary, wraps it in <summary> tags, drops all prior messages, and continues from the compressed context. The payload shrinks by roughly 85% — from 167K tokens to ~25K.

But compaction is lossy, and the losses are asymmetric:

SURVIVES

Broad topic structure
General conclusions and direction
Recent exchanges (last 10-15 messages)
Explicit decisions
CLAUDE.md (re-read from disk)

LOST

Exact numbers and code snippets
Specific variable names and file paths
Nuanced reasoning chains
Error messages and stack traces
Multiple code iterations

The fix: run /compact manually at logical breakpoints with preservation instructions (/compact Preserve all file paths, error messages, and modified files). Or use subagents for heavy search tasks — their intermediate results stay in their own context window and never consume the parent's budget. The environment variable CLAUDE_AUTOCOMPACT_PCT_OVERRIDE lets you change the trigger threshold.

Skills: Progressive Disclosure at Scale

Skills are one layer of Claude Code's extensibility system — a unified framework that merged the older commands mechanism into something richer. The part most people miss: skills use progressive disclosure to stay context-efficient.

At startup, Claude Code scans .claude/skills/ (project) and ~/.claude/skills/ (global). But it only loads the name and description from each SKILL.md's YAML frontmatter — roughly 100 tokens per skill. These go into an <available_skills> block attached to the Skill tool definition. The full skill content stays on disk.

When a skill is invoked — either by the user typing /skill-name or by Claude deciding it's relevant — the Skill tool loads the full SKILL.md body and injects it into the conversation as a tool response. Only then does Claude see the full playbook.

How skill routing works

There is no algorithmic routing, no intent classification, no embeddings. The decision happens inside Claude's forward pass through the transformer. Claude reads the skill descriptions and decides which one is relevant based on conversation context. Reverse engineering confirms that skill routing is pure model inference — the same mechanism that decides which tool to call. Skills are instructions, not separate processes. They guide Claude's behavior within the main conversation using its standard tools.

Skills can include supporting files (scripts, templates), restrict available tools via allowed-tools, or spawn subagents. The built-in skills — /simplify, /review, /batch, /loop, /debug — are prompt-based, not hardcoded. They work identically to user-created skills. This is distinct from built-in slash commands like /clear and /compact, which are deterministic CLI operations that never touch the model.

Plugins: The Packaging Layer

Skills, MCP servers, hooks, subagents — Claude Code has all these extensibility primitives, but until recently they were loose files you had to configure yourself. Plugins are the packaging and distribution layer that bundles them into installable, shareable units. Think npm packages, but for Claude Code capabilities.

A plugin is a directory with a .claude-plugin/plugin.json manifest (only name is required) and any combination of six component types:

my-plugin/
├── .claude-plugin/
│   └── plugin.json           # Manifest (only file that goes here)
├── skills/                   # SKILL.md files — model-invoked behaviors
├── agents/                   # Subagent definitions (.md with YAML frontmatter)
├── commands/                 # Slash commands (legacy, prefer skills/)
├── hooks/
│   └── hooks.json            # Event handlers (20+ lifecycle events)
├── .mcp.json                 # MCP server configs (auto-start on enable)
├── .lsp.json                 # Language Server Protocol configs
└── settings.json             # Default settings (currently: agent key only)

Every component maps directly to an existing Claude Code primitive — a plugin's skills/ directory works identically to .claude/skills/, its .mcp.json is the same format as a project's. What plugins add is namespacing (a skill hello in plugin my-plugin becomes /my-plugin:hello), version management, and distribution.

Installation and Scoping

Plugins install from marketplaces — curated registries that are essentially JSON catalogs pointing to plugin sources:

# Install from the official Anthropic marketplace /plugin install feature-dev@claude-plugins-official

# Load a local plugin for development claude --plugin-dir ./my-plugin

# Hot-reload during development /reload-plugins

Installation scopes mirror the CLAUDE.md hierarchy: user (~/.claude/settings.json, personal, default), project (.claude/settings.json, version-controlled), local (.claude/settings.local.json, gitignored), and managed (enterprise admin-controlled, read-only). Marketplace plugins are copied to ~/.claude/plugins/cache rather than used in-place — plugins cannot reference files outside their own directory.

The Agent Override

The most powerful plugin feature is the settings.json agent key. A plugin can include a custom agent definition (in its agents/ directory) and set it as the main thread agent. When enabled, that agent's system prompt, tool restrictions, and model selection replace Claude Code's defaults. This is how "mode" plugins work — a security-review plugin can make Claude Code behave as a security auditor by default, with its own system prompt and tool access.

The Ecosystem

The official Anthropic marketplace ships built-in and is automatically available. It includes official integrations for GitHub, Linear, Notion, Figma, Vercel, Sentry, and more. The broader ecosystem has grown to over 9,000 plugins across registries.

Some notable numbers:

feature-dev (89,000+ installs) — seven-phase workflow from requirements gathering through documentation
code-review — runs 4-5 independent Sonnet agents in parallel for multi-angle code analysis
LSP plugins for 11+ languages — Python, TypeScript, Rust, Go, C/C++, C#, Java, Kotlin, and more
security-guidance — PreToolUse hook that monitors 9 security patterns (command injection, XSS, eval, etc.)

Trust model: there isn't one

Plugins execute with full user privileges. No sandbox. The official docs say it plainly: "Anthropic does not control what MCP servers, files, or other software are included in plugins and cannot verify that they will work as intended." Enterprise admins can lock down allowed marketplaces via strictKnownMarketplaces in managed settings, and managed-scope plugins can be force-enabled. But for individual users, installing a plugin is an act of trust — same as running npm install.

Architecturally, plugins are a thin layer. They don't introduce new runtime primitives — every capability maps to something that already existed. What they solve is distribution: the ability to share a curated combination of skills, agents, hooks, and MCP servers as a single installable unit, with namespacing to prevent conflicts and versioning for updates. It's the difference between "copy these 6 files into the right directories" and /plugin install.

No Indexing. No Embeddings. Just Search.

This one surprised me. Claude Code deliberately does not pre-index your codebase or use vector embeddings. Boris Cherny, Claude Code's creator, confirmed on Hacker News:

"Early versions used RAG + a local vector db, but we found pretty quickly that agentic search generally works better."

The search hierarchy is three tools deep: Glob (file path pattern matching, near-zero token cost) → Grep (content search via ripgrep, regex-powered) → Read (full file load, 500-1,500 tokens per file, reserved for confirmed-relevant files). For deep exploration, Claude spawns an Explore subagent running on Haiku in an isolated context window — the subagent searches, reads, and reasons, but only its summary returns to the parent.

Why no embeddings? Code symbols are exact — getUserById is not semantically close to fetchAccountDetails the way embeddings suggest. Indexes drift during active editing. Embeddings leak code information even as vectors. And there's zero setup friction — no indexing delay when switching projects. The trade-off: Grep is literal/regex only. If you need conceptual search ("find the authentication logic"), Claude has to reason about directory structure and file names. Usually works. Not always.

MCP: The Open Tool Protocol

The Model Context Protocol is how Claude Code talks to external services — databases, GitHub, browsers, custom APIs. Created by David Soria Parra and Justin Spahr-Summers at Anthropic in November 2024, MCP has since been adopted by OpenAI, Google, GitHub, and JetBrains. In December 2025, Anthropic donated it to the Linux Foundation.

The protocol uses JSON-RPC 2.0 with a three-step initialization handshake:

// 1. Client announces itself
→ { "method": "initialize", "params": {
    "protocolVersion": "2025-03-26",
    "clientInfo": { "name": "claude-code" } }}

// 2. Server responds with capabilities
← { "result": { "capabilities": {
    "tools": { "listChanged": true } }}}
// 3. Client confirms, then discovers tools
→ { "method": "initialized" }
→ { "method": "tools/list" }
← { "result": { "tools": [{
    "name": "query_database",
    "description": "Run SQL queries",
    "inputSchema": { "type": "object", ... }
  }]}}

After handshake, MCP tools appear in the model's tool list identically to built-in tools. Claude doesn't know or care whether Read is native while query_database comes from an MCP server. Same calling convention, same JSON schemas.

Configuration

MCP servers are configured at three scope levels: local (~/.claude.json per project, private) > project (.mcp.json at repo root, version-controlled) > user (~/.claude.json globally). Two transports matter:

stdio (default, ~80% of use cases) — Claude Code spawns the server as a subprocess. JSON-RPC over stdin/stdout. Tool discovery takes <200ms. Perfect for local tools.
Streamable HTTP (March 2025) — single HTTP endpoint, supports SSE for streaming, OAuth 2.1 + PKCE for auth. Replaces the deprecated SSE transport. Use for remote services.

One critical performance detail: each enabled MCP server consumes context tokens for its tool definitions. A single server exposing 20-30 tools eats 4,000-6,000 tokens. When MCP tools exceed 10% of the context window, Claude Code enables Tool Search — dynamically loading only needed tool definitions per task, cutting context consumption by ~85%. Rule of thumb: disable servers you aren't actively using.

Subagents: Isolated Context Windows

The Agent tool solves context bloat. Each subagent runs in its own fresh 200K-token window. Only the final message returns to the parent — all intermediate tool calls, file reads, and reasoning stay isolated.

The mechanism: parent calls Agent with a prompt string (the only channel — no parent context carries over). The subagent gets its own system prompt, runs tools, and returns results. Subagents cannot spawn their own subagents — one level of nesting only, which prevents infinite recursion. Three built-in types: Explore (read-only, Haiku model for speed), Plan (research for plan mode), and general-purpose (full tool access). Multiple subagents can run in parallel.

Hooks: The Deterministic Layer

Everything else in Claude Code is probabilistic — the model tries to follow CLAUDE.md, decides which skills to invoke, chooses which tools to call. Hooks are the deterministic escape hatch.

Hooks are shell commands that fire on lifecycle events — PreToolUse (exit 2 to block), PostToolUse (auto-format, auto-test), SessionStart/Stop, Notification, SubagentStop. They run with full user permissions, no sandbox. Configured in .claude/settings.json (shared) or .claude/settings.local.json (personal).

The key distinction: CLAUDE.md rules achieve ~92% compliance. Hooks execute 100% of the time their conditions are met. If a rule must never be broken — blocking secrets from being committed, running formatters on every save, preventing force-pushes — use a hook.

Putting It All Together

When you type a prompt in Claude Code, here's the full sequence:

Plugin & context assembly: Enabled plugins are loaded from cache — their skills, agents, hooks, and MCP servers merge into the session. 110+ conditional prompt strings selected by environment/config. CLAUDE.md files loaded from the full hierarchy and injected as <system-reminder> tags. Auto-memory's first 200 lines included. Skill descriptions (from plugins and local .claude/skills/) attached to the Skill tool. MCP tool schemas (from plugins and project config) merged with 18 built-in tool definitions.
API call: Assembled context hits the Anthropic Messages API with prompt caching (92% prefix reuse). System prompt is static and cached; CLAUDE.md rides in user messages.
Agentic loop: Claude responds. If stop_reason === "tool_use": hooks fire (PreToolUse → permissions → execute → PostToolUse), results append to history, loop continues. If stop_reason === "end_turn": response displayed, loop exits.
Context management: If history exceeds ~83.5% of 200K, auto-compaction summarizes older messages. CLAUDE.md survives because it's re-read from disk every turn.

The architecture is less magic than it appears. A well-engineered loop with smart context management, a layered memory system with clear durability trade-offs, and an open protocol for extensibility — now wrapped in a plugin system that makes it all shareable. The model itself — Opus, Sonnet, or Haiku — is deliberately interchangeable. Claude Code's value is in the scaffolding, not the model. Which, as I've written before, is the real lesson of the AI coding era: the model is the commodity; the agent is the product.

Sources: How Claude Code Works · Memory Docs · Skills Docs · Plugins Docs · Plugins Reference · MCP Docs · Hooks Reference · Permissions · System Prompts (Piebald-AI) · MCP Spec · Agent Loop Docs · Reverse Engineering (Shatrov) · Poking Around Claude Code (Chung) · Inside Skills (Shilkov) · No Indexing (Vadim) · Official Plugin Directory