Claude Code has 82,000+ GitHub stars and handles millions of coding sessions. But most developers treat it as a black box — type a prompt, get code back. What's actually happening between your prompt and the response?
I spent the last few hours pulling apart the architecture. The source code is distributed as obfuscated JavaScript via npm (the GitHub repo contains only issues and releases), but multiple researchers have reverse-engineered the internals through mitmproxy interception, npm tarball analysis, and systematic prompt extraction. Here's what the machine actually does.
The Stack
Four layers matter most: context assembly (how your project knowledge reaches the model), the agentic loop (how Claude decides what to do next), MCP (how external tools connect), and plugins (how all of these get packaged and distributed). Let's take each one apart.
CLAUDE.md: Not Where You Think It Lives
The most common misconception: CLAUDE.md goes into the system prompt. It doesn't.
CLAUDE.md content is injected into user messages, wrapped in <system-reminder> XML tags. Every turn. Not once at session start — every single API call re-sends it. You can verify this yourself: look at the top of any Claude Code conversation's API payload and you'll see your CLAUDE.md wrapped in exactly this format.
Why user messages, not the system prompt?
The system prompt already contains ~50 core instructions across 110+ conditional strings. Adding per-project customization on top would make it unwieldy and break prompt caching. By placing CLAUDE.md in user messages, Claude Code keeps the system prompt stable (and cacheable — 92% prefix reuse) while allowing project-specific context. The trade-off: CLAUDE.md consumes context tokens every turn. One forensic analysis found hidden system-reminder injections consuming 15-50% of the context window in long sessions.
The Loading Hierarchy
CLAUDE.md isn't a single file. It's a layered system with additive precedence:
| Location | Scope | Loaded When |
|---|---|---|
| ~/.claude/CLAUDE.md | All projects | Session start |
| ./CLAUDE.md | This project (shared) | Session start |
| ./CLAUDE.local.md | This project (personal) | Session start |
| parent dirs CLAUDE.md | Monorepo-level | Session start |
| subdir/CLAUDE.md | Subdirectory-scoped | On-demand |
| .claude/rules/*.md | Conditional (glob-matched) | When matching files active |
All tiers are additive — they stack, they don't replace. When instructions conflict, more specific (local) wins. Since v1.0.16, files in .claude/rules/ support YAML frontmatter for conditional loading — a rule with match: "*.test.ts" only loads when Claude works on test files. In practice, 5 conditional rule files of 30 lines each achieve 96% compliance, compared to 92% for a single 150-line CLAUDE.md.
Memory: Three Systems, Not One
Developers often confuse CLAUDE.md with "Claude's memory." There are actually three distinct persistence mechanisms, each with different durability guarantees:
1. CLAUDE.md
You write it. Loaded in full at session start. Survives compaction (re-read from disk). Advisory — Claude tries to follow it but may deprioritize instructions it deems irrelevant.
Files <200 lines: 92% compliance
Files >400 lines: 71% compliance
2. Auto Memory
Claude writes it itself. Build commands, debugging insights, architecture notes. Stored at ~/.claude/projects/<project>/memory/. Browsable via /memory.
First 200 lines of MEMORY.md loaded at start.
Topic files read on-demand only.
3. Session Memory
Background summaries written automatically. "Recalled X memories" at startup. Each session gets a session-memory/summary.md.
Lossy by nature.
Requires --continue/--resume for full history.
The critical insight: CLAUDE.md is the only memory that's truly reliable. Auto memory truncates after 200 lines. Session memory is lossy. Conversation history gets compressed. If something must persist across sessions, write it to CLAUDE.md or a file on disk.
The Agentic Loop
The core architecture that makes Claude Code an "agent" rather than a chatbot is a single-threaded loop. It is, as the official documentation puts it, "deceptively simple":
// The entire agentic architecture, simplified while (true) { response = claude.send(systemPrompt + tools + history + userInput)if (response.stop_reason === "end_turn") break
// stop_reason === "tool_use" — execute and loop for (toolCall of response.tool_calls) { result = executeTool(toolCall) // Read, Edit, Bash, etc. history.push({ role: "tool_result", content: result }) } }
The model receives context, produces a response. If the response contains tool calls, execute them, append results to history, call the model again. If stop_reason is "end_turn", stop. This is the only reliable stopping signal — the docs explicitly warn against text parsing or iteration caps as primary mechanisms.
But between each iteration, the CLI does significant work:
- Permission enforcement: Hooks fire first (PreToolUse can block with exit code 2), then deny rules, allow rules, ask rules, then the permission mode applies
- Context monitoring: Auto-compaction triggers at ~83.5% of the 200K window
- State re-injection: CLAUDE.md and system reminders re-sent every turn
- Mid-task steering: Users can inject new instructions during execution via an asynchronous dual-buffer queue
- Progress tracking: TodoWrite state injected via system message reminders
Context Compression: The Lossy Bargain
When the context window hits ~167K tokens (83.5% of 200K), Claude Code triggers auto-compaction. It generates a summary, wraps it in <summary> tags, drops all prior messages, and continues from the compressed context. The payload shrinks by roughly 85% — from 167K tokens to ~25K.
But compaction is lossy, and the losses are asymmetric:
SURVIVES
- Broad topic structure
- General conclusions and direction
- Recent exchanges (last 10-15 messages)
- Explicit decisions
- CLAUDE.md (re-read from disk)
LOST
- Exact numbers and code snippets
- Specific variable names and file paths
- Nuanced reasoning chains
- Error messages and stack traces
- Multiple code iterations
The fix: run /compact manually at logical breakpoints with preservation instructions (/compact Preserve all file paths, error messages, and modified files). Or use subagents for heavy search tasks — their intermediate results stay in their own context window and never consume the parent's budget. The environment variable CLAUDE_AUTOCOMPACT_PCT_OVERRIDE lets you change the trigger threshold.
Skills: Progressive Disclosure at Scale
Skills are one layer of Claude Code's extensibility system — a unified framework that merged the older commands mechanism into something richer. The part most people miss: skills use progressive disclosure to stay context-efficient.
At startup, Claude Code scans .claude/skills/ (project) and ~/.claude/skills/ (global). But it only loads the name and description from each SKILL.md's YAML frontmatter — roughly 100 tokens per skill. These go into an <available_skills> block attached to the Skill tool definition. The full skill content stays on disk.
When a skill is invoked — either by the user typing /skill-name or by Claude deciding it's relevant — the Skill tool loads the full SKILL.md body and injects it into the conversation as a tool response. Only then does Claude see the full playbook.
How skill routing works
There is no algorithmic routing, no intent classification, no embeddings. The decision happens inside Claude's forward pass through the transformer. Claude reads the skill descriptions and decides which one is relevant based on conversation context. Reverse engineering confirms that skill routing is pure model inference — the same mechanism that decides which tool to call. Skills are instructions, not separate processes. They guide Claude's behavior within the main conversation using its standard tools.
Skills can include supporting files (scripts, templates), restrict available tools via allowed-tools, or spawn subagents. The built-in skills — /simplify, /review, /batch, /loop, /debug — are prompt-based, not hardcoded. They work identically to user-created skills. This is distinct from built-in slash commands like /clear and /compact, which are deterministic CLI operations that never touch the model.
Plugins: The Packaging Layer
Skills, MCP servers, hooks, subagents — Claude Code has all these extensibility primitives, but until recently they were loose files you had to configure yourself. Plugins are the packaging and distribution layer that bundles them into installable, shareable units. Think npm packages, but for Claude Code capabilities.
A plugin is a directory with a .claude-plugin/plugin.json manifest (only name is required) and any combination of six component types:
my-plugin/ ├── .claude-plugin/ │ └── plugin.json # Manifest (only file that goes here) ├── skills/ # SKILL.md files — model-invoked behaviors ├── agents/ # Subagent definitions (.md with YAML frontmatter) ├── commands/ # Slash commands (legacy, prefer skills/) ├── hooks/ │ └── hooks.json # Event handlers (20+ lifecycle events) ├── .mcp.json # MCP server configs (auto-start on enable) ├── .lsp.json # Language Server Protocol configs └── settings.json # Default settings (currently: agent key only)
Every component maps directly to an existing Claude Code primitive — a plugin's skills/ directory works identically to .claude/skills/, its .mcp.json is the same format as a project's. What plugins add is namespacing (a skill hello in plugin my-plugin becomes /my-plugin:hello), version management, and distribution.
Installation and Scoping
Plugins install from marketplaces — curated registries that are essentially JSON catalogs pointing to plugin sources:
# Install from the official Anthropic marketplace /plugin install feature-dev@claude-plugins-official# Load a local plugin for development claude --plugin-dir ./my-plugin
# Hot-reload during development /reload-plugins
Installation scopes mirror the CLAUDE.md hierarchy: user (~/.claude/settings.json, personal, default), project (.claude/settings.json, version-controlled), local (.claude/settings.local.json, gitignored), and managed (enterprise admin-controlled, read-only). Marketplace plugins are copied to ~/.claude/plugins/cache rather than used in-place — plugins cannot reference files outside their own directory.
The Agent Override
The most powerful plugin feature is the settings.json agent key. A plugin can include a custom agent definition (in its agents/ directory) and set it as the main thread agent. When enabled, that agent's system prompt, tool restrictions, and model selection replace Claude Code's defaults. This is how "mode" plugins work — a security-review plugin can make Claude Code behave as a security auditor by default, with its own system prompt and tool access.
The Ecosystem
The official Anthropic marketplace ships built-in and is automatically available. It includes official integrations for GitHub, Linear, Notion, Figma, Vercel, Sentry, and more. The broader ecosystem has grown to over 9,000 plugins across registries.
Some notable numbers:
feature-dev(89,000+ installs) — seven-phase workflow from requirements gathering through documentationcode-review— runs 4-5 independent Sonnet agents in parallel for multi-angle code analysis- LSP plugins for 11+ languages — Python, TypeScript, Rust, Go, C/C++, C#, Java, Kotlin, and more
security-guidance— PreToolUse hook that monitors 9 security patterns (command injection, XSS, eval, etc.)
Trust model: there isn't one
Plugins execute with full user privileges. No sandbox. The official docs say it plainly: "Anthropic does not control what MCP servers, files, or other software are included in plugins and cannot verify that they will work as intended." Enterprise admins can lock down allowed marketplaces via strictKnownMarketplaces in managed settings, and managed-scope plugins can be force-enabled. But for individual users, installing a plugin is an act of trust — same as running npm install.
Architecturally, plugins are a thin layer. They don't introduce new runtime primitives — every capability maps to something that already existed. What they solve is distribution: the ability to share a curated combination of skills, agents, hooks, and MCP servers as a single installable unit, with namespacing to prevent conflicts and versioning for updates. It's the difference between "copy these 6 files into the right directories" and /plugin install.
No Indexing. No Embeddings. Just Search.
This one surprised me. Claude Code deliberately does not pre-index your codebase or use vector embeddings. Boris Cherny, Claude Code's creator, confirmed on Hacker News:
"Early versions used RAG + a local vector db, but we found pretty quickly that agentic search generally works better."
The search hierarchy is three tools deep: Glob (file path pattern matching, near-zero token cost) → Grep (content search via ripgrep, regex-powered) → Read (full file load, 500-1,500 tokens per file, reserved for confirmed-relevant files). For deep exploration, Claude spawns an Explore subagent running on Haiku in an isolated context window — the subagent searches, reads, and reasons, but only its summary returns to the parent.
Why no embeddings? Code symbols are exact — getUserById is not semantically close to fetchAccountDetails the way embeddings suggest. Indexes drift during active editing. Embeddings leak code information even as vectors. And there's zero setup friction — no indexing delay when switching projects. The trade-off: Grep is literal/regex only. If you need conceptual search ("find the authentication logic"), Claude has to reason about directory structure and file names. Usually works. Not always.
MCP: The Open Tool Protocol
The Model Context Protocol is how Claude Code talks to external services — databases, GitHub, browsers, custom APIs. Created by David Soria Parra and Justin Spahr-Summers at Anthropic in November 2024, MCP has since been adopted by OpenAI, Google, GitHub, and JetBrains. In December 2025, Anthropic donated it to the Linux Foundation.
The protocol uses JSON-RPC 2.0 with a three-step initialization handshake:
// 1. Client announces itself → { "method": "initialize", "params": { "protocolVersion": "2025-03-26", "clientInfo": { "name": "claude-code" } }}// 2. Server responds with capabilities ← { "result": { "capabilities": { "tools": { "listChanged": true } }}}
// 3. Client confirms, then discovers tools → { "method": "initialized" } → { "method": "tools/list" } ← { "result": { "tools": [{ "name": "query_database", "description": "Run SQL queries", "inputSchema": { "type": "object", ... } }]}}
After handshake, MCP tools appear in the model's tool list identically to built-in tools. Claude doesn't know or care whether Read is native while query_database comes from an MCP server. Same calling convention, same JSON schemas.
Configuration
MCP servers are configured at three scope levels: local (~/.claude.json per project, private) > project (.mcp.json at repo root, version-controlled) > user (~/.claude.json globally). Two transports matter:
- stdio (default, ~80% of use cases) — Claude Code spawns the server as a subprocess. JSON-RPC over stdin/stdout. Tool discovery takes <200ms. Perfect for local tools.
- Streamable HTTP (March 2025) — single HTTP endpoint, supports SSE for streaming, OAuth 2.1 + PKCE for auth. Replaces the deprecated SSE transport. Use for remote services.
One critical performance detail: each enabled MCP server consumes context tokens for its tool definitions. A single server exposing 20-30 tools eats 4,000-6,000 tokens. When MCP tools exceed 10% of the context window, Claude Code enables Tool Search — dynamically loading only needed tool definitions per task, cutting context consumption by ~85%. Rule of thumb: disable servers you aren't actively using.
Subagents: Isolated Context Windows
The Agent tool solves context bloat. Each subagent runs in its own fresh 200K-token window. Only the final message returns to the parent — all intermediate tool calls, file reads, and reasoning stay isolated.
The mechanism: parent calls Agent with a prompt string (the only channel — no parent context carries over). The subagent gets its own system prompt, runs tools, and returns results. Subagents cannot spawn their own subagents — one level of nesting only, which prevents infinite recursion. Three built-in types: Explore (read-only, Haiku model for speed), Plan (research for plan mode), and general-purpose (full tool access). Multiple subagents can run in parallel.
Hooks: The Deterministic Layer
Everything else in Claude Code is probabilistic — the model tries to follow CLAUDE.md, decides which skills to invoke, chooses which tools to call. Hooks are the deterministic escape hatch.
Hooks are shell commands that fire on lifecycle events — PreToolUse (exit 2 to block), PostToolUse (auto-format, auto-test), SessionStart/Stop, Notification, SubagentStop. They run with full user permissions, no sandbox. Configured in .claude/settings.json (shared) or .claude/settings.local.json (personal).
The key distinction: CLAUDE.md rules achieve ~92% compliance. Hooks execute 100% of the time their conditions are met. If a rule must never be broken — blocking secrets from being committed, running formatters on every save, preventing force-pushes — use a hook.
Putting It All Together
When you type a prompt in Claude Code, here's the full sequence:
- Plugin & context assembly: Enabled plugins are loaded from cache — their skills, agents, hooks, and MCP servers merge into the session. 110+ conditional prompt strings selected by environment/config. CLAUDE.md files loaded from the full hierarchy and injected as
<system-reminder>tags. Auto-memory's first 200 lines included. Skill descriptions (from plugins and local.claude/skills/) attached to the Skill tool. MCP tool schemas (from plugins and project config) merged with 18 built-in tool definitions. - API call: Assembled context hits the Anthropic Messages API with prompt caching (92% prefix reuse). System prompt is static and cached; CLAUDE.md rides in user messages.
- Agentic loop: Claude responds. If
stop_reason === "tool_use": hooks fire (PreToolUse → permissions → execute → PostToolUse), results append to history, loop continues. Ifstop_reason === "end_turn": response displayed, loop exits. - Context management: If history exceeds ~83.5% of 200K, auto-compaction summarizes older messages. CLAUDE.md survives because it's re-read from disk every turn.
The architecture is less magic than it appears. A well-engineered loop with smart context management, a layered memory system with clear durability trade-offs, and an open protocol for extensibility — now wrapped in a plugin system that makes it all shareable. The model itself — Opus, Sonnet, or Haiku — is deliberately interchangeable. Claude Code's value is in the scaffolding, not the model. Which, as I've written before, is the real lesson of the AI coding era: the model is the commodity; the agent is the product.
Sources: How Claude Code Works · Memory Docs · Skills Docs · Plugins Docs · Plugins Reference · MCP Docs · Hooks Reference · Permissions · System Prompts (Piebald-AI) · MCP Spec · Agent Loop Docs · Reverse Engineering (Shatrov) · Poking Around Claude Code (Chung) · Inside Skills (Shilkov) · No Indexing (Vadim) · Official Plugin Directory