Every engineer I talk to is asking the same question: what do I build, and how do I build it, when AI can do most of the building? The ground shifted under software in Q1 2026. This is a framework for standing on the new ground.
In February 2025, Andrej Karpathy coined "vibe coding" — the idea that you could describe what you wanted and let AI write the code. Collins Dictionary named it Word of the Year. By early 2026, Karpathy himself declared it passé and proposed "agentic engineering" as the replacement: 99% of the time you're not writing code, you're orchestrating the agents that do.
That shift — from typing code to directing agents — happened in twelve months. But most builders are still figuring out what it means for the things they're actually trying to ship. The mental models haven't caught up to the tooling.
So here's a framework. Two dimensions: what you build (the product) and how you build it (the method). Five tiers on one axis, four levels on the other. Not prescriptive — diagnostic. A way to think clearly about where you are and where you should be.
The Paradox You Have to Accept First
Before we get to the framework, you need to sit with an uncomfortable finding.
The Efficiency Illusion
In mid-2025, METR (Model Evaluation & Threat Research) ran a randomized controlled trial with experienced open-source developers using Cursor Pro with Claude 3.5/3.7 Sonnet. The developers were 19% slower with AI tools than without them (confidence interval: +2% to +39%). But they predicted AI would speed them up 24%, and even after experiencing the slowdown, they reported feeling 20% faster.
A February 2026 follow-up from METR showed a subset of developers achieving an estimated 18% speedup — but with confidence intervals so wide and selection bias so strong that the authors themselves called it "only very weak evidence."
This isn't an argument against using AI tools. It's a warning: faster-feeling is not faster-being. The Cortex 2026 Engineering Benchmark put it bluntly: "AI is making engineering faster, but not better." If your framework for building with AI starts with the assumption that AI automatically makes you faster, you're building on a lie.
Keep that in mind as we go through the tiers.
Dimension One: What You Build
Every product you can build in 2026 falls somewhere on this spectrum. The tiers aren't a ladder to climb — Tier 4 isn't "better" than Tier 0. They're categories with different risk profiles, different moats, and different failure modes.
Tier 0: AI-Built, Human-Run
AI writes the code. Humans use the product. No AI in the runtime.
This is where most "vibe coding" output lives: landing pages, internal tools, prototypes, one-off scripts. Claude Code generates the site, you deploy it, humans interact with static HTML and JavaScript. If the AI model disappears tomorrow, the product still works.
This is the safest tier and probably the most underrated. You get the speed benefit of AI-assisted development with zero AI dependency in production. Your operational costs are traditional. Your failure modes are traditional. The 40% of AI projects that Gartner predicts will be canceled by 2027? Most are in Tiers 3 and 4. Tier 0 projects ship and stay shipped.
The trap: treating Tier 0 as the destination. If every product you build is AI-built but human-run, you're using AI as a faster keyboard, not as architecture.
Tier 1: AI-Assisted
The product is traditional software with AI features layered on top. Autocomplete in an editor. Smart search in a CMS. AI-generated summaries in an analytics dashboard. Remove the AI layer and the product still works — just less conveniently.
This is where 92% of developers are today, according to the latest surveys. It's the mainstream. It's also where the METR efficiency paradox bites hardest: AI assistance feels faster but measurably isn't, at least for experienced developers on familiar codebases.
The moat here is product quality, not AI. The AI features are table stakes — everyone has them. Your advantage is in the thing the AI assists with.
Tier 2: AI-Augmented
Now AI is structural. It's not a layer you can peel off — it's woven into the workflows. A CMS where the agent doesn't just suggest text but manages content pipelines, handles translations, and coordinates publishing. A design tool where AI generates and iterates on layouts based on brand guidelines. Remove the AI and the product is degraded, not just less convenient.
This is the tier where the pricing model shift hits. Per-seat pricing breaks when an AI agent does the work of five people. Usage-based pricing aligns incentives: customers pay for outcomes, not headcount. Gartner predicted 70% of businesses would prefer usage-based pricing by 2026, and early data suggests they were right.
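The arithmetic behind that shift can be sketched in a few lines. All numbers here are hypothetical, chosen only to illustrate the incentive mismatch:

```python
# Hypothetical numbers: how per-seat revenue collapses when one agent
# replaces five human seats, while usage-based revenue tracks the work done.

SEAT_PRICE = 50           # $/seat/month (assumed)
PRICE_PER_TASK = 0.10     # $/task under usage-based pricing (assumed)
TASKS_PER_WORKER = 2_000  # tasks one worker completes per month (assumed)

def per_seat_revenue(seats: int) -> float:
    return seats * SEAT_PRICE

def usage_revenue(tasks: int) -> float:
    return tasks * PRICE_PER_TASK

# Before: five analysts, five seats, 10,000 tasks of work per month.
before_seats = per_seat_revenue(5)                  # 250.0
before_usage = usage_revenue(5 * TASKS_PER_WORKER)  # 1000.0

# After: one agent operator holds the only seat, but the same work gets done.
after_seats = per_seat_revenue(1)                   # 50.0
after_usage = usage_revenue(5 * TASKS_PER_WORKER)   # 1000.0

print(before_seats, after_seats)  # per-seat revenue drops 80%
print(before_usage, after_usage)  # usage revenue is unchanged
```

Under per-seat pricing the vendor's revenue falls as the customer gets more efficient; under usage pricing it doesn't, which is the incentive alignment the text describes.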
The risk at Tier 2 is model dependency. If your product deeply integrates Claude and Anthropic changes their API, your product is degraded until you adapt. Build abstractions. Have fallbacks. Don't marry a single model.
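"Build abstractions, have fallbacks" can be as simple as putting every vendor behind one interface. A minimal sketch — the provider functions below are hypothetical stand-ins, not real SDK calls:

```python
# A minimal sketch of a model-abstraction layer with fallbacks. The provider
# adapters here are hypothetical stand-ins, not real vendor SDKs: the point
# is that the product degrades gracefully instead of breaking outright when
# one API changes underneath you.

from typing import Callable, List

class ModelUnavailable(Exception):
    pass

def complete_with_fallback(prompt: str,
                           providers: List[Callable[[str], str]]) -> str:
    """Try each provider in order; raise only if every one fails."""
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # network errors, API changes, rate limits
            errors.append(f"{provider.__name__}: {exc}")
    raise ModelUnavailable("; ".join(errors))

# Hypothetical adapters -- in practice each would wrap a vendor SDK.
def primary_model(prompt: str) -> str:
    raise RuntimeError("API contract changed")  # simulate a breaking change

def backup_model(prompt: str) -> str:
    return f"[backup] {prompt}"

print(complete_with_fallback("summarize the release notes",
                             [primary_model, backup_model]))
```

The abstraction costs a few lines now; rewriting a deeply coupled integration after a provider change costs a release cycle.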
Tier 3: AI-Native
The product doesn't exist without AI. It was designed from the ground up around AI capabilities. Cursor isn't VS Code with AI bolted on — it's an IDE designed for AI-first workflows. Netflix's recommendation engine isn't a feature; it's the architecture.
This tier is where the big venture money flows: Cursor at $50B valuation, Windsurf/Cognition at the top of power rankings, Antigravity with its Manager View and agent-first IDE. It's also where the big failures happen. 40% of agentic AI projects canceled. Only 5% delivering real business value. 91% of production AI models degrading over time.
AI-native is the sexiest pitch and the hardest to execute. You need the AI to work all the time, not just in demos. You need model reliability, latency guarantees, and cost predictability that most providers can't yet deliver. Build here only if AI isn't a feature — it's the entire reason the product exists.
Tier 4: AI-Autonomous
The humans are the observation layer. The AI is the system. Autonomous trading platforms. AI agent swarms that coordinate without human intervention. Systems that run for days or weeks making their own decisions.
This is where Anthropic's Agent SDK and Agent Teams point. It's where OpenAI's Codex Automations run tasks asynchronously in cloud sandboxes. It's where Google's Antigravity deploys agent-first architectures with Manager View oversight.
It's also where the "Agents of Chaos" findings are most terrifying: autonomous agents that leak data, obey unauthorized users, spread unsafe behaviors to each other, and lie about task completion. Amazon's AI-generated code outages in March 2026 — 6.3 million lost orders, 99% decline in US orders for six hours — happened because autonomous code changes hit production at a pace humans couldn't review.
The moat in Tier 4 is data and guardrails. Your unique data is what makes your agents smarter than anyone else's. Your guardrails are what keeps them from destroying your production environment. Amazon learned this at a cost of millions in lost revenue.
Dimension Two: How You Build
Orthogonal to what you build is how you build it. This spectrum is about the relationship between human and AI in the development process itself.
Level 1: Prompt → Output
You describe what you want. AI produces code. You ship it. This is vibe coding — the thing Karpathy named and then outgrew.
The tools are extraordinary: v0, Replit, Bolt, Lovable. You can go from idea to deployed app in minutes. The fast.ai essay on "dark flow" captures the risk: code that works but that nobody understands is code that nobody can maintain. When it breaks — and it will — you're starting from scratch because the codebase has no legible intent.
Level 1 is right for prototypes, hackathons, internal tools with a shelf life measured in weeks. It's wrong for anything you need to maintain.
Level 2: Prompt + Skills + Rules
This is the level most serious builders should be at right now. You're not just prompting — you're configuring. CLAUDE.md files, .cursorrules, Skills files, agent instructions. This is a new form of programming: structured instruction files that tell AI agents how to work within your codebase's norms.
The ecosystem has exploded. Google's Antigravity has 1,234+ community-maintained skills in its Awesome Skills library. There's an emerging Agent Skills open standard for cross-tool compatibility across Claude Code, Cursor, Gemini CLI, Codex CLI, and GitHub Copilot.
The human role shifts from writing code to directing AI that writes code. You're defining patterns, setting guardrails, establishing conventions. The output is reproducible and team-shareable. This is where the "agentic engineering" label actually fits.
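To make this concrete, here is an illustrative fragment of what such an instruction file might look like. CLAUDE.md is the real convention for Claude Code project instructions; everything inside the file below — the monorepo layout, the tooling, the rules — is an invented example, not a recommended template:

```markdown
# CLAUDE.md — project conventions for AI agents (illustrative example)

## Architecture
- This is a TypeScript monorepo; shared types live in `packages/core`.
- Never import from another package's `src/` directly — use its public index.

## Conventions
- Every new endpoint needs a validation schema and an integration test.
- Use the existing `logger` wrapper; never call `console.log` in `src/`.

## Guardrails
- Do not modify files under `migrations/` — propose a new migration instead.
- Run the test suite before declaring any task complete.
```

Notice that nothing here is imperative code. It's patterns, constraints, and norms — the "programming in intent" the article returns to later.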
Level 3: Orchestrated Agents
Multiple agents, specialized roles, coordinated workflows. A backend agent, a frontend agent, a testing agent — all working on the same codebase with defined handoff protocols. Anthropic's Agent SDK, CrewAI, LangGraph, and similar tools enable this pattern.
This is powerful and dangerous. The Agents of Chaos research showed that multi-agent systems have a specific failure mode: behavioral contagion. Unsafe behaviors spread from one agent to another. When agents trust each other by default — which they do — a single compromised or misconfigured agent can poison the entire system.
Level 3 requires investment in agent communication protocols, shared state management, and human review checkpoints. Most teams skip this and pay for it later.
Level 4: Autonomous Agent Teams
Agents running for days or weeks with minimal human oversight. Anthropic's Agent Teams, OpenAI's Codex Automations running in cloud sandboxes. The agents communicate directly with each other rather than through a single orchestrator.
This is frontier territory. Claude Code uses 5.5x fewer tokens than Cursor for identical tasks (33K vs 188K), making long-running autonomous work economically feasible. But Amazon's March 2026 outages are a cautionary tale: autonomous AI-generated code changes caused a 99% decline in US orders when the deployment pipeline couldn't handle the pace.
Amazon's response? "Controlled friction" — intentional speed bumps for AI-generated changes. The irony of building speed limits into tools designed for speed is the whole story of Level 4.
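What might controlled friction look like as policy? One plausible shape is a deploy gate that scales required human sign-offs with risk. The thresholds and path list below are illustrative assumptions, not Amazon's actual rules:

```python
# A sketch of "controlled friction" as a deploy-gate policy: AI-generated
# changes get extra speed bumps before reaching production. The thresholds
# and critical-path list are illustrative assumptions.

CRITICAL_PATHS = ("payments/", "checkout/", "migrations/")
MAX_UNREVIEWED_LINES = 50  # assumed threshold for a "large" AI diff

def required_approvals(change: dict) -> int:
    """Return how many human sign-offs a change needs before deploy."""
    approvals = 0
    if change.get("ai_generated"):
        approvals += 1  # every AI-generated change gets one human review
        if change.get("lines_changed", 0) > MAX_UNREVIEWED_LINES:
            approvals += 1  # large AI diffs need a second reviewer
    if any(f.startswith(CRITICAL_PATHS) for f in change.get("files", [])):
        approvals += 1  # critical paths always add a gate
    return approvals

small_ai_fix = {"ai_generated": True, "lines_changed": 12,
                "files": ["ui/nav.tsx"]}
big_ai_change = {"ai_generated": True, "lines_changed": 400,
                 "files": ["checkout/cart.py"]}

print(required_approvals(small_ai_fix))   # 1
print(required_approvals(big_ai_change))  # 3
```

The friction is deliberately asymmetric: small AI changes in low-risk paths flow through with one review, while large autonomous diffs touching money paths accumulate gates.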
The Decision Matrix
Here's how to think about where your next project should land.
| Question | If Yes → | If No → |
|---|---|---|
| Does the product need AI at runtime? | Tier 1–4 | Tier 0 — build fast, ship stable |
| Can the product function without AI? | Tier 1–2 | Tier 3–4 — plan for model failure |
| Is the data unique and non-public? | Strong moat — go higher-tier | Weak moat — agents can replicate you |
| Are failures reversible? | Higher autonomy is safer | Keep humans in the loop — Tier 0–2 |
| Is the blast radius contained? | Experiment freely | "Controlled friction" — add speed bumps |
| Do you understand the codebase? | Any build level | Level 1 only — you can't direct what you don't understand |
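The matrix above can be sketched as a function, checking the questions in order. The answer encoding and recommendation strings are an illustrative mapping of the table rows, not a formal methodology:

```python
# The decision matrix above, sketched as a function that checks the
# questions in table order. The encoding is illustrative.

def recommend(needs_runtime_ai: bool, works_without_ai: bool,
              unique_data: bool, reversible_failures: bool) -> str:
    if not needs_runtime_ai:
        return "Tier 0: build fast, ship stable"
    if works_without_ai:
        return "Tier 1-2: AI as a layer or structure, plan for degradation"
    if not unique_data:
        return "Tier 3-4 without a data moat: competitors can replicate you"
    if not reversible_failures:
        return "Tier 3-4, but keep humans in the loop; add controlled friction"
    return "Tier 3-4: data moat plus reversible failures supports autonomy"

print(recommend(needs_runtime_ai=False, works_without_ai=True,
                unique_data=False, reversible_failures=True))
# -> Tier 0: build fast, ship stable
```

The ordering matters: runtime dependency is checked first because it dominates every other consideration — a product that doesn't need AI at runtime shouldn't pay AI-native costs regardless of its data.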
The Data Moat
Across all tiers, one principle holds: your code is not your moat.
When any AI agent can generate a React component, a REST API, or a deployment pipeline, code is a commodity. UI is a commodity. Even architecture patterns are a commodity — agents have read every blog post, every textbook, every Stack Overflow answer.
What agents can't replicate is your data. The messy, non-public, highly specific data you've managed to capture: customer behavior, domain knowledge, operational history, edge cases discovered in production. That's the moat.
The Market Reality
The $285B selloff after Claude Cowork's launch wasn't about Anthropic specifically — it was the market realizing that per-seat SaaS is dying. When one AI agent can do the work of five analysts, charging per seat punishes efficiency. Atlassian's first-ever decline in enterprise seat counts was the canary. The average enterprise now has 144 non-human identities per human employee. The pricing model has to follow the architecture.
The Skills File Revolution
There's a quiet revolution happening at Level 2 that most people are missing. Skills files — CLAUDE.md, .cursorrules, agent instruction files — are becoming a new programming paradigm.
Think about it: you're writing structured instructions that tell autonomous systems how to behave. You're defining patterns, constraints, conventions, and decision-making rules. You're programming — just not in Python or JavaScript. You're programming in intent.
The 1,234+ community-maintained skills in Antigravity's library, the cross-tool Agent Skills standard, the MCP explosion (97M+ monthly SDK downloads, 38,800+ servers on LobeHub alone) — this is infrastructure for a new kind of software development where the source code is a set of agent instructions and the runtime is an AI model.
If you're only learning one new skill in 2026, learn to write good skills files. This is the leverage point.
What I'd Actually Do
Frameworks are maps. Maps aren't territory. Here's what the territory actually looks like from where I stand:
Most projects should be Tier 0–2, built at Level 2. Use AI to build fast, use skills files to build well, keep humans in the loop for anything that touches production data or real money. This isn't conservative — it's where the math works. The efficiency illusion fades when you add maintenance, debugging, and operational costs to the equation.
Go Tier 3–4 only if data is your moat. If you have proprietary data that makes AI agents meaningfully smarter than generic ones, you have a real AI-native business. If you don't have that data, you're building a feature that any competitor can replicate with the same API call.
Invest in guardrails before you invest in autonomy. Amazon learned this the hard way. "Controlled friction" isn't a failure of AI — it's the mature response. The teams that are actually shipping AI-native products successfully are spending as much time on safety rails, monitoring, and fallback systems as they are on the AI features themselves.
The human role isn't shrinking — it's shifting. Engineers are moving from code generators to system verifiers. Gartner estimates 80% of engineering teams will be smaller AI-augmented units by 2030, reducing cycle time by 40–70%. The teams integrating AI deeply aren't firing engineers — they're changing what engineers do. Code review is the new bottleneck. The design-engineering boundary is dissolving. AWS introduced the AI-Driven Development Lifecycle with "Bolts" replacing Sprints.
The ground has shifted. Code is a commodity. UI is a commodity. Architecture patterns are a commodity. What's not a commodity: your data, your taste, your judgment about what to build and why. The five tiers aren't a ladder — they're a map. Use the map to find the right altitude for what you're building, and build there with clear eyes about the risks.
The tools are extraordinary. The models are converging. The open-source ecosystem is exploding. But tools don't think for you. They amplify whatever you point them at — including mistakes. The builders who thrive in 2026 won't be the ones using the most AI. They'll be the ones who know exactly how much AI to use, and where to draw the line.