The Layered Cake of Claude Code
From the LLM to Your Project (and How to Hack Every Layer)

A practical, deep-dive map of Claude Code — what every folder, file, plugin, skill, hook, agent, and MCP actually does, how to compose them, how to swap the brain for OpenRouter or Ollama, and how it stacks up against Mastra and Pi.
Note: in this post, "Claude Code" refers to Anthropic's terminal coding harness. I'll use "harness" a lot — it's the cockpit that wraps the LLM and gives it tools, memory, and a runtime.
Why think in layers?
Picture Claude Code as a layered cake — or, if you prefer, an apartment building.
The LLM is the foundation slab — raw intelligence with no plumbing.
The harness is the building — walls, wiring, elevators, fire alarms.
Each folder layer is a floor with its own residents (config, memory, plugins).
Skills, subagents, MCPs, and hooks are the appliances and staff that actually get work done on each floor.
When something behaves weirdly — Claude forgets context, burns tokens, picks the wrong tool — it's almost always because you don't yet have a clear mental model of which floor the bug is on. The point of this post is to give you that floor plan.
Diagram 1 — The layered stack (top = most specific, bottom = rawest):
The full stack at a glance
From bottom (rawest) to top (most project-specific):
Layer 0 — The LLM. Claude Sonnet, Opus, Haiku, or any model you route to.
Layer 1 — The Harness. Claude Code itself — the loop that takes your prompt, decides on tool calls, edits files, runs bash.
Layer 2 — ~/.claude/ (home). Your global settings, plugins, memory, and agents — applies to every project on this machine.
Layer 3 — .claude/ (project). Repo-level config that travels with the codebase and is shared with your team.
Layer 4 — Memory files. CLAUDE.md (committed) and CLAUDE.local.md (gitignored, personal).
Layer 5 — Plugins. Installable bundles of skills, agents, hooks, commands, and MCP servers.
Layer 6 — Skills. SKILL.md files that teach Claude how to do a specific task well.
Layer 7 — Subagents. Specialized Claudes you dispatch with their own context window.
Layer 8 — Hooks. Middleware that fires on lifecycle events (PreToolUse, PostToolUse, SessionStart, …).
Layer 9 — MCPs. External tool servers (Slack, Linear, Notion, your DB, your CI, …).
Mental model: the harness loads layers in order, and later layers override earlier ones. Project beats home. Local beats committed. Skills beat raw prompts. That's the whole game.
Layer 0 — The LLM (the brain)
This is the model weights themselves: Claude Opus 4.6, Sonnet 4.6, Haiku 4.5, etc.
The LLM has no idea your project exists until the harness ships it the right context.
It is swappable. The harness doesn't care which brain it's talking to as long as the API is Anthropic-compatible.
Alternatives:
Hosted frontier: GPT-5 / o-series (OpenAI), Gemini 2.5 (Google), Grok (xAI), Mistral Large.
Open-weights, local: Llama 4 / 3.x (Meta), Qwen 3 (Alibaba), DeepSeek-R1 / V3, Mixtral.
Aggregators: OpenRouter (one key, 100+ models), Together AI, Fireworks, Groq.
Real-world analogy: the LLM is a brilliant freelance contractor. Smart, but useless without the keys to the building, the tools, and a brief.
Layer 1 — The Harness (Claude Code itself)
A harness is the runtime around the LLM:
The agentic loop — read prompt → think → call tool → observe → repeat until done.
The tool layer — Read, Write, Edit, Bash, Grep, Glob, WebFetch.
The context manager — what's in the prompt, what got truncated, what's cached.
The safety/permission layer — which folders Claude can touch, which commands need approval.
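The agentic loop above can be sketched in a few lines of Python. Everything here (the fake_model stub, the tool registry, the message shapes) is invented for illustration and is not Claude Code's actual internals:

```python
# Toy sketch of a harness's agentic loop. The model decides each turn
# whether to call a tool or give a final answer; the harness executes
# the tool and feeds the observation back into the history.

def run_agent(model, tools, prompt, max_turns=10):
    """read prompt -> think -> call tool -> observe -> repeat until done."""
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        action = model(history)                            # think
        if action["type"] == "final":
            return action["content"]                       # done
        result = tools[action["tool"]](**action["args"])   # call tool
        history.append({"role": "tool", "content": result})  # observe
    return "max turns reached"

def fake_model(history):
    # Stub "LLM": asks to read a file once, then answers with what it saw.
    if any(m["role"] == "tool" for m in history):
        return {"type": "final", "content": "done: " + history[-1]["content"]}
    return {"type": "tool_call", "tool": "read", "args": {"path": "README.md"}}

tools = {"read": lambda path: f"<contents of {path}>"}

print(run_agent(fake_model, tools, "summarize the README"))
# prints: done: <contents of README.md>
```

Real harnesses add permissions, truncation, and caching around this loop, but the shape is the same.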
Alternatives:
Terminal harnesses: Mastra Code (TypeScript-extensible, "never compacts"), Pi (pi.dev, minimalist), Aider (mature, token-efficient), Codex CLI (OpenAI), Gemini CLI (Google), Cline / Roo Code, OpenCode.
IDE harnesses: Cursor, Continue.dev, Windsurf, Zed AI, JetBrains AI.
Background / multi-agent: Devin, Factory, Sweep, OMC (oh-my-claudecode multi-CLI orchestration).
Analogy: if the LLM is the contractor, the harness is the work order, the toolbox, and the foreman combined. Same contractor, different harness, completely different output.
Diagram 2 — Harness anatomy:
Layer 2 — ~/.claude/ (your home base)
Lives in your home directory. Everything here applies to every project on this machine.
Typical contents:
~/.claude/settings.json — global preferences, default model, hook config.
~/.claude/CLAUDE.md — global memory ("I prefer pnpm over npm. I work in PT.").
~/.claude/agents/ — your personal subagents.
~/.claude/commands/ — slash commands available everywhere.
~/.claude/plugins/ — installed plugins.
~/.claude/hooks/ — your global hooks.
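A minimal ~/.claude/settings.json might look like the following. The field names follow Claude Code's documented settings schema, but treat the specific values as illustrative:

```json
{
  "model": "claude-sonnet-4-5",
  "permissions": {
    "deny": ["Read(./.env)"]
  },
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [{ "type": "command", "command": "prettier --write ." }]
      }
    ]
  }
}
```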
Alternatives (same idea, different tool):
~/.codex/ and ~/.codex/AGENTS.md (Codex CLI).
~/.gemini/ and GEMINI.md (Gemini CLI).
~/.aider.conf.yml + ~/.aider/ (Aider).
~/.cursor/rules/ + ~/.cursor/mcp.json (Cursor).
~/.continue/ (Continue.dev).
~/.config/mastra/ (Mastra Code).
Analogy: this is your personal toolbox in the truck. You bring it to every job site.
Layer 3 — Project .claude/ folder
A .claude/ directory inside your repo. Same shape as ~/.claude/, but scoped to this project and committed to git so the team gets the same setup.
.claude/agents/ — project-specific subagents (e.g. db-migration-reviewer).
.claude/commands/ — project slash commands (/deploy-staging).
.claude/skills/ — project skills (e.g. "how we write Postgres migrations").
.claude/hooks/ — project hooks (e.g. block commits without a Linear ticket).
.claude/settings.json — project model overrides, allowed tools.
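As a concrete example, a project slash command is just a markdown file. A hypothetical .claude/commands/deploy-staging.md (the frontmatter field and the steps are invented for illustration):

```markdown
---
description: Deploy the current branch to staging
---
Deploy the current branch to staging:
1. Ensure the working tree is clean (`git status`).
2. Run `pnpm build` and make sure it passes.
3. Push the branch and trigger the staging workflow.
Report the deploy URL when done.
```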
Alternatives (same idea, different tool):
.cursor/rules/ and .cursor/mcp.json (Cursor).
.codex/ (Codex CLI).
.gemini/ (Gemini CLI).
.aider/ + .aider.conf.yml + CONVENTIONS.md (Aider).
.continue/ (Continue.dev).
.mastra/ (Mastra Code, project-scoped threads + config).
.vscode/settings.json for AI extensions.
Analogy: the shared toolbox in the workshop that everyone on the team uses for this job.
Layer 4 — The CLAUDE.md family (memory)
This is the layer most people get wrong. There are three flavors and they cascade:
~/.claude/CLAUDE.md — your global memory. Survives across all projects.
<repo>/CLAUDE.md — project memory. Committed to git. Team-shared.
<repo>/CLAUDE.local.md — your personal project notes. Gitignored. Don't commit.
Order of precedence at session start (most general → most specific):
~/.claude/CLAUDE.md → <repo>/CLAUDE.md → <repo>/CLAUDE.local.md
What actually goes in each:
Global: your preferences, package manager defaults, communication style.
Project committed: architecture overview, repo conventions, "we use Drizzle, not Prisma," critical gotchas.
Project local: your local DB creds path, in-progress refactor notes, "don't touch X until Friday."
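A committed project CLAUDE.md does not need to be long. A sketch (stack and conventions invented for illustration):

```markdown
# CLAUDE.md

## Stack
- Next.js 15, Drizzle (not Prisma), Postgres, pnpm.

## Conventions
- All DB access goes through `src/db/queries/`.
- Migrations are hand-written SQL; never auto-generate them.

## Gotchas
- `pnpm test` needs the local Postgres container running first.
```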
Alternatives (the "memory file" name in other harnesses):
AGENTS.md — increasingly the cross-tool standard. Used by OpenAI Codex, Cursor, Aider, and others; many tools now read both AGENTS.md and their own file.
GEMINI.md — Gemini CLI's equivalent.
.cursorrules (legacy) → .cursor/rules/*.md (current) — Cursor's project-rules format.
CONVENTIONS.md — Aider's preferred file (--read CONVENTIONS.md).
System prompt files — Continue.dev and most TypeScript frameworks (Mastra) put this in code.
Pro tip: many teams now keep one AGENTS.md and symlink CLAUDE.md / GEMINI.md to it so every harness picks it up.
Analogy: think of CLAUDE.md as the onboarding doc you'd hand a new contractor. Global = "how I work." Project = "how the team works." Local = "what's weird about my setup right now."
Diagram 3 — The CLAUDE.md cascade:
Layer 5 — Plugins
A plugin is a packaged bundle that can include any combination of: skills, subagents, slash commands, hooks, and MCP servers. Install one plugin and you get a coordinated set of capabilities.
Examples you can install today:
Superpowers — discipline plugin: TDD, systematic debugging, brainstorming, plan-writing, parallel agents. Very opinionated — it forces you into rigorous workflows.
oh-my-claudecode (OMC) — multi-agent orchestration plugin. ~19 specialized agents, ~36 skills, runs Claude / Codex / Gemini in parallel tmux panes for cross-validation. Marketed as zero-config + 30–50% token savings.
gstack — Garry Tan's (YC President) personal Claude Code stack, 23 opinionated tools modeling CEO / Designer / Eng Manager / Release Manager / Doc Engineer / QA roles. Workflow: Think → Plan → Build → Review → Test → Ship → Reflect.
caveman — token-reduction plugin (more on this below).
Alternatives (plugin/extension ecosystems in other harnesses):
Cursor — Cursor Rules library + Cursor extensions marketplace.
Continue.dev — config.yaml blocks + community config presets.
Aider — no plugin system by design; you compose via CONVENTIONS.md + git hooks.
Codex CLI — slash-command bundles, AGENTS.md presets.
Pi — first-class pi packages (third-party harness extensions).
Mastra Code — extend programmatically with custom modes, tools, subagents in TypeScript.
Analogy: plugins are like buying a kitchen appliance bundle instead of one mixer at a time — they come pre-wired together.
Layer 6 — Skills (the secret sauce)
A skill is a folder with a SKILL.md file (and optional helper scripts/templates). It tells Claude how to do a specific task — the recipe, not the ingredients.
Skills load on demand when their description matches your request — that's "progressive disclosure" and it's why skills don't blow up your context window.
Skills can be standalone in .claude/skills/ or bundled inside plugins.
A great skill is a checklist + a few examples + a link to deeper docs the LLM can read if needed.
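On disk, a skill is a folder containing a SKILL.md whose YAML frontmatter (name and description) is what the harness matches against your request. A hypothetical example (the skill name, rules, and docs path are invented):

```markdown
---
name: postgres-migrations
description: Use when writing or reviewing Postgres schema migrations in this repo
---
# Writing Postgres migrations
1. One migration per logical change; never edit a shipped migration.
2. Always write the down migration.
3. For large tables, create indexes CONCURRENTLY.

See docs/db/migrations.md for the full policy.
```

Only the description sits in context by default; the body loads when the skill triggers. That is the progressive disclosure mentioned above.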
Alternatives (the "on-demand instructions" pattern):
Cursor — Project Rules / User Rules with glob-based scoping.
Pi — Skills are first-class, designed not to bust the prompt cache.
Mastra Code — Skills + custom commands.
gstack — slash commands as opinionated mini-skills.
Aider — no equivalent; you'd inline guidance in CONVENTIONS.md.
Continue.dev — prompt template files + .prompt blocks.
Analogy: skills are standard operating procedures posted on the wall. The contractor (LLM) reads the relevant SOP only when they're about to do that task — they don't memorize all of them up front.
Layer 7 — Subagents
A subagent is a separate Claude instance you dispatch with a focused task and a fresh context window. It returns a summary; the bulk of its working memory never pollutes yours.
Why this matters:
Context isolation — a 50k-token research dive doesn't bloat your main session.
Specialization — code-reviewer, Explore, Plan, custom domain reviewers.
Parallelism — fire several at once for independent tasks.
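A subagent definition is itself just a markdown file with frontmatter in .claude/agents/. A hypothetical code-reviewer (field names follow Claude Code's agent format, but the tool list and prompt are invented):

```markdown
---
name: code-reviewer
description: Reviews diffs for bugs, security issues, and convention violations
tools: Read, Grep, Glob, Bash
---
You are a strict senior reviewer. Read the diff, check it against the
conventions in CLAUDE.md, and return a short report: blocking issues
first, then nits. Never edit files yourself; report only.
```

Restricting the tools field is part of the point: a reviewer that cannot Write or Edit can be dispatched without worry.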
Alternatives (multi-agent / delegation patterns):
Mastra Code — custom subagents in TypeScript via the Agent primitive.
Mastra (framework) — full agent + workflow graph (DAGs, branches, parallelism).
OMC (oh-my-claudecode) — runs Claude + Codex + Gemini as parallel CLI panes for cross-validation.
Cline / Roo Code — built-in "Plan/Act" modes acting as lightweight subagents.
CrewAI / AutoGen / LangGraph — framework-land alternatives if you're building outside a CLI.
Pi — none by default (philosophical); add via packages.
Aider — single-agent; uses architect/editor roles on one model instead.
Analogy: delegating to a specialist consultant. They go off, do the deep work, and email you a one-pager.
Diagram 4 — Subagent fan-out (main session delegates, gets summaries back):
Layer 8 — Hooks (the middleware)
Hooks are scripts that run automatically on lifecycle events. They're the interceptors of Claude Code.
Common hook points:
SessionStart — runs when you launch Claude Code (e.g. inject today's date, current branch, recent commits).
PreToolUse — fires before any tool call. Can block, modify args, or warn.
PostToolUse — fires after. Great for auto-formatting, logging, or running tests after Edit.
UserPromptSubmit — fires when you send a message. Inject extra context here.
Stop / SubagentStop — cleanup, summarization, notifications.
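Hooks are wired up in settings.json. The shape below follows Claude Code's hooks config (matcher is a tool-name pattern, command is the script to run), but treat the script paths as hypothetical:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [{ "type": "command", "command": "python .claude/hooks/guard_writes.py" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hooks": [{ "type": "command", "command": "ruff format ." }]
      }
    ]
  }
}
```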
What they're great for:
Auto-running prettier / ruff after every Edit.
Blocking writes outside the project root.
Posting a Slack message when a long task finishes.
Injecting "what changed since my last commit" into every session.
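The hook script itself can be anything executable. Here is a sketch of the "block writes outside the project root" idea in Python; the payload fields (tool_name, tool_input.file_path) and the exit-code-2-blocks convention follow Claude Code's hook documentation, but verify the details against your version:

```python
import json
import os
import sys

def should_block(payload: dict, root: str) -> bool:
    """True if a Write/Edit tool call targets a path outside the project root."""
    if payload.get("tool_name") not in ("Write", "Edit"):
        return False
    path = payload.get("tool_input", {}).get("file_path", "")
    # Resolve relative to the project root, then check containment.
    resolved = os.path.realpath(os.path.join(root, path))
    return not resolved.startswith(os.path.realpath(root) + os.sep)

def main() -> None:
    # Claude Code pipes the hook payload as JSON on stdin.
    payload = json.load(sys.stdin)
    root = os.environ.get("CLAUDE_PROJECT_DIR", os.getcwd())
    if should_block(payload, root):
        # Exit code 2 blocks the tool call; stderr is shown to Claude.
        print("Blocked: write outside project root", file=sys.stderr)
        sys.exit(2)

# Call main() when running this file as a hook script.
```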
Alternatives (lifecycle / middleware in other tools):
Mastra Code — has its own hooks layer (pre/post tool, session lifecycle).
Aider — built-in lint/test commands + git hooks at the OS layer.
Cursor — auto-format-on-save and limited "background tasks."
Pi — extend via skills + scripts (no formal hooks API).
OS / git hooks — pre-commit, pre-push, Husky — work with any harness.
Continue.dev — slash commands + indexer hooks.
Analogy: hooks are the building's automation system — motion-activated lights, fire alarms, key-card locks. Claude doesn't think about them; they just fire on the right event.
Layer 9 — MCPs (external tools)
Model Context Protocol servers expose external systems to Claude as callable tools — Slack, Linear, Notion, GitHub, your database, your internal API, etc.
An MCP server defines tools + their JSON schemas.
The harness loads them, hands the schemas to the LLM, and proxies calls.
Plugins frequently bundle MCPs (e.g. a "PM plugin" might bundle Linear + Notion + Figma MCPs).
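In Claude Code, project-scoped MCP servers are typically declared in a .mcp.json at the repo root. The server names, packages, and connection strings below are illustrative, not a recommendation:

```json
{
  "mcpServers": {
    "linear": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.linear.app/sse"]
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/dev"]
    }
  }
}
```

Each entry spawns (or connects to) a server; the harness introspects its tool schemas and hands them to the model.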
Alternatives (how each ecosystem exposes external tools):
MCP (cross-tool standard) — supported by Claude Code, Cursor, Mastra Code, Continue, Codex, Gemini CLI, Cline, and most modern harnesses. The closest thing to a "USB-C for AI tools."
OpenAI function calling / Tools API — the older, ecosystem-specific equivalent.
LangChain / LlamaIndex tools — framework-side abstractions.
Mastra tools — TypeScript-native tool definitions in the Mastra framework.
Aider — minimal; relies on shell + git instead of external connectors.
Custom REST/GraphQL — wrap anything as an MCP server with one of the SDKs.
Analogy: MCPs are the utility hookups — water, gas, electricity, internet. The building (harness) is useless without them once you want to do real work.
How everything stacks together
The thing that confuses people: all these layers are active at once. Here's the mental order on every turn:
The LLM receives a prompt.
The harness has prepended: system prompt + global CLAUDE.md + project CLAUDE.md + local CLAUDE.md + active skills' descriptions + tool schemas (incl. MCPs).
Hooks may have already run (SessionStart, UserPromptSubmit) and injected extra context.
The LLM picks a tool. PreToolUse hooks fire — they can block or modify.
Tool runs. PostToolUse hooks fire.
If the task is gnarly, the LLM dispatches a subagent, which runs the same loop in isolation.
Repeat until done. Stop hooks fire.
Plugins like Superpowers + your own agents play nicely together because they live in different layers: Superpowers ships skills + commands; your custom agents live in .claude/agents/; your hooks live in .claude/hooks/. They compose because the harness merges them at load time.
Diagram 5 — One full turn, including hooks firing:
The framework ecosystem (the "configs on top of configs")
This is where it gets interesting. People are building opinionated stacks on top of Claude Code, the way oh-my-zsh sits on top of zsh.
Superpowers
A discipline-first plugin. Strong opinions on TDD, systematic debugging, plan-before-code, evidence-before-claims.
Forces you to use the skill even when you think you don't need it.
Best for: solo devs and small teams who want guardrails against "vibe coding."
oh-my-claudecode (OMC)
Multi-agent orchestration. Ships ~19 agents and ~36 skills.
Runs Claude + Codex + Gemini CLIs in parallel tmux panes — great for cross-checking architecture, security review, deep code analysis.
Marketed at 30–50% token savings via specialization.
gstack (Garry Tan / YC)
23 tools modeling startup roles. The skills /office-hours, /review, /ship are the headline features.
Workflow is the product: Think → Plan → Build → Review → Test → Ship → Reflect.
Tan claims ~10K logical LOC/week throughput while running YC full-time. Take the number with a grain of salt; the workflow itself is the real artifact.
caveman
Token-reduction skill. Forces Claude to drop articles, filler words, hedging — output reads like cave drawings.
Reported ~65% drop in average response tokens (294 vs 1,214 across 11 dev tasks); spread of 22–87% depending on prompt.
Bonus: a March 2026 paper ("Brevity Constraints Reverse Performance Hierarchies in Language Models") found brevity-constrained models gained up to 26pp in accuracy on some benchmarks. Less can literally be more.
All four can be combined: Superpowers for discipline + gstack for workflow + caveman for compression + your own .claude/ for project-specific glue.
Swapping the brain: OpenRouter and Ollama
Claude Code ships pointed at Anthropic's API, but the harness only cares about an Anthropic-compatible endpoint. Three main routes:
Option A — Claude Code Router (CCR)
A community proxy that translates Claude Code's API calls to OpenRouter, OpenAI, Gemini, DeepSeek, Ollama, etc.
You set ANTHROPIC_BASE_URL (and an API key) to point at the router.
Per-task routing: cheap model for trivial edits, frontier model for hard reasoning.
Option B — LiteLLM proxy
Same idea, more general-purpose. Spins up an OpenAI-/Anthropic-compatible endpoint that fans out to ~100+ providers.
Option C — Ollama (fully local)
Run a model locally — Llama 3.x, Qwen, DeepSeek-R1 distills, etc.
Stick LiteLLM (or CCR) in front of Ollama to expose an Anthropic-compatible API.
Trade-off: zero API cost, full privacy; but tool-use quality varies wildly by model. Frontier-class agentic behavior is still a stretch on consumer hardware.
Quick recipe (conceptual):
OLLAMA_HOST=... runs the model.
LiteLLM exposes http://localhost:4000 as Anthropic-compatible.
export ANTHROPIC_BASE_URL=http://localhost:4000 and export ANTHROPIC_AUTH_TOKEN=anything.
Launch Claude Code. Same harness, different brain.
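The LiteLLM side of that recipe might look like this; the config schema follows LiteLLM's proxy docs, but the model names are illustrative:

```yaml
# litellm_config.yaml: map an "Anthropic-looking" model name onto local Ollama
model_list:
  - model_name: claude-sonnet-local
    litellm_params:
      model: ollama/qwen2.5-coder
      api_base: http://localhost:11434

# then:  litellm --config litellm_config.yaml --port 4000
# and:   export ANTHROPIC_BASE_URL=http://localhost:4000
```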
Analogy: the harness is the car. Claude is the default engine. OpenRouter and Ollama let you swap the engine without rebuilding the chassis. The steering wheel (your skills, hooks, CLAUDE.md) doesn't care which engine you bolted in.
⚠️ Caveat: smaller open models often struggle with multi-tool workflows, file editing precision, and long contexts. For local dev you're often best with a hybrid — Ollama for cheap autocomplete-like loops, Claude/Anthropic for the hard stuff.
Diagram 6 — Same harness, swappable brain:
Comparing harnesses: Claude Code vs Mastra Code vs Pi
The question everyone asks: which harness uses fewer tokens? Short answer: it's not the harness, it's the discipline layered on top. But the three harnesses have real personality differences.
⚠️ Quick disambiguation. Mastra is the TypeScript framework for building AI apps and agents (think SDK + primitives: Harness, Agent, Memory). Mastra Code is a separate terminal coding agent built on top of Mastra's primitives — it's the direct apples-to-apples comparison to Claude Code. Below I'm comparing Mastra Code, the harness, not the framework.
Claude Code
Heaviest by default — rich tool layer, MCPs, plugins, subagents, hooks all in one box.
Best ergonomics for humans; biggest ecosystem (Superpowers, OMC, gstack, caveman).
Token cost can creep up fast unless you use skills + caveman + subagents to isolate context.
Memory model: prompt-based (CLAUDE.md cascade) + auto-compaction when the context window fills.
Mastra Code (code.mastra.ai)
Terminal-based coding agent built on Mastra's Harness / Agent / Memory primitives.
Headline feature: "never compacts." Uses observational memory — it watches, reflects, and compresses context as it goes, instead of doing a hard truncation when the window fills. The pitch: long-running sessions stay precise.
Three explicit modes — Build / Plan / Fast — so you pick a different cost/quality profile per task.
Supports MCP servers, hooks, custom commands, skills, project-scoped threads.
Programmatically extensible: custom modes, tools, subagents, storage all in TypeScript.
Connects to "thousands" of models out of the box (you bring the API key).
Requires Node 22.13+.
Strength: long-running sessions stay coherent without manual /compact. Weakness: smaller ecosystem of community plugins than Claude Code today.
Pi (pi.dev, by Mario Zechner)
Minimalist terminal coding harness. Explicit non-goal: features like sub-agents and plan mode are not built in — you add them only if you need them.
Skills are first-class, loaded on-demand without breaking the prompt cache.
Self-modifying philosophy: Pi can rewrite Pi to fit your workflow.
Often the lowest token-per-task in head-to-head comparisons because there's less ambient harness overhead — but you pay for it in setup time.
Rough trade-off table
Most plug-and-play → Claude Code.
Best at long, never-compact sessions → Mastra Code (observational memory).
Lowest baseline token spend → Pi (especially with caveman bolted on).
Best for multi-agent orchestration → Claude Code with OMC, or Mastra Code with custom subagents.
Easiest to extend in TypeScript → Mastra Code.
The honest take: no harness is intrinsically cheaper. What matters is (a) how much system prompt you stuff, (b) how aggressively you use subagents to isolate context, (c) whether your output style is verbose by default, and (d) whether your harness compacts (Claude Code) or compresses-as-it-goes (Mastra Code) — the latter avoids the cliff where Claude Code "forgets" mid-session.
Token economics (and where caveman fits)
Where tokens actually go in a Claude Code session:
System prompt + harness instructions — fixed overhead per turn.
CLAUDE.md cascade — grows with every doc you add.
Active skills' descriptions — minor; only triggered skills load full content.
Tool schemas (esp. MCPs) — often the biggest invisible cost. Loading 200 MCP tool schemas can dwarf your prompt.
Conversation history — grows linearly. Caching helps.
Output tokens — what caveman attacks.
Practical levers:
Prune MCPs aggressively. If you don't use a connector this session, don't load it.
Move long instructions into skills, not CLAUDE.md, so they only load on demand.
Use subagents for research — keep their summary, not their working memory.
Run caveman when your task is well-defined and you don't need verbose explanations.
Cache the system prompt. Anthropic's prompt caching cuts repeat-prompt cost dramatically.
Putting it all together
You don't pick one layer — you compose them. A reasonable "power user" setup:
Brain: Claude (default) for hard tasks, Ollama via LiteLLM for cheap loops.
Harness: Claude Code.
Home: ~/.claude/CLAUDE.md with your prefs, plus Superpowers + caveman installed globally.
Project: .claude/ with team CLAUDE.md, project skills, a code-reviewer subagent, a pre-commit hook.
Personal: CLAUDE.local.md for your in-progress quirks.
Workflow: gstack-style Think → Plan → Build → Review → Test → Ship.
Analogy to close on: Claude Code isn't a tool — it's a building you can renovate. The LLM is rented power. The harness is the structure. Your .claude/ folders are the floors you actually live on. Plugins are pre-fab kitchens. Skills are recipes pinned to the fridge. Subagents are interns. Hooks are the smart-home automation. MCPs are the utilities.
You don't have to build everything yourself — but you do have to know which floor the light switch is on.





