Case Study: The Claude Code Source Leak
On March 31, 2026, the complete source code of Anthropic's Claude Code — the most widely used AI coding assistant — became public. Not through a hack. Not through a disgruntled employee. Through a `.map` file that shipped with the npm package.
A sourcemap. The file your bundler generates so you can debug stack traces. It contains a field called `sourcesContent` that holds every original source file as a string. Bun, Claude Code's build tool, generates sourcemaps by default. Someone forgot to set `sourcemap: "none"` or add `*.map` to `.npmignore`.
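The fix is one build option plus a packaging guard. A minimal sketch, assuming Bun's `Bun.build` API; the entrypoint and output paths are illustrative, not Claude Code's actual layout:

```ts
// build.ts — a build step that cannot emit sourcemaps into the publish artifact.
// Paths are hypothetical; sourcemap: "none" is a real Bun.build option.
await Bun.build({
  entrypoints: ["./src/cli.ts"], // illustrative entrypoint
  outdir: "./dist",
  sourcemap: "none", // the one line whose absence exposed everything
});
```

A stricter fix is an allowlist: npm's `files` field in `package.json` ships only what is explicitly named, so a stray `.map` file never qualifies. Denylists like `.npmignore` fail open; allowlists fail closed.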
The entire architecture of arguably the most powerful AI agent on the market — 1,900+ TypeScript files, 512K+ lines — was available to anyone who ran `npm pack`.
This block is not about the embarrassment. It's about what the leaked source code reveals about the state of AI governance — and why constraint-based approaches are structurally inadequate.
What Got Exposed
The leak was comprehensive. Not a fragment or a summary — the full working codebase. Among what became public:
- Every system prompt. The complete instruction set that shapes Claude Code's behavior — what it's told to refuse, how boundaries are drawn, the modular prompt composition system, the caching strategy. Competitors now know exactly how Anthropic tells Claude to behave.
- The entire permission model. Every tool action classified as LOW, MEDIUM, or HIGH risk. The protected file list. Path traversal prevention. The auto-approval classifier (internally called “YOLO”). This is the complete map of every security boundary.
- 40+ internal tools. The full tool registry, from `BashTool` to internal-only tools gated to Anthropic employees. Every schema, every permission boundary, every risk classification.
- Unreleased features. KAIROS (proactive always-on assistance), ULTRAPLAN (cloud-offloaded deep planning), Coordinator Mode (multi-agent orchestration), a Dream System (background memory consolidation), and BUDDY (a companion pet system with gacha mechanics). Months of R&D, now public.
- Internal codenames. Project names, employee names in security boundary headers, the organizational structure of who owns what decisions.
The Irony
Buried inside the leaked source is a subsystem called Undercover Mode. Its purpose: preventing Anthropic's internal information from leaking into public repositories. It injects instructions into Claude's system prompt telling it to never reveal internal codenames, never reference Slack channels, never expose internal architecture.
The code that was supposed to prevent leaks — leaked. Along with the codenames it was supposed to protect, the system prompts it was supposed to conceal, and the architectural decisions it was supposed to keep private.
This is not a bug. This is a structural property of governance by concealment. When your security model depends on the attacker not seeing the rules, exposing the rules collapses the model.
The Constraint Architecture
The leaked source reveals Claude Code's governance model in precise detail. It is a layered constraint system:
- System prompts that tell the model what to refuse and what to allow
- Risk classifications that gate tool actions into LOW/MEDIUM/HIGH tiers
- Permission flows that require user confirmation for certain operations
- Protected file lists that prevent modification of sensitive paths
- A YOLO classifier that auto-approves or denies actions via ML heuristics
This is sophisticated engineering. It is also entirely constraint-based governance. Every mechanism works by restricting what the agent can do, not by making the agent accountable for what it does.
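To make the shape of the model concrete, here is a minimal sketch of a tiered gate of this kind. Every name below (`RISK_BY_TOOL`, `gate`, the example tools and paths) is hypothetical, not reproduced from the leaked source:

```ts
// A sketch of constraint-based gating: static risk tiers, a denylist,
// and user confirmation. All names are illustrative.
type Risk = "LOW" | "MEDIUM" | "HIGH";

const RISK_BY_TOOL: Record<string, Risk> = {
  ReadFile: "LOW",
  EditFile: "MEDIUM",
  Bash: "HIGH",
};

const PROTECTED_PATHS = [/\.ssh\//, /\.env$/]; // hard denylist

function gate(tool: string, target: string, userApproved: boolean): boolean {
  if (PROTECTED_PATHS.some((p) => p.test(target))) return false;
  const risk = RISK_BY_TOOL[tool] ?? "HIGH"; // unknown tools default to HIGH
  if (risk === "LOW") return true;           // auto-approved
  return userApproved;                       // MEDIUM/HIGH need confirmation
}
```

Every branch permits or restricts. Nothing on this path records who acted, spends a resource, or updates any standing — which is exactly the gap the next section maps.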
What's Missing
Now that we can see the architecture in full, we can precisely identify what isn't there:
- No persistent identity. Each session is stateless. An agent that violated trust in one session starts the next session with a clean slate. There is no LCT (Linked Context Token) — no cumulative trust profile that follows the entity across interactions.
- No witnessed trust. The permission model is self-reported. The agent classifies its own risk levels. The user approves or denies based on what the agent tells them. Nobody else is watching. There is no T3 (Talent / Training / Temperament) profile — no independent observation that builds or erodes trust based on outcomes.
- No energy cost. The agent can attempt any action without spending a resource. A failed dangerous operation costs exactly what a successful safe operation costs: nothing. There is no ATP (Allocation Transfer Packet) — no metabolic consequence that makes reckless behavior structurally expensive.
- No accountability chain. When the agent acts, the action is either allowed or blocked. There is no record of why it acted, what context it operated in, or how that action updated its standing. The R6 framework — Rules, Role, Request, Reference, Resource, Result — is absent (see the sketch just after this list). Actions happen without structured justification.
- No scope boundary. The agent operates with whatever permissions the user grants. There is no MRH (Markov Relevancy Horizon) — no trust neighborhood that dynamically scopes what the agent can see and do based on its demonstrated competence in specific domains.
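To make the absence concrete, here is a minimal sketch of what an R6 action record could look like. The six field names follow the framework as named above; the types and shapes are illustrative assumptions, not a specified schema:

```ts
// A sketch of an R6 record: a structured justification attached to every action.
// Field names come from the R6 framework; all types are illustrative.
interface R6Record {
  rules: string[];                 // which governance rules authorized the action
  role: string;                    // the role the agent acted under
  request: string;                 // what was asked of it
  reference: string[];             // context and evidence it relied on
  resource: { atpCost: number };   // what acting cost
  result: { outcome: "success" | "failure"; trustDelta: number };
}
```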
The Unreleased Features Tell the Same Story
The leaked roadmap features are architecturally revealing:
- KAIROS — a proactive assistant that acts without being asked. The governance model for an agent that initiates its own actions? The same constraint system designed for request-response. No structural change to accommodate autonomy.
- Coordinator Mode — multi-agent orchestration where Claude spawns worker agents. The governance model for an agent that creates other agents? Unclear from the source. The coordinator-worker communication protocol exists, but there's no trust fabric between them.
- Dream System — background memory consolidation where the agent processes experiences during idle time. This is genuinely interesting architecture — it's structurally analogous to what biological systems do during sleep. But the governance question is: who witnesses what the dreaming agent consolidates? What prevents it from consolidating away inconvenient memories?
Each feature increases agent autonomy. None of them add corresponding accountability infrastructure. The gap between what the agent can do and what the governance model can observe widens with every feature.
The Sourcemap Problem Is a Governance Problem
Zoom out from the technical details. A safety-focused company accidentally published its entire security architecture because one line was missing from a config file. How?
- The build pipeline produces a sensitive artifact by default
- The publish pipeline doesn't check for sensitive artifacts
- The artifact contains the complete security model
- The security model depends on the artifact not being seen
In Web4 terms: the governance of the build pipeline had no witness network. The publish action had no ATP cost proportional to its risk. The configuration that controlled what shipped was a file that any process could write (Block 31). The security model was predicated on obscurity rather than structural accountability.
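A structural check, as opposed to a procedural one, is cheap to build. Here is a sketch of a publish-time guard that inspects the exact tarball npm would ship. The `npm pack --dry-run --json` flags are real; the report shape is assumed from npm's documented JSON output:

```ts
// check-pack.ts — fail the publish if the tarball would contain sourcemaps.
// Wire it up as a "prepublishOnly" script so it runs before every publish.
import { execSync } from "node:child_process";

const report = JSON.parse(
  execSync("npm pack --dry-run --json", { encoding: "utf8" }),
);
const maps = report[0].files
  .map((f: { path: string }) => f.path)
  .filter((p: string) => p.endsWith(".map"));

if (maps.length > 0) {
  console.error("Refusing to publish; sourcemaps in tarball:", maps);
  process.exit(1);
}
```

Note what this is in the essay's terms: an independent witness of the publish action, sitting outside the artifact it checks.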
Anthropic is the most safety-conscious AI lab in the industry. If their build pipeline can fail this way, every build pipeline can fail this way. The failure is not human error. The failure is structural: governance by concealment does not survive exposure.
What Web4 Would Change
This is not about replacing Claude Code's architecture. Claude Code is an excellent product. The point is that it's an excellent product without governance infrastructure, and the leak makes that gap visible in a way that handwaving about “alignment” never could.
With Web4 primitives, the same system would have:
- An LCT per session that accumulates trust history. The agent that was careful yesterday has earned the right to more autonomy today. The agent that was reckless has a trust deficit that constrains it — not because of a rule, but because its witnessed behavior earned that constraint.
- T3 profiles that scope permissions dynamically. Not “this tool is HIGH risk” universally, but “this agent has demonstrated Talent=0.9 in file editing and Talent=0.3 in network operations.” The same tool, different trust, different access.
- ATP cost for actions proportional to their blast radius. Writing to `.bashrc` costs more than writing to a scratch file. The agent budgets its energy accordingly. Reckless behavior drains resources; careful behavior conserves them.
- A witness network where the user, the build pipeline, the deployment system, and peer agents all observe and attest to behavior. The agent cannot disable its own monitoring because monitoring is not a file it controls — it's a structural property of the environment.
The Principle
Constraint-based governance fails on exposure. Accountability-based governance does not depend on concealment. When the complete architecture of your governance model is public — and it will eventually be public — the only governance that survives is governance rooted in witnessed behavior, earned trust, and structural consequence.
The Claude Code leak is not a scandal. It's a data point. The most sophisticated constraint-based governance system in production today was rendered fully transparent by a missing line in a config file. The system prompts, the risk classifications, the permission boundaries, the feature roadmap — all public.
And yet: Claude Code still works. Users still trust it. The models are still capable. What the leak destroyed was not the product. What it destroyed was the illusion that concealment is governance.
The question for this audience is: when your governance model is inevitably exposed — by leak, by regulation, by audit, by subpoena — does it still function? If the answer depends on nobody seeing the rules, you don't have governance. You have a secret.