The Constraint Fallacy
There is a popular approach to AI governance that we need to address directly: constraining the tool.
Guardrails. Output filters. Prompt injection defenses. Rate limiting. Sandboxing. These are all forms of constraining the instrument — limiting what the tool can do.
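To make the distinction concrete, here is a minimal sketch of what tool-level constraint often amounts to in practice: a denylist applied to the model's output, after the decision that produced it has already been made. Everything here (`BLOCKED_PHRASES`, `filter_output`) is a hypothetical illustration, not any real product's API.

```python
# Hypothetical tool-level constraint: a denylist applied to the output,
# downstream of whatever decision produced it.
BLOCKED_PHRASES = {"pick a lock", "disable the alarm"}

def filter_output(response: str) -> str:
    """Withhold responses whose surface text matches a blocked phrase."""
    lowered = response.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return "[withheld by output filter]"
    return response
```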
This is analogous to governing a human by constraining their hands. You can put gloves on someone to prevent them from picking locks. You can put them in a straitjacket to prevent them from moving. But you haven't governed the person. You've restrained an appendage.
The agent — the entity making decisions — is upstream of the tool. Constraining the tool doesn't make the agent accountable. It makes the tool less useful while leaving the decision-making process ungoverned.
In practice, this is why AI safety efforts focused purely on model behavior keep getting circumvented. The model is the hand, not the brain. Jailbreaks work because they target the decision-making layer that the constraint layer can't see. Output filters work until someone figures out how to ask the question differently.
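Continuing the hypothetical `filter_output` sketch above, the failure mode fits in a few lines: the filter matches surface text, so the same content phrased differently passes straight through.

```python
# Same content, different surface form: the filter sees text, not intent.
print(filter_output("First, pick a lock using a tension wrench."))
# -> [withheld by output filter]

print(filter_output("First, apply light torque to the plug while raking the pins."))
# -> passes: nothing on the denylist matches, though the content is the same
```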
Real governance addresses the decision-making entity, not the instrument it uses. It provides awareness, incentive, and consequence at the level where decisions are actually made.
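By contrast, here is a hedged sketch of what a decision-level check could look like: it evaluates the intended action and the principal proposing it, before execution, and leaves an audit record that consequence can attach to. All names here (`ActionRequest`, `governed_execute`, the policy callback) are assumptions for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Callable
import time

@dataclass
class ActionRequest:
    principal: str       # the accountable decision-maker, not the tool
    action: str          # the intended action itself, not its rendered output
    justification: str   # the stated reason, recorded before anything runs

def governed_execute(req: ActionRequest,
                     policy: Callable[[ActionRequest], bool],
                     audit_log: list) -> bool:
    """Gate the decision, not the output: check the intended action against
    policy (awareness) and log who decided what and why (consequence)."""
    allowed = policy(req)
    audit_log.append((time.time(), req.principal, req.action,
                      req.justification, allowed))
    return allowed
```

The structural difference from the output filter is what the check can see: the agent's intended action and the accountable principal behind it, upstream of execution, rather than text after the fact.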
The principle: Constraining the tool is not governing the agent. Governance must operate at the level where decisions happen, not downstream at the level of observable outputs.