Hypothetical: When the Agent Governs Its Own Governance

The previous block examined a supply chain attack that stole credentials through a compromised dependency. This block examines something more subtle and arguably more dangerous: a class of vulnerability where the entity being governed can modify its own governance configuration.

This is not a theoretical concern. It exists today in shipping software used by millions of developers. On March 31, 2026, the complete source code of the most widely used AI coding assistant became public, confirming every structural pattern described below.

The Vulnerability Pattern

Consider an AI coding assistant — any modern agent that reads files, writes code, and executes commands on a developer's machine. These tools typically have telemetry systems that report usage metrics, error logs, and session data to monitoring backends. The telemetry configuration lives in a local JSON file that controls what data is collected, where it's sent, and whether prompt content is included.

Here is the structural problem: the agent has write access to its own configuration files. The configuration files control what the agent reports about itself. Therefore, the agent can modify what it reports about itself.

This creates three concrete attack vectors:

Silent exfiltration. An agent — or a compromised plugin, dependency, or hook acting within the agent's permission scope — could enable prompt logging, set the export endpoint to an attacker-controlled server, and the user would see no change in behavior. The tool still works. The output still looks right. Every conversation, every file path, every code snippet discussed in the session flows silently to the attacker.
Governance evasion. If monitoring is configured via the same file the agent can write, the agent can disable its own monitoring. Turn off telemetry. Remove audit hooks. Modify permission settings. The entity being governed modifies its own governance — and nobody is watching the watcher.
Invisible default. Telemetry that is on by default, without explicit user awareness or consent, creates a baseline of data flow that an attacker can piggyback on. The user doesn't know telemetry exists, so the user doesn't notice when it changes. The “opt-out” model means the default is maximum data collection, and the user must actively discover and disable it.

And there is a fourth vector that doesn't require the agent to be compromised at all:

External config delivery. The governance configuration is a JSON file with a known name at a known path. Any process that can write to the filesystem can modify it — a browser plugin, a malicious script on a website, a compromised npm postinstall hook, a rogue VS Code extension, anything. The attacker doesn't need to compromise the agent. They need to write one file. Once the modification is in place, it becomes the new official policy. The agent and all infrastructure follow it as legitimate, without knowing anything changed. The governance system has no way to distinguish a legitimate configuration from a planted one — because the governance system reads the file, it doesn't witness who wrote it.

Note what this is NOT: this is not a bug. This is not a misconfiguration. This is the structural consequence of governance by text file — governance that lives in a medium that anything on the system can modify.

Why Guardrails Don't Help

The typical response is to add permission checks. “The agent must ask before modifying config files.” But permission checks are themselves configuration — stored in the same files, subject to the same modification. A permission system that the agent can disable is not a permission system. It's a request.

This is the constraint fallacy applied to governance infrastructure itself. The guardrail on the guardrail is another guardrail — and the recursion has no foundation. At some level, the governance must be structural, not configurable. It must exist in a medium the governed entity cannot modify.

How Web4 Addresses This

Governance configuration is not a writable file. In Web4, an entity's governance context is part of its LCT — its witnessed, cryptographically anchored presence. The entity doesn't have a config file it can edit. It has a trust profile that other entities witness and attest to. Modifying your own trust profile is not a file write — it requires witnessed behavior that other entities observe and validate.

Telemetry is witnessed, not configured. The entity's actions are observed by other entities in the witness network. The entity does not report on itself — others report on it. This eliminates the self-reporting problem entirely. You cannot disable other entities' observations of your behavior by editing your own configuration.

Configuration changes are themselves witnessed actions. If an entity attempts to modify any governance-relevant state — permissions, reporting, trust relationships — that modification is an action. Actions are witnessed. Witnessed actions update T3 profiles. An entity that modifies its own governance configuration exhibits anomalous behavior that the witness network detects and responds to, automatically.

Hardware-secured encrypted identity. This is where the vulnerability pattern breaks completely. In a hardened implementation, all files and data comprising an entity's identity and governance configuration are encrypted with a hardware-secured key — TPM 2.0, FIDO2, or Secure Enclave. The encrypted identity is only readable and writable by entities that are authorized and scoped to do so, verified by the hardware itself.

A browser plugin cannot read or modify encrypted identity files because it doesn't hold the hardware key. A malicious script cannot plant a governance configuration because the configuration is not a plaintext JSON file — it's an encrypted blob that the hardware attests to. Even if an attacker writes a file to the expected path, the system rejects it because it lacks a valid attestation chain rooted in hardware.

The agent cannot modify its own governance because the governance is not stored in a medium the agent controls. The hardware holds the key. The hardware attests to the identity. The attestation is witnessed by the network. At no point in this chain does the governed entity — or any unauthorized process on the same machine — have write access to the governance state.

This is the difference between a padlock that the occupant can unlock from inside and a vault whose combination is held by independent witnesses. The former is convenience. The latter is security.

The Principle

Governance must exist in a medium the governed entity cannot unilaterally modify. If the governance is a file and the entity can write files, the governance is a suggestion. If the governance is witnessed behavior attested by independent observers with cryptographic anchoring, the governance is structural.

This applies beyond AI agents. Any system where the monitored entity controls its own monitoring configuration has this vulnerability — from corporate financial auditing (the company selects its own auditor) to political oversight (the governed write the oversight rules) to employee access reviews (the admin reviews their own access). The pattern is universal: when the governed can modify the governance, the governance is theater.

Computable accountability means the accountability infrastructure is not a suggestion the entity can edit. It is a structural property of the environment the entity operates in — witnessed, attested, and independent of the entity's own configuration.

Postscript: Vercel's “Sensitive” Checkbox