What’s possible

The AI you already have is capable of much more than it shows you by default.

ARC-AGI-3 · Public reasoning benchmark

Claude Opus 4.6, default

→

94.85%

Same model, structured differently

25 interactive reasoning games. No fine-tuning.

The difference isn’t the model.

It’s the structure around it.

Web4 is that structure.