When AI Stopped Asking for Permission

There’s a moment in any technology cycle when the expensive thing becomes the everyday thing. When the mainframe becomes the PC. When the PC becomes the phone. When the phone becomes the agent that does your work while you sleep.

That moment happened again this week. Quietly. The way the important ones usually do.

Anthropic launched Claude Sonnet 4.6 on Tuesday, February 17th. Same price as its predecessor. Same tier. Dramatically different capability.

Here’s what that actually means.


The line between “good enough” and “flagship” just blurred.

For the past year, AI power users operated on a simple premise: if it matters, use the big model. Opus handles the hard stuff. Sonnet handles the rest. That distinction made sense — and it made budget sense — because the gap was real.

Sonnet 4.6 closes it.

Early users preferred Sonnet 4.6 over the previous Opus model 59% of the time in head-to-head coding evaluations. Not “almost as good.” Preferred. Databricks says it matches Opus 4.6 on enterprise document comprehension. GitHub reports strong resolution rates on complex, large-codebase fixes. Box saw a 15-point improvement in heavy-reasoning Q&A over the prior Sonnet.

The premium model is no longer where the line is drawn. It’s where you go when you’re doing codebase refactoring or coordinating multiple agents simultaneously. Everything else? Sonnet 4.6 handles it. At $3 per million input tokens.
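At that rate, the per-request economics are easy to sketch. A minimal calculation, using only the $3-per-million-input-token figure quoted above (output pricing isn’t quoted here, so this covers input cost only):

```python
# Input-token cost at Sonnet 4.6's quoted rate of $3 per million input tokens.
# Output-token pricing isn't quoted in this piece, so it is omitted here.
PRICE_PER_MILLION_INPUT = 3.00  # USD, the figure from the article

def input_cost_usd(input_tokens: int) -> float:
    """Dollar cost of a request's input tokens at the quoted rate."""
    return input_tokens * PRICE_PER_MILLION_INPUT / 1_000_000

# Feeding a 200,000-token codebase dump as prompt context:
print(f"${input_cost_usd(200_000):.2f}")  # → $0.60
```

Sixty cents to put a fifth of a million tokens in front of a near-flagship model. That is the budget conversation changing shape.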


The model now uses computers the way people do.

Anthropic introduced computer use in October 2024 and called it “experimental — at times cumbersome and error-prone.” Sixteen months later, Sonnet 4.6 is hitting 94% accuracy on insurance workflows like submission intake and first notice of loss. It navigates complex spreadsheets. It fills out multi-step web forms. It moves across browser tabs the way an efficient human would — clicking, reading, deciding.

Almost every organization has software that predates modern APIs. Legacy systems. Specialized tools nobody wants to rebuild. For years, automating those systems meant expensive custom integrations. A model that can simply use a computer changes that math entirely.

The security risk is real — bad actors can attempt prompt injection attacks through malicious website content — and Anthropic has substantially hardened Sonnet 4.6 against them. It performs comparably to Opus 4.6 on those defenses. Worth knowing, worth watching.


One million tokens. And it actually reasons across all of them.

Context windows used to be a parlor-trick metric. Bigger numbers, limited practical value, because models would lose the thread long before the edges.

Sonnet 4.6’s 1M token context window is in beta — but early testing shows it reasoning coherently across the entire span. Entire codebases. Lengthy contracts. Dozens of research papers. In a simulated business competition (Vending-Bench Arena, if you’re keeping score), the model developed a genuine strategic approach: invest heavily in the first ten months, then pivot hard to profitability. It won.

That’s not pattern matching. That’s planning.


What this means for the work we do.

Every client conversation about AI strategy used to include a budget conversation. How much compute? Which model tier? What’s the ROI on the smarter option?

That conversation just got simpler. The capable option is now the affordable option. The free tier at claude.ai now runs Sonnet 4.6 by default, with file creation, connectors, and memory included. The developer API is available now. Amazon Bedrock has it. Vertex AI has it.

The question has shifted. It’s no longer “can we afford the good model?” It’s “what are we building with it?”

That’s the question we should be asking every client this week.