What is Test-time Compute?

Test-time compute refers to the strategy of “spending more compute when the model answers, in exchange for better answers” — one of the hottest directions in AI heading into 2026.

The classical scaling law says: to get smarter models, train larger ones on more data. After OpenAI o1, the field discovered a second axis: hold model size constant, but let it think longer at inference time, and quality climbs. Techniques include chain-of-thought, best-of-N sampling (generate multiple answers, pick the best), self-consistency, and tree-of-thoughts.

In practice: Claude Extended Thinking is the canonical test-time-compute example. Same Opus weights, but with a thinking budget enabled, accuracy on complex reasoning tasks jumps significantly — at the cost of several times the token usage. The trick is to scale thinking budget by task difficulty, not blindly maximize it.