Test-time Compute

Test-time compute is the strategy of 'spending more compute when the model answers, in exchange for better answers' — one of the hottest directions in AI for 2026. Instead of training a bigger model, you let the existing model think longer. Chain-of-thought, Best-of-N sampling, and self-consistency all fall under this umbrella. Claude Extended Thinking is the canonical example — Judy AI Lab AI Glossary

core beginner

What is Test-time Compute?

Test-time compute refers to the strategy of “spending more compute when the model answers, in exchange for better answers” — one of the hottest directions in AI heading into 2026.

The classical scaling law says: to get smarter models, train larger ones on more data. After OpenAI o1, the field discovered a second axis: hold model size constant, but let it think longer at inference time, and quality climbs. Techniques include chain-of-thought, best-of-N sampling (generate multiple answers, pick the best), self-consistency, and tree-of-thoughts.

In practice: Claude Extended Thinking is the canonical test-time-compute example. Same Opus weights, but with a thinking budget enabled, accuracy on complex reasoning tasks jumps significantly — at the cost of several times the token usage. The trick is to scale thinking budget by task difficulty, not blindly maximize it.

What is Test-time Compute?#

Related Terms

Get our weekly AI digest:

What is Test-time Compute?