What is Tokenization?

Tokenization is the process of breaking text into small units (tokens) that models can process. As a rough rule of thumb, 1 English word ≈ 1.3 tokens, while 1 Chinese character ≈ 2-3 tokens. Understanding tokens matters because token counts directly drive API costs and determine how much text fits in a model's context window.
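
To make the word-to-token ratios concrete, here is a minimal sketch of counting tokens with OpenAI's tiktoken library (an assumption; the section names no specific tokenizer), using the cl100k_base encoding:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by GPT-3.5/GPT-4-era models
# (chosen here for illustration; other tokenizers give different counts).
enc = tiktoken.get_encoding("cl100k_base")

english = "Tokenization breaks text into small units."
chinese = "标记化将文本分解为小单元。"

for label, text in [("English", english), ("Chinese", chinese)]:
    tokens = enc.encode(text)
    # Comparing characters to tokens shows why the same idea can
    # cost more tokens in one language than another.
    print(f"{label}: {len(text)} chars -> {len(tokens)} tokens")
```

Running a snippet like this on your own prompts is the simplest way to estimate API cost and context usage before sending a request.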