What is Tokenization?
The process of breaking text into small units (tokens) that models can process. English: ~1 word ≈ 1.3 tokens; Chinese: ~1 character ≈ 2-3 tokens. Understanding tokens matters because token counts directly drive API costs and how much fits in the context window.
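The ratios above can be turned into a rough budgeting helper. This is a minimal sketch using only those heuristics; the function names, the 2.5 midpoint for Chinese, and the per-1K pricing parameter are illustrative assumptions, not any provider's API. Real token counts come from the model's actual tokenizer (e.g. a BPE tokenizer), so treat these as ballpark estimates only.

```python
def estimate_tokens_english(text: str) -> int:
    """Rough estimate: ~1 English word = 1.3 tokens (heuristic from the text)."""
    words = len(text.split())
    return round(words * 1.3)

def estimate_tokens_chinese(text: str, tokens_per_char: float = 2.5) -> int:
    """Rough estimate: ~1 Chinese character = 2-3 tokens (2.5 midpoint assumed)."""
    chars = sum(1 for c in text if not c.isspace())
    return round(chars * tokens_per_char)

def estimate_cost_usd(tokens: int, price_per_1k_tokens: float) -> float:
    """Cost at a hypothetical per-1000-token rate."""
    return tokens / 1000 * price_per_1k_tokens

# Example: a 4-word English sentence -> ~5 tokens
n = estimate_tokens_english("tokens drive API costs")
cost = estimate_cost_usd(n, price_per_1k_tokens=0.01)
```

A practical consequence of the different ratios: the same message translated into Chinese can consume noticeably more tokens than the English original, which matters when budgeting a fixed context window.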