Token

A token is the smallest unit an LLM uses to process text: it can be a single character, a sub-word fragment, a common word, or a punctuation mark. Providers bill per token, and models measure context length and generation speed in tokens, not in characters or words. Rough estimate: in English 1 token ≈ 0.75 words; in Chinese 1 character ≈ 1-2 tokens. Understanding tokens is lesson one in AI development. (Judy AI Lab AI Glossary)

core beginner

What is a Token?

A token is the smallest unit an LLM uses to process text: it can be a single character, a sub-word fragment, a common word, or a punctuation mark. Providers bill per token, and models measure context length and generation speed in tokens, not in characters or words. The Anthropic, OpenAI, and Google APIs are all priced per token.

Rough conversions:

English: 1 token ≈ 0.75 words
Chinese: 1 character ≈ 1-2 tokens (varies by model)
Code: typically 5-15 tokens per line
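
The rough conversions above can be turned into a quick estimator. The following is a minimal sketch, not a real tokenizer: the function name estimate_tokens, the regular expressions, and the 1.5 tokens-per-character midpoint for Chinese are assumptions made for illustration. Actual counts vary by model, so use the provider's own tokenizer when accuracy matters.

```python
import re

# Heuristic ratios taken from the rough conversions above.
WORDS_PER_TOKEN_EN = 0.75   # 1 token ≈ 0.75 English words
TOKENS_PER_CHAR_ZH = 1.5    # assumed midpoint of the 1-2 tokens-per-character range

# Basic CJK Unified Ideographs block; rarer extension blocks are ignored.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
# Runs of ASCII letters, digits, and apostrophes count as one "word".
LATIN_WORD = re.compile(r"[A-Za-z0-9']+")


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for mixed English/Chinese text."""
    zh_chars = len(CJK_CHAR.findall(text))
    en_words = len(LATIN_WORD.findall(text))
    estimate = en_words / WORDS_PER_TOKEN_EN + zh_chars * TOKENS_PER_CHAR_ZH
    return round(estimate)


print(estimate_tokens("one two three"))  # 3 words / 0.75 -> 4
print(estimate_tokens("你好世界"))        # 4 characters * 1.5 -> 6
```

Punctuation and whitespace are ignored here, so the result tends to undercount slightly; treat it as a planning figure only.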

Why it matters: estimating API costs, context window usage, and response speed all require thinking in tokens. By the conversions above, Claude Opus 4.x's 1M context window fits roughly 750K English words, or about 500K Chinese characters at the conservative rate of 2 tokens per character. A typical 3000-character Chinese blog post on Judy AI Lab is around 4500 tokens (about 1.5 tokens per character); the entire 75-term glossary is roughly 80K tokens. These numbers determine how you design your RAG system and structure your prompts.
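
To make the budgeting concrete, here is a small sketch of the arithmetic behind those figures. The helper names (remaining_budget, documents_that_fit, estimate_cost) are hypothetical, and the price passed to estimate_cost is a placeholder rather than any provider's actual rate.

```python
CONTEXT_WINDOW = 1_000_000  # the 1M-token window cited above


def remaining_budget(used_tokens: int, reserved_for_output: int = 0,
                     window: int = CONTEXT_WINDOW) -> int:
    """Tokens left after the prompt and a reserved output allowance."""
    return window - used_tokens - reserved_for_output


def documents_that_fit(tokens_per_doc: int, reserved_for_output: int = 0,
                       window: int = CONTEXT_WINDOW) -> int:
    """How many equally sized documents fit in the window."""
    return (window - reserved_for_output) // tokens_per_doc


def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Cost for a token count at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


# Loading the ~80K-token glossary leaves most of the window free.
print(remaining_budget(80_000))      # 920000
# About 222 blog posts of ~4500 tokens each fit in 1M tokens.
print(documents_that_fit(4_500))     # 222
# Placeholder price of 3.0 per million tokens (an assumption, not a real rate).
print(estimate_cost(80_000, 3.0))
```

For RAG design, the same arithmetic shows how many retrieved chunks can go into a prompt before the output allowance is squeezed.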
