LLMs consume tokens—subword pieces from a vocabulary—not whole words. Billing, context limits, and prompt sizing are all token-based.
Examples
# Illustrative — real counts come from the model tokenizer
text = "unbelievable"
# might split into ["un", "believ", "able"] depending on tokenizer
Why subwords
- Handles rare words without million-entry dictionaries
- Shares morphemes across languages
- Code and JSON stay representable: unfamiliar identifiers and symbols fall back to smaller, character-level pieces
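To make the idea of subword splitting concrete, here is a minimal sketch of a greedy longest-match tokenizer with a character fallback. The vocabulary `TOY_VOCAB` and the function `toy_tokenize` are made up for illustration. Production tokenizers learn their vocabularies from data, and BPE-based ones apply learned merge rules rather than this simple matching, so real splits will differ.

```python
# Toy vocabulary (hypothetical); real vocabularies are learned from data.
TOY_VOCAB = {"un", "believ", "able", "token", "s"}

def toy_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split; unknown spans fall back to single characters."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until a vocab entry matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocab entry starts here: emit one character as its own piece.
            pieces.append(word[i])
            i += 1
    return pieces

print(toy_tokenize("unbelievable"))
print(toy_tokenize("xyz"))
```

A word covered by the vocabulary splits into a few multi-character pieces, while a string the vocabulary has never seen degrades to one piece per character instead of failing.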
Practical rules
Use the provider's tokenizer library (for example, tiktoken for OpenAI models) or its API token counter before production. English averages ~4 characters per token; code and non-Latin scripts often need more tokens for the same amount of text.
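For a quick sanity check before wiring in a real tokenizer, the ~4 characters per token average can be turned into a rough estimator. This is only a sketch: `estimate_tokens` and `CHARS_PER_TOKEN` are illustrative names, and the ratio is an approximation for English prose that will be off for code and non-Latin scripts.

```python
import math

CHARS_PER_TOKEN = 4  # rough English average; an assumption, not an exact ratio

def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Return a rough token estimate; use the real tokenizer for exact counts."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)

prompt = "Summarize the following support ticket in two sentences."
print(estimate_tokens(prompt))
```

Treat the result as a budgeting aid only, and replace it with the provider's counter for anything that affects limits or cost.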
Truncation strategy: drop oldest chat turns, summarize history, or retrieve only top-k chunks—not silent mid-word cuts.
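The first strategy, dropping the oldest chat turns, can be sketched as follows. The names `trim_history` and `rough_count` are hypothetical, and `rough_count` is a stand-in for a real token counter. The sketch keeps or drops whole turns, so nothing is cut mid-word.

```python
def trim_history(turns, max_tokens, count_tokens):
    """Keep the most recent turns whose combined token count fits max_tokens."""
    kept = []
    total = 0
    # Walk from newest to oldest so the oldest turns are dropped first.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if total + cost > max_tokens:
            break
        kept.append(turn)
        total += cost
    kept.reverse()  # restore chronological order
    return kept

def rough_count(text):
    # Hypothetical stand-in counter: ~4 characters per token, rounded up.
    return -(-len(text) // 4)

history = ["a" * 40, "b" * 20, "c" * 8]  # roughly 10, 5, and 2 tokens
print(trim_history(history, 7, rough_count))
```

Summarizing the dropped turns or retrieving only the top-k chunks would replace the `break` with a different policy, but the budget check stays the same.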
Important interview questions and answers
- Q: Why isn't one word always one token?
A: Subword tokenizers keep frequent words whole but split rare or compound strings into several pieces, so one word can map to multiple tokens.
Self-check
- What unit do providers bill on?
- Why measure prompts before launch?
Pitfall: pricing surprises. Count tokens on the longest realistic prompt before budgeting.
Interview prep
- Q: Why subwords?
A: They give a compact vocabulary that still handles rare words, morphology, and code fragments.
- Q: What is the billing unit?
A: Providers bill tokens for prompt plus completion, so measure before launch.
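Since billing covers prompt plus completion tokens, a per-request cost estimate is simple arithmetic. The sketch below assumes separate per-1,000-token rates for prompt and completion. The function name `estimate_cost` and the rates in the usage line are placeholders, not real prices, so check the provider's current price sheet and units.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_price_per_1k, completion_price_per_1k):
    """Estimate the cost of one request from token counts and per-1k-token rates."""
    prompt_cost = (prompt_tokens / 1000) * prompt_price_per_1k
    completion_cost = (completion_tokens / 1000) * completion_price_per_1k
    return prompt_cost + completion_cost

# Placeholder rates for illustration only.
print(estimate_cost(2000, 500, prompt_price_per_1k=1.0, completion_price_per_1k=2.0))
```

Running this on the longest realistic prompt and the maximum allowed completion length gives an upper bound per request to multiply by expected traffic.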