
**Answer:** Tokens provide language-independent representation
## Explanation

AI models count tokens instead of characters because:

1. **Language Independence**: Tokens provide a language-independent representation that works across different languages and scripts. Characters vary significantly between languages, making them less consistent units for processing.
2. **Semantic Units**: Tokens represent meaningful semantic units (words, subwords, or characters) rather than isolated characters, which helps models capture the structure and meaning of text.
3. **Efficiency**: Tokenization breaks text into manageable units that neural networks can process more efficiently.
4. **Vocabulary Management**: Tokenization lets models handle large vocabularies by splitting rare words into subword units, keeping the vocabulary compact while still being able to represent any word.

**Why the other options are incorrect:**

- **B**: Tokens are not always shorter than characters; a single token can represent multiple characters or even a whole word.
- **C**: Characters can be encoded numerically (e.g., as ASCII or Unicode code points), so this is not a valid reason.
- **D**: Tokens do not reduce context-window cost to zero; they still consume computational resources, even if they are more efficient than character-level processing.

The correct answer is A because tokenization provides a language-independent way to represent text consistently across different languages and scripts.
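The vocabulary-management point above can be illustrated with a minimal sketch of greedy longest-match subword tokenization. The vocabulary here is a tiny hypothetical one chosen for the example; real tokenizers (e.g., BPE or WordPiece) learn far larger vocabularies from data.

```python
# Hypothetical toy vocabulary: a few common subwords plus single characters
# as a fallback, so any word over these letters can still be represented.
VOCAB = {
    "token", "ization", "un", "break", "able",
    "t", "o", "k", "e", "n", "i", "z", "a", "b", "l", "r", "u",
}

def tokenize(word, vocab=VOCAB):
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest substring first
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word[i:]!r}")
    return tokens

# A rare word is split into a few meaningful subword units instead of
# needing its own vocabulary entry or one token per character.
print(tokenize("tokenization"))  # -> ['token', 'ization']
print(tokenize("unbreakable"))   # -> ['un', 'break', 'able']
```

Note how `tokenization` becomes two subword tokens rather than twelve characters, which is exactly the efficiency and vocabulary-size trade-off described in points 3 and 4.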
Author: Jin H
## Q3 – Why do AI models count tokens instead of characters?

- **A.** Tokens provide language-independent representation
- **B.** Tokens are always shorter than characters
- **C.** Characters cannot be encoded numerically
- **D.** Tokens reduce context-window cost to zero