
Answer-first summary for fast verification
Answer: Tokens provide language-independent representation
## Explanation

**Correct Answer: A - Tokens provide language-independent representation**

**Why tokens are used instead of characters:**

1. **Language Independence**: Tokens represent meaningful units of text rather than individual characters. For example, in English a token might be a word or subword, while in languages like Chinese tokens represent characters or character combinations that carry meaning.
2. **Efficiency**: Tokenization allows models to process text more efficiently by breaking it into meaningful units rather than individual characters, which reduces computational complexity.
3. **Semantic Representation**: Tokens capture semantic meaning better than individual characters. For instance, the word "unbelievable" as a single token carries more meaning than processing each character separately.
4. **Vocabulary Management**: Tokenization helps manage vocabulary size by using subword tokenization techniques (like Byte Pair Encoding) that can handle rare words and out-of-vocabulary terms.

**Why the other options are incorrect:**

- **B**: Tokens are NOT always shorter than characters - some tokens can be longer than individual characters, especially in languages where single characters represent entire words.
- **C**: Characters CAN be encoded numerically (using character encodings like UTF-8), so this statement is false.
- **D**: Tokens do NOT reduce context-window cost to zero - they still consume computational resources, though they are more efficient than character-level processing.

**Key Takeaway**: Tokenization is fundamental to how modern language models process text, allowing them to work efficiently across different languages while capturing meaningful semantic units.
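The subword idea in point 4 can be sketched with a toy Byte Pair Encoding loop. This is a minimal illustration, not a production tokenizer: it starts from individual characters and repeatedly merges the most frequent adjacent pair, which is how BPE turns character sequences into larger tokens. The function names here are invented for the example.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def bpe_train(text, num_merges):
    """Toy BPE: start from characters, greedily merge frequent pairs."""
    tokens = list(text)          # character-level start
    merges = []                  # learned merge rules, in order
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        a, b = pair
        merges.append(pair)
        # Rewrite the token sequence, fusing every occurrence of (a, b).
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_train("low lower lowest", 5)
```

After a few merges, frequent character runs like "low" collapse into single tokens, so the sequence is shorter than the raw character count while still reconstructing the original text exactly.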
Author: Ritesh Yadav
**Why do AI models count tokens instead of characters?**

- **A.** Tokens provide language-independent representation
- **B.** Tokens are always shorter than characters
- **C.** Characters cannot be encoded numerically
- **D.** Tokens reduce context-window cost to zero