
Why do AI models count tokens instead of characters?

A. Tokens provide language-independent representation
B. Tokens are always shorter than characters
C. Characters cannot be encoded numerically
D. Tokens reduce context-window cost to zero
Explanation:
Correct Answer: A - Tokens provide language-independent representation
Why tokens are used instead of characters:
Language Independence: Tokens represent meaningful units of text rather than individual characters. For example, in English, a token might be a word or subword, while in languages like Chinese, tokens represent characters or character combinations that carry meaning.
Efficiency: Breaking text into meaningful units rather than individual characters shortens the input sequence a model must process, which reduces computational cost.
Semantic Representation: Tokens capture semantic meaning better than individual characters. For instance, the word "unbelievable" as a single token carries more meaning than processing each character separately.
Vocabulary Management: Subword tokenization techniques such as Byte Pair Encoding (BPE) keep the vocabulary at a manageable size while still handling rare words and out-of-vocabulary terms; a minimal sketch of the BPE merge step follows this list.
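
To make the BPE idea concrete, here is a minimal, self-contained Python sketch of the core merge step: start from individual characters, repeatedly find the most frequent adjacent pair, and merge it into a new token. This is a toy illustration only; real tokenizers learn tens of thousands of merges from large corpora, and the input string here is just an example.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most common one (or None)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(tokens, pair):
    """Replace each occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Start from individual characters and apply a few merge steps.
tokens = list("unbelievable unbelievable")
for _ in range(4):
    pair = most_frequent_pair(tokens)
    if pair is None:
        break
    tokens = merge_pair(tokens, pair)
    print("merged", pair, "->", tokens)
```

After a few merges, frequent character sequences collapse into single tokens, which is how subword vocabularies handle rare words: an unseen word still decomposes into known subword pieces rather than becoming an out-of-vocabulary failure.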
Why the other options are incorrect:
B: Tokens are NOT always shorter than characters: a token spans one or more characters, so many tokens (whole words or subwords) are longer than a single character. Tokenization shortens the sequence of units a model processes, not the units themselves.
C: Characters CAN be encoded numerically (for example, UTF-8 maps every character to one or more integer bytes), so this statement is false; see the short demonstration after this list.
D: Tokens do NOT reduce context-window cost to zero: each token still occupies context-window space and computational resources, though far fewer units are needed than with character-level processing.
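
As a quick check on option C, this small Python snippet shows that characters are routinely encoded numerically via UTF-8; numeric encodability is not what distinguishes tokens from characters.

```python
# Characters CAN be encoded numerically: UTF-8 maps every character
# to one or more integer bytes in the range 0-255.
text = "unbelievable"
utf8_bytes = text.encode("utf-8")
print(list(utf8_bytes))                 # [117, 110, 98, 101, ...]
print(len(text), "characters ->", len(utf8_bytes), "bytes")

# Non-ASCII characters simply take more bytes per character:
print(list("é".encode("utf-8")))        # [195, 169]
```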
Key Takeaway: Tokenization is fundamental to how modern language models process text, allowing them to work efficiently across different languages while capturing meaningful semantic units.