
Explanation:
Correct Answer: A - Tokens provide language-independent representation
Why tokens are used instead of characters:
Language Independence: Tokens represent meaningful units of text rather than individual characters, so the same counting scheme works across writing systems. In English, a token might be a whole word or a subword. In Chinese, a token may be a single character or a frequent combination of characters that carries meaning.
Efficiency: Because a token usually spans several characters, the same text becomes a shorter sequence of tokens than of characters. Shorter sequences mean fewer positions for the model to process, which lowers computational cost, since attention cost grows with sequence length.
Semantic Representation: Tokens capture meaning better than individual characters. For instance, depending on the tokenizer, the word "unbelievable" may be a single token or a few subword pieces (such as "un", "believ", "able"), each of which carries more meaning than a single letter.
Vocabulary Management: Subword tokenization techniques such as Byte Pair Encoding (BPE) keep the vocabulary at a fixed, manageable size while still handling rare words and out-of-vocabulary terms, which are split into smaller known pieces.
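To make the subword idea concrete, here is a minimal toy sketch of BPE-style merge learning in Python. The function names (train_toy_bpe, encode, merge_pair) and the tiny word list are hypothetical. Real tokenizers train on large corpora, often operate on bytes rather than characters, and add details (pre-tokenization, special tokens) that are omitted here.

```python
from collections import Counter


def merge_pair(tokens, pair, new_token):
    """Replace every adjacent occurrence of `pair` in `tokens` with `new_token`."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_token)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_toy_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words.

    Each word starts as a sequence of single characters; at every step the
    most frequent adjacent pair across the corpus is merged into one symbol.
    """
    corpus = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for tokens in corpus:
            for a, b in zip(tokens, tokens[1:]):
                pair_counts[(a, b)] += 1
        if not pair_counts:
            break  # every word is already a single symbol
        # On ties, max() keeps the pair that was counted first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        corpus = [merge_pair(t, best, best[0] + best[1]) for t in corpus]
    return merges


def encode(word, merges):
    """Tokenize a (possibly unseen) word by replaying the learned merges in order."""
    tokens = list(word)
    for pair in merges:
        tokens = merge_pair(tokens, pair, pair[0] + pair[1])
    return tokens


if __name__ == "__main__":
    merges = train_toy_bpe(["believe", "believable", "unbelievable", "able", "unable"], 10)
    tokens = encode("unbelievable", merges)
    print(tokens, len(tokens), "tokens vs", len("unbelievable"), "characters")
    # Characters never seen in training fall back to single-character tokens.
    print(encode("xyz", merges))
```

The sketch shows two of the points above at once: frequent character sequences become single vocabulary entries, so a word needs fewer tokens than characters, and an unfamiliar word still gets encoded by falling back to smaller pieces instead of being out of vocabulary.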
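Why the other options are incorrect:
Option B (tokens are always shorter than characters): A token usually covers one or more characters, so it is typically longer, not shorter. Byte-level tokenizers can also split a single character across several tokens, so no fixed "always" relationship holds.
Option C (characters cannot be encoded numerically): Characters are routinely encoded as numbers, for example as Unicode code points or UTF-8 bytes, and character-level models work exactly this way.
Option D (tokens reduce context-window cost to zero): Every token occupies space in the context window and typically counts toward usage cost, so tokenization does not eliminate that cost.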
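For option C, a short Python check shows that characters already have numeric encodings: every character has a Unicode code point, and UTF-8 turns it into one or more bytes. The helper name char_encodings is illustrative only. The byte counts also hint at why a byte-level tokenizer may need more than one token for a single Chinese character.

```python
def char_encodings(text):
    """Return (character, Unicode code point, UTF-8 byte count) for each character."""
    return [(ch, ord(ch), len(ch.encode("utf-8"))) for ch in text]


if __name__ == "__main__":
    for ch, code_point, n_bytes in char_encodings("a中"):
        print(f"{ch!r}: code point {code_point}, {n_bytes} UTF-8 byte(s)")
```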
Key Takeaway: Tokenization is fundamental to how modern language models process text, allowing them to work efficiently across different languages while capturing meaningful semantic units.
Why do AI models count tokens instead of characters?
A - Tokens provide language-independent representation
B - Tokens are always shorter than characters
C - Characters cannot be encoded numerically
D - Tokens reduce context-window cost to zero