
**Answer:** Tokens
## Explanation

In Large Language Models (LLMs), text is processed by breaking it down into smaller units called **tokens**. Tokens can be whole words, subwords, or individual characters, depending on the tokenization strategy the model uses.

**Key points:**

1. **Tokenization** is the process of splitting text into tokens.
2. The word "unbelievable" might be split into tokens such as "un", "believ", and "able", depending on the tokenizer.
3. These tokens are then converted into numerical representations (embeddings) for processing.
4. Characters would be individual letters, which is too granular for most LLMs.
5. Embeddings are the numerical vector representations of tokens, not the text sub-units themselves.
6. Layers refer to the neural network architecture, not the text sub-units.

The correct answer is **B. Tokens** because tokens are the fundamental sub-units that LLMs use to process and represent text.
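The "unbelievable" split in point 2 can be illustrated with a minimal sketch of greedy longest-match subword tokenization. The vocabulary here is a hypothetical toy example chosen to reproduce that split; real tokenizers (e.g. BPE or WordPiece) learn their vocabularies from large corpora:

```python
# Hypothetical toy vocabulary; real tokenizers learn theirs from data.
VOCAB = {"un", "believ", "able", "a", "b", "e", "i", "l", "n", "u", "v"}

def tokenize(word: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible substring starting at i first.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            # Fall back to a single character for unknown symbols.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("unbelievable"))  # → ['un', 'believ', 'able']
```

Production tokenizers use learned merge rules rather than a hand-written vocabulary, but the output shape is the same: a list of subword strings.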
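Point 3 (tokens becoming embeddings) can likewise be sketched: each token is mapped to an integer ID, and that ID indexes a row of an embedding table. The vocabulary, IDs, and embedding dimension below are illustrative assumptions, not values from any real model:

```python
import numpy as np

# Hypothetical token-to-ID mapping and a tiny embedding dimension.
vocab = {"un": 0, "believ": 1, "able": 2}
d_model = 4

# Embedding table: one learned vector per vocabulary entry.
# In a real model these weights are trained; here they are random.
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((len(vocab), d_model))

# Tokens -> IDs -> vectors: a simple row lookup into the table.
ids = [vocab[t] for t in ["un", "believ", "able"]]
vectors = embedding_table[ids]

print(vectors.shape)  # (3, 4): one d_model-sized vector per token
```

This is why embeddings are the wrong answer to the question: they are the vectors the tokens are converted *into*, not the text sub-units themselves.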
Author: Jin H