
When an LLM processes the word "unbelievable", it splits it into sub-units before encoding. What are these sub-units called?
A. Characters
B. Tokens
C. Embeddings
D. Layers
Explanation:
In Large Language Models (LLMs), when processing text like the word "unbelievable", the model first splits the input into smaller sub-units called tokens.
- Tokens are the fundamental units of text that LLMs process.
- Tokenization is the process of breaking text down into these smaller units.
- The word "unbelievable" might be split into tokens such as "un", "believe", "able", or similar sub-word units, depending on the tokenizer.
- Characters would be the individual letters (u, n, b, e, l, i, e, v, a, b, l, e).
- Embeddings are the numerical representations of tokens, not the tokens themselves.
- Layers are components of the neural network architecture, not units of input.
This tokenization process allows LLMs to handle vocabulary efficiently and process text in manageable pieces.
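
As a minimal illustration, the sketch below uses the Hugging Face transformers library with the GPT-2 tokenizer. This is an assumption for demonstration only; the quiz does not name a specific model, and the exact sub-word splits vary from tokenizer to tokenizer.

```python
# Minimal sketch: splitting a word into sub-word tokens with a real tokenizer.
# Assumes the Hugging Face "transformers" package is installed; GPT-2 is used
# purely as an example tokenizer, not as the tokenizer referenced by the quiz.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "unbelievable"

# Split the text into sub-word tokens (the units the model actually processes).
tokens = tokenizer.tokenize(text)
print(tokens)  # prints the sub-word pieces; exact splits depend on the tokenizer

# Map each token to its integer ID in the vocabulary.
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print(token_ids)  # vocabulary indices, not embeddings

# Embeddings are computed later, inside the model, by looking up these IDs in an
# embedding matrix: they are numerical vectors derived from tokens, not the tokens.
```

Running the sketch makes the distinction between the answer options concrete: the tokenizer output is the list of tokens (option B), while characters, embeddings, and layers appear at different stages of the pipeline.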