
When an LLM processes the word "unbelievable", it splits it into sub-units before encoding. What are these sub-units called?
A
Characters
B
Tokens
C
Embeddings
D
Layers
Explanation:
In Large Language Models (LLMs), text is broken down into smaller units called tokens before being processed. Tokens can be words, subwords, or even characters, depending on the tokenization method used by the specific model.
Characters (Option A): While some models use character-level tokenization, most modern LLMs use subword tokenization (such as Byte Pair Encoding), in which tokens can be whole words or parts of words.
Tokens (Option B): CORRECT - These are the fundamental units that LLMs process. The word "unbelievable" might be split into tokens such as "un", "believ", and "able", depending on the tokenizer.
Embeddings (Option C): These are vector representations of tokens, not the sub-units themselves. Tokens are converted into embeddings for processing.
Layers (Option D): These refer to the neural network architecture components, not the text sub-units.
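To make the idea concrete, here is a toy greedy longest-match subword tokenizer. The three-entry vocabulary is invented for this example; real LLM tokenizers such as BPE learn a much larger vocabulary from training data, so a production tokenizer may split "unbelievable" differently.

```python
# Invented vocabulary for illustration only.
VOCAB = {"un", "believ", "able"}

def tokenize(text: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in VOCAB:
                tokens.append(piece)
                i = end
                break
        else:
            # Fall back to a single character when no entry matches.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

Note that the tokens concatenate back to the original word, which is why subword tokenization can represent any input, including words the model has never seen.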
Why this matters for AWS Certified Cloud Practitioner: Understanding how LLMs work is important because AWS offers various AI/ML services, such as Amazon Bedrock, SageMaker, and Amazon Comprehend, that rely on tokenization in their natural language processing capabilities.
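As the explanation for Option C notes, tokens themselves are not embeddings: tokens are first mapped to integer IDs and then looked up in an embedding table. A minimal sketch, using an invented three-token vocabulary and tiny 4-dimensional vectors (real models use vocabularies of tens of thousands of tokens and hundreds or thousands of dimensions):

```python
import random

random.seed(0)  # deterministic vectors for the example

# Hypothetical token-to-ID mapping and embedding table.
token_to_id = {"un": 0, "believ": 1, "able": 2}
embedding_table = [
    [random.uniform(-1, 1) for _ in range(4)] for _ in token_to_id
]

def embed(tokens: list[str]) -> list[list[float]]:
    """Convert tokens to IDs, then look up each ID's embedding vector."""
    return [embedding_table[token_to_id[t]] for t in tokens]

vectors = embed(["un", "believ", "able"])
print(len(vectors), len(vectors[0]))  # 3 vectors, 4 dimensions each
```

In a trained model the embedding table is learned, so tokens with similar meanings end up with similar vectors; here random values simply stand in for that learned state.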