
Answer-first summary for fast verification:

**Answer: Tokens**
## Explanation

In Large Language Models (LLMs), text is broken down into smaller units called **tokens** before being processed. Tokens can be words, subwords, or even characters, depending on the tokenization method used by the specific model.

- **Characters** (Option A): While some models use character-level tokenization, most modern LLMs use subword tokenization (such as Byte Pair Encoding), where tokens can be parts of words.
- **Tokens** (Option B): **CORRECT**. These are the fundamental units that LLMs process. For example, the word "unbelievable" might be split into tokens like "un", "believ", "able", depending on the tokenizer.
- **Embeddings** (Option C): These are vector representations of tokens, not the text sub-units themselves. Tokens are converted into embeddings for processing.
- **Layers** (Option D): These are components of the neural network architecture, not text sub-units.

**Why this matters for the AWS Certified Cloud Practitioner**: Understanding how LLMs work is important because AWS offers AI/ML services such as Amazon Bedrock, SageMaker, and Comprehend that rely on tokenization in their natural language processing capabilities.
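The subword splitting described above can be sketched with a toy greedy longest-match tokenizer. The vocabulary here is hypothetical and hand-picked for the example; real tokenizers such as BPE learn their vocabularies from training data and use different merge logic, so this is only an illustration of the idea that tokens are sub-word pieces.

```python
# Hypothetical vocabulary for illustration only; real subword
# vocabularies are learned from data, not hand-written.
VOCAB = {"un", "believ", "able"}

def tokenize(word: str) -> list[str]:
    """Split a word into the longest matching vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible substring first.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as its own token
            # (real tokenizers typically use byte-level fallbacks).
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("unbelievable"))  # → ['un', 'believ', 'able']
```

Each of these tokens would then be mapped to an ID and looked up in an embedding table before the model's layers process it, which is why embeddings (Option C) are representations of tokens rather than the sub-units themselves.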
Author: Ritesh Yadav