
When an LLM processes the word "unbelievable", it splits it into sub-units before encoding. What are these sub-units called?
A
Characters
B
Tokens
C
Embeddings
D
Layers
Explanation:
In Large Language Models (LLMs), text is broken down into smaller units called tokens before being processed. Tokens can be words, subwords, or even characters, depending on the tokenization method used by the specific model.
Characters (Option A): While some models use character-level tokenization, most modern LLMs use subword tokenization (such as Byte Pair Encoding), in which tokens can be whole words or parts of words.
Tokens (Option B): CORRECT - These are the fundamental units that LLMs process. The word "unbelievable" might be split into tokens such as "un", "believ", and "able", depending on the tokenizer.
Embeddings (Option C): These are vector representations of tokens, not the sub-units themselves. Tokens are converted into embeddings for processing.
Layers (Option D): These refer to the neural network architecture components, not the text sub-units.
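To make the idea concrete, here is a toy greedy longest-match subword tokenizer. The three-entry vocabulary is invented for this example; real LLM tokenizers such as BPE learn a much larger vocabulary from training data, so a production tokenizer may split "unbelievable" differently.

```python
# Invented vocabulary for illustration only.
VOCAB = {"un", "believ", "able"}

def tokenize(text: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in VOCAB:
                tokens.append(piece)
                i = end
                break
        else:
            # Fall back to a single character when no entry matches.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

Note that the tokens concatenate back to the original word, which is why subword tokenization can represent any input, including words the model has never seen.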
Why this matters for AWS Certified Cloud Practitioner: Understanding how LLMs work is important because AWS offers various AI/ML services, such as Amazon Bedrock, SageMaker, and Amazon Comprehend, that rely on tokenization in their natural language processing capabilities.
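As the explanation for Option C notes, tokens themselves are not embeddings: tokens are first mapped to integer IDs and then looked up in an embedding table. A minimal sketch, using an invented three-token vocabulary and tiny 4-dimensional vectors (real models use vocabularies of tens of thousands of tokens and hundreds or thousands of dimensions):

```python
import random

random.seed(0)  # deterministic vectors for the example

# Hypothetical token-to-ID mapping and embedding table.
token_to_id = {"un": 0, "believ": 1, "able": 2}
embedding_table = [
    [random.uniform(-1, 1) for _ in range(4)] for _ in token_to_id
]

def embed(tokens: list[str]) -> list[list[float]]:
    """Convert tokens to IDs, then look up each ID's embedding vector."""
    return [embedding_table[token_to_id[t]] for t in tokens]

vectors = embed(["un", "believ", "able"])
print(len(vectors), len(vectors[0]))  # 3 vectors, 4 dimensions each
```

In a trained model the embedding table is learned, so tokens with similar meanings end up with similar vectors; here random values simply stand in for that learned state.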