
Answer-first summary for fast verification
Answer: Tokenization
## Explanation

**Tokenization** is the correct answer because:

- **Tokenization** is the process of breaking text down into smaller units called tokens, which are the basic building blocks that language models can understand and process.
- In the context of Amazon Bedrock and foundation models, text input needs to be converted into tokens before it is processed by the model.
- Tokens can be words, subwords, or even individual characters, depending on the tokenization method used by the specific foundation model.

**Why the other options are incorrect:**

- **Stemming (A)**: A text preprocessing technique that reduces words to their root form (e.g., "running" → "run"); it is not the process of breaking text into model-understandable pieces.
- **Vectorization (C)**: The conversion of text into numerical vectors, which typically happens after tokenization in the NLP pipeline.
- **Stopword removal (D)**: The removal of common words such as "the", "and", and "is" that carry little meaning on their own; it is not the fundamental process of breaking text into model-understandable units.

In Amazon Bedrock's workflow, tokenization is a critical preprocessing step that prepares text data for the foundation model by converting it into the token format the model was trained on.
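To make the distinction concrete, here is a minimal Python sketch that shows what tokenization looks like and contrasts it with the other options. The GPT-2 tokenizer from the Hugging Face `transformers` library is used purely as a stand-in for a foundation model's subword tokenizer; it is an assumption for illustration, not the tokenizer any specific Bedrock model actually uses.

```python
# Illustrative sketch only: each Bedrock foundation model ships its own
# tokenizer. The GPT-2 BPE tokenizer below is a widely available stand-in
# used to show what "breaking text into tokens" means.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed example tokenizer

text = "Before sending user input to a foundation model, break it into tokens."

# Tokenization: split the text into subword units the model understands.
tokens = tokenizer.tokenize(text)
print(tokens)      # subword pieces, e.g. ['Before', 'Ġsending', 'Ġuser', ...]

# Each token maps to an integer ID from the vocabulary the model was trained on.
token_ids = tokenizer.encode(text)
print(token_ids)   # a list of integer IDs, one per token

# By contrast:
#   - Stemming would reduce "sending" -> "send" (root form), not split the text.
#   - Vectorization (embedding) turns token IDs into numeric vectors *after*
#     tokenization has already happened.
#   - Stopword removal would drop words like "to" and "a", a separate
#     preprocessing step altogether.
```

Note that when you invoke a model through Amazon Bedrock, the service applies the model's own tokenizer for you; running a local tokenizer like this is mainly useful for estimating token counts before sending input.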
Author: Ritesh Yadav
A company is developing a chatbot using Amazon Bedrock. Before sending user input to a foundation model, the text must be broken down into smaller pieces that the model understands. What is this process called?
A. Stemming
B. Tokenization
C. Vectorization
D. Stopword removal