
A company is developing a chatbot using Amazon Bedrock. Before sending user input to a foundation model, the text must be broken down into smaller pieces that the model understands. What is this process called?
A. Stemming
B. Tokenization
C. Vectorization
D. Stopword removal
Explanation:
Tokenization is the correct answer because:
Tokenization is the process of breaking down text into smaller units called tokens, which are the basic building blocks that language models can understand and process
In the context of Amazon Bedrock and foundation models, text input needs to be converted into tokens before being processed by the model
Tokens can be words, subwords, or even individual characters, depending on the tokenization method used by the specific foundation model
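The token-to-ID idea can be sketched with a toy example. Note this uses a simple whitespace tokenizer and a tiny hypothetical vocabulary purely for illustration; real foundation models on Amazon Bedrock use model-specific subword tokenizers (e.g., BPE variants).

```python
# Toy illustration of tokenization: text -> tokens -> token IDs.
# The whitespace tokenizer and tiny vocabulary below are hypothetical;
# real foundation models use model-specific subword tokenizers.

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return text.lower().split()

# Hypothetical vocabulary mapping tokens to integer IDs.
vocab = {"the": 0, "chatbot": 1, "answers": 2, "questions": 3, "<unk>": 4}

def encode(text: str) -> list[int]:
    """Map each token to its vocabulary ID (unknown tokens -> <unk>)."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)]

print(tokenize("The chatbot answers questions"))  # ['the', 'chatbot', 'answers', 'questions']
print(encode("The chatbot answers questions"))    # [0, 1, 2, 3]
```

The key point for the question: the model never sees raw text, only these token IDs, which is why tokenization must happen before the input reaches the foundation model.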
Why the other options are incorrect:
Stemming (A): This is a text preprocessing technique that reduces words to their root form (e.g., "running" → "run"). It normalizes tokens that already exist; it is not the process of breaking text into model-understandable pieces.
Vectorization (C): This refers to converting text into numerical vectors, which typically happens after tokenization in the NLP pipeline
Stopword removal (D): This involves removing common words like "the", "and", "is" that don't carry significant meaning, but it's not the fundamental process of breaking text into model-understandable units
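The distinction between the four options can be made concrete by running them in their usual pipeline order on one sentence. The suffix-stripping stemmer and the stopword list here are crude, hypothetical stand-ins for real tools (such as a Porter stemmer), used only to show how each step differs from tokenization:

```python
from collections import Counter

# Step 1 - tokenization: break the text into tokens (the question's answer).
tokens = "the runner is running fast".split()

# Step 2 - stopword removal: drop common low-information words.
stopwords = {"the", "is", "and"}  # toy stopword list
filtered = [t for t in tokens if t not in stopwords]

# Step 3 - stemming: reduce words to a root form.
def stem(word: str) -> str:
    """Crude stemmer: strip a few hard-coded suffixes (toy logic, not Porter)."""
    for suffix in ("ning", "ing", "ner", "er"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word

stemmed = [stem(t) for t in filtered]

# Step 4 - vectorization: turn tokens into numbers (here, bag-of-words counts).
vector = Counter(stemmed)

print(filtered)  # ['runner', 'running', 'fast']
print(stemmed)   # ['run', 'run', 'fast']
print(vector)    # Counter({'run': 2, 'fast': 1})
```

This ordering mirrors the explanation above: stemming and stopword removal operate on tokens that already exist, and vectorization converts tokens into numbers, so all three presuppose that tokenization has already happened.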
In Amazon Bedrock's workflow, tokenization is a critical preprocessing step: it converts text into the token format the foundation model was trained on, and token counts also determine how input size is measured against model limits.