
Answer-first summary for fast verification
Answer: Tokenization
**Explanation:** Tokenization is the process of breaking text down into smaller units called tokens that a language model can process. In the context of foundation models such as those used in Amazon Bedrock:

1. **Tokenization** converts raw text into a sequence of tokens (words, subwords, or characters) that the model processes.
2. **Stemming** reduces words to their root form (e.g., "running" → "run"), which is a different preprocessing technique.
3. **Vectorization** converts text into numerical vectors, typically for traditional ML models.
4. **Stopword removal** eliminates common words like "the", "and", and "is" that may not add significant meaning.

For foundation models in Amazon Bedrock, tokenization is the essential first step, in which input text is converted into tokens that the model's architecture can process.
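As an illustration, here is a minimal sketch of word-level tokenization using only Python's standard library. This is a simplified scheme for demonstration; foundation models like those in Amazon Bedrock typically use subword tokenizers (e.g., byte-pair encoding), and the `tokenize` function below is a hypothetical helper, not a Bedrock API.

```python
import re

def tokenize(text):
    # Split text into word and punctuation tokens (a simplified
    # word-level scheme; production models use subword tokenizers).
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Chatbots understand tokens, not raw text!")
print(tokens)
# → ['Chatbots', 'understand', 'tokens', ',', 'not', 'raw', 'text', '!']
```

The key idea is the same regardless of scheme: the model never sees raw strings, only the token sequence produced by this first preprocessing step.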
Author: Jin H
A company is developing a chatbot using Amazon Bedrock. Before sending user input to a foundation model, the text must be broken down into smaller pieces that the model understands. What is this process called?
A. Stemming
B. Tokenization
C. Vectorization
D. Stopword removal