
Ultimate access to all questions.
Deep dive into the quiz with AI chat providers.
We prepare a focused prompt with your quiz and certificate details so each AI can offer a more tailored, in-depth explanation.
A company is developing a chatbot using Amazon Bedrock. Before sending user input to a foundation model, the text must be broken down into smaller pieces that the model understands. What is this process called?
A
Stemming
B
Tokenization
C
Vectorization
D
Stopword removal
Explanation:
Tokenization is the correct answer because:
Definition: Tokenization is the process of breaking down text into smaller units called tokens, which can be words, subwords, or characters, depending on the tokenization strategy.
Purpose in NLP: Before feeding text to machine learning models (especially transformer-based models like those used in Amazon Bedrock), the text must be converted into a format the model can understand. Tokenization is the first step in this preprocessing pipeline.
Why not other options:
Stemming (A): This is the process of reducing words to their root form (e.g., "running" → "run"). While it's a text preprocessing technique, it's not specifically about breaking text into pieces for model input.
Vectorization (C): This refers to converting text into numerical vectors (embeddings). While important, it happens after tokenization.
Stopword removal (D): This involves removing common words (like "the", "is", "and") that don't carry significant meaning. It's another preprocessing step but not the specific process described.
Amazon Bedrock Context: Amazon Bedrock is a fully managed service that makes foundation models from various AI companies available through a single API. When using these models, text input must be tokenized according to the specific model's tokenization scheme before being processed.
Key Takeaway: Tokenization is fundamental to how modern language models process text, converting raw text into discrete units that can be mapped to the model's vocabulary and processed through its neural network architecture.