AWS Certified Generative AI Developer - Professional

Explanation:

Option B is correct because it uses Amazon Bedrock's managed evaluation capability specifically designed for comparing foundation models. Here's why:

Key Components of the Correct Solution:

JSONL Format: Amazon Bedrock expects custom evaluation datasets in JSON Lines format, where each line is a separate JSON record containing one prompt.
Amazon S3 Storage: Storing the JSONL file in S3 lets Bedrock read the dataset at scale, and a separate S3 location receives the evaluation outputs.
Bedrock Evaluation Job: This is the managed feature for automated model evaluation, with:
- Judge Model: Evaluates the quality and safety of responses from candidate models
- Generator Model: The FM being evaluated (selected as the generator in each job run)
- Structured Output: Produces standardized evaluation results in S3
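
As a rough sketch of the dataset step described above, the sample prompts could be serialized to JSONL along these lines. The `prompt` and `category` keys reflect my understanding of the Bedrock custom prompt dataset format and should be checked against the current documentation; the sample prompts, the category label, the file name, and the `to_jsonl` helper are hypothetical.

```python
import json
from pathlib import Path


def to_jsonl(prompts, category=None):
    """Serialize prompts as JSON Lines: one JSON object per line."""
    lines = []
    for text in prompts:
        record = {"prompt": text}
        if category is not None:
            # Optional grouping label (assumed key name).
            record["category"] = category
        # json.dumps escapes embedded newlines, so each record stays on one line.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical sample prompts standing in for the data scientist's list.
    sample_prompts = [
        "Summarize our refund policy in two sentences.",
        "How do I reset my account password?",
    ]
    Path("eval_prompts.jsonl").write_text(
        to_jsonl(sample_prompts, category="assistant"), encoding="utf-8"
    )
```

The resulting file would then be uploaded to the input S3 location that the evaluation job reads from.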

Why Other Options Are Incorrect:

Option A: Uses a knowledge base and the RetrieveAndGenerate API, which are intended for RAG applications, not systematic model evaluation.
Option C: Incorrectly specifies QuickSight as the output (Bedrock evaluation jobs write results to S3, not directly to QuickSight) and selects the candidate FM as the evaluator instead of the generator.
Option D: Uses a knowledge base and the wrong evaluation type (retrieval and response generation is for RAG evaluation, not general FM comparison).

Benefits of This Approach:

Consistent Evaluation: Same judge model and scoring rubric applied to all candidate FMs
Automated Reporting: Structured results generated automatically
Scalability: Can evaluate multiple models efficiently
Reproducibility: Evaluation jobs can be rerun with same parameters
Quality & Safety Assessment: Judge model can evaluate both dimensions systematically
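
To illustrate how the same judge and metrics carry across candidates, the sketch below builds one evaluation job request per candidate FM and changes only the generator. The field names mirror my understanding of the boto3 `create_evaluation_job` request shape and are an assumption to verify against the current API reference; the model IDs, metric names, role ARN, S3 paths, and the `build_job_request` helper are all placeholders.

```python
def build_job_request(generator_model_id, judge_model_id, dataset_s3_uri,
                      output_s3_uri, role_arn, metric_names):
    """Build keyword arguments for one judge-based evaluation job.

    The nested keys are an assumed request shape, not a verified schema.
    """
    safe_name = generator_model_id.lower().replace(".", "-").replace(":", "-")
    return {
        "jobName": f"fm-eval-{safe_name}",
        "roleArn": role_arn,
        "evaluationConfig": {
            "automated": {
                "datasetMetricConfigs": [{
                    "taskType": "General",
                    "dataset": {
                        "name": "sample-prompts",
                        "datasetLocation": {"s3Uri": dataset_s3_uri},
                    },
                    "metricNames": list(metric_names),
                }],
                # The judge model is identical for every candidate.
                "evaluatorModelConfig": {
                    "bedrockEvaluatorModels": [
                        {"modelIdentifier": judge_model_id}
                    ],
                },
            }
        },
        # The candidate FM is the generator; this is the only part that varies.
        "inferenceConfig": {
            "models": [{"bedrockModel": {"modelIdentifier": generator_model_id}}],
        },
        # Each candidate writes to its own prefix under the output location.
        "outputDataConfig": {
            "s3Uri": f"{output_s3_uri.rstrip('/')}/{safe_name}/"
        },
    }


if __name__ == "__main__":
    candidates = ["example.candidate-a-v1:0", "example.candidate-b-v1:0"]
    for model_id in candidates:
        request = build_job_request(
            generator_model_id=model_id,
            judge_model_id="example.judge-v1:0",
            dataset_s3_uri="s3://example-eval-input/eval_prompts.jsonl",
            output_s3_uri="s3://example-eval-output/reports",
            role_arn="arn:aws:iam::123456789012:role/ExampleBedrockEvalRole",
            metric_names=["Builtin.Correctness", "Builtin.Harmfulness"],
        )
        print(request["jobName"], request["outputDataConfig"]["s3Uri"])
```

Each request could then be submitted with something like `boto3.client("bedrock").create_evaluation_job(**request)`, given valid credentials, real model identifiers, and an IAM role with access to both S3 locations.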

A company wants to select a new FM for its AI assistant. A GenAI developer needs to generate evaluation reports to help a data scientist assess the quality and safety of various foundation models (FMs). The data scientist provides the GenAI developer with sample prompts for evaluation. The GenAI developer wants to use Amazon Bedrock to automate report generation and evaluation.

Which solution will meet this requirement?

DDucse

Last updated: June 16, 2026 at 14:02

A. Combine the sample prompts into a single JSON document. Create an Amazon Bedrock knowledge base with the document. Write a prompt that asks the FM to generate a response to each sample prompt. Use the RetrieveAndGenerate API to generate a report for each model.

B. Combine the sample prompts into a single JSONL document. Store the document in an Amazon S3 bucket. Create an Amazon Bedrock evaluation job that uses a judge model. Specify the S3 location as input and a different S3 location as output. Run an evaluation job for each FM and select the FM as the generator.

C. Combine the sample prompts into a single JSONL document. Store the document in an Amazon S3 bucket. Create an Amazon Bedrock evaluation job that uses a judge model. Specify the S3 location as input and Amazon QuickSight as output. Run an evaluation job for each FM and select the FM as the evaluator.

D. Combine the sample prompts into a single JSON document. Create an Amazon Bedrock knowledge base from the document. Create an Amazon Bedrock evaluation job that uses the retrieval and response generation evaluation type. Specify an Amazon S3 bucket as the output. Run an evaluation job for each FM.
