
Explanation:
Option B is correct because it provides a comprehensive solution that addresses all requirements:
Automated quality evaluations at scale: Amazon Bedrock evaluations with Anthropic Claude Sonnet as a judge model enable systematic, repeatable quality assessment across large volumes of interactions. This addresses the need for scalable automated evaluation of factual accuracy and conversational appropriateness.
Compliance enforcement: Amazon Bedrock guardrails provide a dedicated policy enforcement layer that can block or intervene when responses violate financial compliance constraints. This is crucial in regulated financial contexts, where automated enforcement is required.
Targeted human reviews: Amazon Augmented AI (A2I) integrates human review workflows for flagged critical interactions, ensuring human oversight where needed without requiring manual review of all responses.
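As a rough illustration of how these three pieces fit together, the sketch below shows escalation logic that routes an interaction to Amazon A2I human review when the judge model's score falls below a threshold or a guardrail intervenes. The threshold value and field names are hypothetical, and the actual boto3 calls to Bedrock evaluations, guardrails, and A2I are omitted:

```python
# Minimal sketch of the escalation logic described above.
# JUDGE_SCORE_THRESHOLD and the interaction field names are hypothetical;
# real Bedrock evaluation jobs and A2I human loops are started via
# boto3 API calls that are omitted here.

JUDGE_SCORE_THRESHOLD = 0.7  # hypothetical minimum acceptable judge score


def should_escalate_to_human(judge_score: float, guardrail_intervened: bool) -> bool:
    """Flag an interaction for human review when the LLM-judge score is
    low or a guardrail blocked or modified the response."""
    return guardrail_intervened or judge_score < JUDGE_SCORE_THRESHOLD


def triage(interactions: list[dict]) -> list[dict]:
    """Return only the interactions that need targeted human review."""
    return [
        i for i in interactions
        if should_escalate_to_human(i["judge_score"], i["guardrail_intervened"])
    ]


if __name__ == "__main__":
    sample = [
        {"id": "a", "judge_score": 0.95, "guardrail_intervened": False},
        {"id": "b", "judge_score": 0.40, "guardrail_intervened": False},
        {"id": "c", "judge_score": 0.90, "guardrail_intervened": True},
    ]
    print([i["id"] for i in triage(sample)])  # prints ['b', 'c']
```

This keeps human reviewers focused on the small set of flagged critical interactions rather than the full response volume.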
Why the other options are incorrect:
Option A: Relies entirely on manual scoring by financial experts for ALL responses, which cannot scale and fails the requirement for automated quality evaluations at scale.
Option C: Uses Amazon Lex (a conversational bot framework) rather than Amazon Bedrock's evaluation capabilities. A static compliance database is less flexible than Bedrock guardrails, and collecting end-user reviews does not provide systematic quality evaluation.
Option D: CloudWatch is for monitoring and alerting, not for systematic evaluation of response quality. It lacks the automated evaluation capabilities and compliance enforcement mechanisms provided by Bedrock evaluations and guardrails.
This solution effectively combines AWS's managed GenAI capabilities to meet the requirements of scalable automated evaluation, compliance enforcement, and targeted human oversight.
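For concreteness, a denied-topic guardrail of the kind described in option B might be configured roughly as shown below. The request shape mirrors the Amazon Bedrock CreateGuardrail API as best understood here, but the guardrail name, topic definition, and messages are all hypothetical; verify the exact field names against the current boto3 documentation before use:

```python
# Hypothetical configuration for a financial-compliance guardrail.
# Field names follow the Amazon Bedrock CreateGuardrail request shape;
# treat the structure as an assumption and confirm against the boto3 docs.

guardrail_request = {
    "name": "fintech-compliance-guardrail",  # hypothetical name
    "description": "Blocks non-compliant financial advice.",
    "topicPolicyConfig": {
        "topicsConfig": [
            {
                "name": "UnlicensedInvestmentAdvice",
                "definition": "Specific buy or sell recommendations for securities.",
                "type": "DENY",
            }
        ]
    },
    "blockedInputMessaging": "This request cannot be processed.",
    "blockedOutputsMessaging": "The response was withheld for compliance reasons.",
}

# In a real deployment this dict would be passed to the Bedrock control-plane
# client, e.g. boto3.client("bedrock").create_guardrail(**guardrail_request).
denied_topics = [
    t["name"] for t in guardrail_request["topicPolicyConfig"]["topicsConfig"]
]
print(denied_topics)  # prints ['UnlicensedInvestmentAdvice']
```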
A financial technology company is using Amazon Bedrock to build an assessment system for the company's customer service AI assistant. The AI assistant must provide financial recommendations that are factually accurate, compliant with financial regulations, and conversationally appropriate. The company needs to combine automated quality evaluations at scale with targeted human reviews of critical interactions.
Which solution will meet these requirements?
A
Configure a pipeline in which financial experts manually score all responses for accuracy, compliance, and conversational quality. Use Amazon SageMaker notebooks to analyze results to identify improvement areas.
B
Configure Amazon Bedrock evaluations that use Anthropic Claude Sonnet as a judge model to assess response accuracy and appropriateness. Configure custom Amazon Bedrock guardrails to check responses for compliance with financial policies. Add Amazon Augmented AI (Amazon A2I) human reviews for flagged critical interactions.
C
Create an Amazon Lex bot to manage customer service interactions. Configure AWS Lambda functions to check responses against a static compliance database. Configure intents that call the Lambda functions. Add an additional intent to collect end-user reviews.
D
Configure Amazon CloudWatch to monitor response patterns from the AI assistant. Configure CloudWatch alerts for potential compliance violations. Establish a team of human evaluators to review flagged interactions.