AWS Certified Generative AI Developer - Professional

Explanation:

Correct Answer: D

The hybrid evaluation system described in option D is the most comprehensive solution that addresses all three requirements:

High accuracy for patient information retrievals: The built-in Amazon Bedrock evaluation can track retrieval precision specifically, which directly addresses the accuracy requirement for patient information.
Identify hallucinations in generated content: The LLM-as-a-judge evaluation combined with targeted human reviews for edge cases provides a robust mechanism to identify hallucinations. The built-in Bedrock evaluation also tracks hallucination rates.
Reduce human review costs: The automated LLM-as-a-judge evaluation screens every response first, and human reviewers see only the edge cases. This costs significantly less than having humans review all responses.
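The screening step described above can be sketched as a simple triage over judge scores. This is a minimal illustration, not how Amazon Bedrock implements it: the `JudgedResponse` type, the `faithfulness` score in [0, 1], and both thresholds are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JudgedResponse:
    response_id: str
    # Hypothetical LLM-as-a-judge score in [0, 1]; higher means better grounded.
    faithfulness: float


def triage(
    judged: List[JudgedResponse],
    auto_fail_below: float = 0.3,  # assumed threshold, tune on labeled data
    auto_pass_at: float = 0.8,     # assumed threshold, tune on labeled data
) -> Tuple[List[str], List[str], List[str]]:
    """Split responses into auto-pass, human-review, and auto-fail buckets."""
    passed, review, failed = [], [], []
    for item in judged:
        if item.faithfulness >= auto_pass_at:
            passed.append(item.response_id)
        elif item.faithfulness < auto_fail_below:
            failed.append(item.response_id)
        else:
            # Ambiguous scores are the edge cases that go to a human reviewer.
            review.append(item.response_id)
    return passed, review, failed


def human_review_fraction(judged: List[JudgedResponse]) -> float:
    """Share of responses that still need a human, using the default thresholds."""
    if not judged:
        return 0.0
    _, review, _ = triage(judged)
    return len(review) / len(judged)


batch = [
    JudgedResponse("r1", 0.95),
    JudgedResponse("r2", 0.55),
    JudgedResponse("r3", 0.10),
    JudgedResponse("r4", 0.80),
]
passed, review, failed = triage(batch)
print(passed, review, failed)
print(human_review_fraction(batch))
```

Only the responses in the middle band reach a reviewer, which is where the cost saving comes from. In a clinical setting the thresholds would need validation against human-labeled samples before anything is auto-passed.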

Why the other options fall short:

Option A: Amazon Comprehend can extract medical entities and relationships, but it does not detect hallucinations or provide an evaluation framework for RAG applications. Entity recognition confidence scores measure how well entities were extracted, not whether the overall response is accurate.
Option B: Automated LLM-based evaluation with a specialized medical model is a sound technique, but running it on ALL responses could be costly, and the option has no targeted human review for the edge cases where an automated judge is least reliable.
Option C: CloudWatch Synthetics with synthetic test queries is useful for monitoring, but it does not provide comprehensive hallucination detection or address the needs of clinical decision-making, where patient safety is critical. Synthetic tests may also miss real-world scenarios.

Key services and concepts mentioned:

Amazon Bedrock: Provides foundation models and built-in evaluation capabilities
LLM-as-a-judge: A common pattern where one LLM evaluates the outputs of another
Hybrid evaluation: Combines automated evaluation with targeted human review for optimal balance of accuracy and cost

This approach aligns with AWS best practices for deploying responsible AI systems in healthcare, where accuracy, safety, and cost-efficiency are all critical considerations.
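The two metrics named in the explanation, retrieval precision and hallucination rate, can be made concrete with a short sketch. The definitions below are generic ones assumed for illustration (fraction of retrieved chunks that are relevant, and fraction of responses flagged as hallucinated); they are not taken from the Amazon Bedrock documentation, and the function names are hypothetical.

```python
from typing import Sequence, Set


def retrieval_precision(retrieved_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Fraction of retrieved chunks that appear in the labeled relevant set."""
    if not retrieved_ids:
        return 0.0
    hits = sum(1 for chunk_id in retrieved_ids if chunk_id in relevant_ids)
    return hits / len(retrieved_ids)


def hallucination_rate(flags: Sequence[bool]) -> float:
    """Fraction of responses flagged as hallucinated (True means flagged)."""
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


# Assumed example data: four retrieved chunks, two of them labeled relevant.
print(retrieval_precision(["c1", "c2", "c3", "c4"], {"c1", "c3"}))
# Assumed example data: one of four responses flagged by the judge or a reviewer.
print(hallucination_rate([False, True, False, False]))
```

Tracking both numbers over time shows whether a quality drop comes from retrieval or from generation.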

A healthcare company is using Amazon Bedrock to build a Retrieval Augmented Generation (RAG) application that helps practitioners make clinical decisions. The application must achieve high accuracy for patient information retrievals, identify hallucinations in generated content, and reduce human review costs. Which solution will meet these requirements?

DDucse

Last updated: March 23, 2026 at 10:35

A. Use Amazon Comprehend to analyze and classify RAG responses and to extract medical entities and relationships. Use AWS Step Functions to orchestrate automated evaluations. Configure Amazon CloudWatch metrics to track entity recognition confidence scores. Configure CloudWatch to send an alert when accuracy falls below specified thresholds.

B. Implement automated large language model (LLM)-based evaluations that use a specialized model that is fine-tuned for medical content to assess all responses. Deploy AWS Lambda functions to parallelize evaluations. Publish results to Amazon CloudWatch metrics that track relevance and factual accuracy.