
Explanation:
Correct Answer: D
The hybrid evaluation system described in option D is the most comprehensive solution that addresses all three requirements:
High accuracy for patient information retrievals: The built-in Amazon Bedrock evaluation can track retrieval precision specifically, which directly addresses the accuracy requirement for patient information.
Identify hallucinations in generated content: The LLM-as-a-judge evaluation combined with targeted human reviews for edge cases provides a robust mechanism to identify hallucinations. The built-in Bedrock evaluation also tracks hallucination rates.
Reduce human review costs: By using automated LLM-as-a-judge evaluation to initially screen responses and only performing targeted human reviews for edge cases, this approach significantly reduces human review costs compared to reviewing all responses.
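The screening flow described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Amazon Bedrock API: the `JudgedResponse` shape, the `faithfulness` score, and the `pass_at` / `fail_at` thresholds are all hypothetical. The judge scores are assumed to have been produced elsewhere by an LLM-as-a-judge step. The sketch shows how only the uncertain middle band reaches human reviewers, and how retrieval precision can be computed from labeled relevant documents.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class JudgedResponse:
    """One RAG response after automated judging (hypothetical shape)."""
    response_id: str
    faithfulness: float        # assumed judge score in [0, 1]; 1.0 = fully grounded
    retrieved_ids: list[str]   # document IDs the retriever returned
    relevant_ids: set[str]     # document IDs labeled relevant in a ground-truth set


def retrieval_precision(item: JudgedResponse) -> float:
    """Fraction of retrieved documents that are labeled relevant."""
    if not item.retrieved_ids:
        return 0.0
    hits = sum(1 for doc_id in item.retrieved_ids if doc_id in item.relevant_ids)
    return hits / len(item.retrieved_ids)


def triage(item: JudgedResponse, pass_at: float = 0.9, fail_at: float = 0.5) -> str:
    """Route a response by judge score; thresholds are illustrative assumptions."""
    if item.faithfulness >= pass_at:
        return "pass"            # judge is confident the answer is grounded
    if item.faithfulness < fail_at:
        return "fail"            # judge flags a likely hallucination
    return "human_review"        # edge case: uncertain band goes to a person


def summarize(items: list[JudgedResponse]) -> dict:
    """Count verdicts and average retrieval precision over a batch."""
    counts = Counter(triage(item) for item in items)
    total = len(items)
    mean_precision = (
        sum(retrieval_precision(item) for item in items) / total if total else 0.0
    )
    return {
        "pass": counts["pass"],
        "fail": counts["fail"],
        "human_review": counts["human_review"],
        "mean_retrieval_precision": mean_precision,
    }


# Hypothetical batch: one clear pass, one edge case, one likely hallucination.
batch = [
    JudgedResponse("r1", 0.95, ["d1", "d2"], {"d1", "d2"}),
    JudgedResponse("r2", 0.70, ["d3", "d4"], {"d3"}),
    JudgedResponse("r3", 0.20, ["d5", "d6"], {"d9"}),
]
report = summarize(batch)
print(report)
```

In this sketch only the response in the uncertain band is sent to a reviewer, which is the cost-saving idea behind option D. Real thresholds would need to be calibrated against clinician-labeled samples.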
Why other options are less optimal:
Option A: Although Amazon Comprehend can extract medical entities and relationships, it does not detect hallucinations or provide an evaluation framework for RAG applications. Entity recognition confidence scores measure how well entities were extracted, not whether the generated response is accurate or grounded in the retrieved context.
Option B: Automated LLM-based evaluations that use a specialized medical model are a reasonable start, but assessing ALL responses with a fine-tuned model could be costly, and the option includes neither targeted human review of edge cases nor a retrieval precision metric.
Option C: CloudWatch Synthetics with synthetic test queries is useful for scheduled monitoring, but it only checks known-answer queries. It does not detect hallucinations in responses to real practitioner queries, and synthetic tests may not cover all real-world scenarios, which matters in clinical decision-making where patient safety is critical.
The hybrid approach in option D aligns with AWS best practices for deploying responsible AI systems in healthcare, where accuracy, safety, and cost-efficiency are all critical considerations.
A healthcare company is using Amazon Bedrock to build a Retrieval Augmented Generation (RAG) application that helps practitioners make clinical decisions. The application must achieve high accuracy for patient information retrievals, identify hallucinations in generated content, and reduce human review costs. Which solution will meet these requirements?
A
Use Amazon Comprehend to analyze and classify RAG responses and to extract medical entities and relationships. Use AWS Step Functions to orchestrate automated evaluations. Configure Amazon CloudWatch metrics to track entity recognition confidence scores. Configure CloudWatch to send an alert when accuracy falls below specified thresholds.
B
Implement automated large language model (LLM)-based evaluations that use a specialized model that is fine-tuned for medical content to assess all responses. Deploy AWS Lambda functions to parallelize evaluations. Publish results to Amazon CloudWatch metrics that track relevance and factual accuracy.
C
Configure Amazon CloudWatch Synthetics to generate test queries that have known answers on a regular schedule, and track model success rates. Set up dashboards that compare synthetic test results against expected outcomes.
D
Deploy a hybrid evaluation system that uses an automated LLM-as-a-judge evaluation to initially screen responses and targeted human reviews for edge cases. Use a built-in Amazon Bedrock evaluation to track retrieval precision and hallucination rates.