
Answer-first summary for fast verification
Answer: Adopt RAG with Knowledge Bases to retrieve only relevant context at runtime
## Explanation

**Correct Answer: B - Adopt RAG with Knowledge Bases to retrieve only relevant context at runtime**

**Why this is correct:**

1. **Cost efficiency**: RAG (Retrieval-Augmented Generation) with Knowledge Bases lets the model retrieve only the relevant information from external knowledge sources at runtime, instead of storing all data in the prompt or fine-tuning for every query.
2. **Performance maintenance**: By retrieving only relevant context, the model can answer accurately without the overhead of processing entire medical records or running unnecessary fine-tuning jobs.
3. **Amazon Bedrock integration**: Knowledge Bases for Amazon Bedrock is purpose-built for RAG, handling ingestion, embedding, and retrieval of relevant information from large datasets.

**Why the other options are incorrect:**

- **A) Use fine-tuning for every patient query**: Extremely costly and impractical. A fine-tuning job consumes significant compute and time, so running one per query is unworkable for real-time applications.
- **C) Store entire medical records in each prompt**: Produces very large prompts, which drives up token costs and can exceed the model's context window. Inefficient and expensive.
- **D) Scale up GPU instances permanently**: Raises cost without addressing the root inefficiency; permanently over-provisioning resources is the opposite of a cost-reduction strategy.

**Key benefits of RAG with Knowledge Bases:**

- Reduces token usage by including only relevant information
- Maintains or improves accuracy by supplying contextually relevant data
- Gives models access to up-to-date information without retraining
- More cost-effective than fine-tuning for most use cases
- Scales to large knowledge sources such as medical records
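The cost argument above can be illustrated with a minimal retrieval sketch. This is plain Python, not the Bedrock API: the record set and the keyword-overlap scoring function are hypothetical stand-ins (a real Knowledge Base uses vector embeddings), but the effect is the same — the RAG prompt carries only the relevant chunk, while option C's prompt carries every record.

```python
import re

# Hypothetical patient record chunks (stand-in for a Knowledge Base).
RECORDS = {
    "allergies": "Patient allergy list: penicillin, latex.",
    "imaging": "2023 MRI report: no abnormal findings in lumbar spine.",
    "labs": "Latest labs: HbA1c 6.1 percent, LDL 130 mg/dL.",
    "visits": "Visit notes from 12 encounters across 2019 to 2023.",
}

def words(text: str) -> set[str]:
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def score(query: str, text: str) -> int:
    """Crude relevance score: count of shared words (toy stand-in
    for embedding similarity)."""
    return len(words(query) & words(text))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k most relevant record chunks for the query."""
    ranked = sorted(RECORDS.values(),
                    key=lambda t: score(query, t), reverse=True)
    return ranked[:k]

query = "Does the patient have a penicillin allergy?"

# Option C: stuff every record into the prompt.
full_prompt = "\n".join(RECORDS.values()) + "\n" + query
# Option B (RAG): include only the retrieved context.
rag_prompt = "\n".join(retrieve(query)) + "\n" + query

print(f"full prompt: ~{len(full_prompt.split())} tokens")
print(f"RAG prompt:  ~{len(rag_prompt.split())} tokens")
```

The word counts are only a rough proxy for tokens, but the gap scales with record size: the full-record prompt grows with every encounter added to the chart, while the RAG prompt stays bounded by the top-k retrieved chunks.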
Author: Jin H
A hospital research team runs several generative-AI workloads using Bedrock. To reduce cost while maintaining performance, which strategy should they follow?
A. Use fine-tuning for every patient query
B. Adopt RAG with Knowledge Bases to retrieve only relevant context at runtime
C. Store entire medical records in each prompt
D. Scale up GPU instances permanently
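In practice, option B maps to a single managed API call: Knowledge Bases for Amazon Bedrock exposes `RetrieveAndGenerate` on the `bedrock-agent-runtime` client, which retrieves relevant chunks and feeds them to the model in one step. The sketch below only builds the request payload so it runs without AWS credentials; the knowledge base ID and model ARN are placeholders, and the commented lines show roughly how the payload would be sent with boto3.

```python
# Sketch of a Knowledge Bases RetrieveAndGenerate request payload.
# "EXAMPLEKBID" and the model ARN are placeholders, not real resources.

def build_rag_request(query: str, kb_id: str, model_arn: str) -> dict:
    """Assemble the retrieve-and-generate request body: the user's
    query plus the knowledge base and model to answer with."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

request = build_rag_request(
    query="Does the patient have a penicillin allergy?",
    kb_id="EXAMPLEKBID",
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
)

# With credentials configured, this payload would be sent as:
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   response = client.retrieve_and_generate(**request)
#   print(response["output"]["text"])
print(request["input"]["text"])
```

Note the prompt itself contains only the query; the service injects the retrieved context server-side, which is exactly how RAG keeps per-request token costs low.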