
Answer-first summary for fast verification
Answer: Adopt RAG with Knowledge Bases to retrieve only relevant context at runtime
## Explanation **Option B (Adopt RAG with Knowledge Bases) is the correct answer** because: - **RAG (Retrieval-Augmented Generation)** with Knowledge Bases allows the system to retrieve only relevant medical context at runtime, reducing the amount of data processed in each query - This approach minimizes token usage and computational costs while maintaining high-quality responses - Knowledge Bases can store medical records efficiently and retrieve only the necessary information for each specific patient query **Why other options are incorrect:** - **Option A (Fine-tuning for every patient query)**: This would be extremely expensive and inefficient, as fine-tuning requires significant computational resources and time for each query - **Option C (Store entire medical records in each prompt)**: This would dramatically increase token usage and costs, as large medical records would be processed with every query - **Option D (Scale up GPU instances permanently)**: This would increase costs without addressing the fundamental efficiency issue, as you'd be paying for unused capacity RAG with Knowledge Bases provides the optimal balance of cost efficiency and performance for healthcare AI workloads by retrieving only relevant context when needed.
Author: Ritesh Yadav
Ultimate access to all questions.
Q6. A hospital research team runs several generative-AI workloads using Bedrock. To reduce cost while maintaining performance, which strategy should they follow?
A
Use fine-tuning for every patient query
B
Adopt RAG with Knowledge Bases to retrieve only relevant context at runtime
C
Store entire medical records in each prompt
D
Scale up GPU instances permanently
No comments yet.