
Answer-first summary for fast verification
Answer: Adopt RAG with Knowledge Bases to retrieve only relevant context at runtime
## Explanation

**Correct Answer: B - Adopt RAG with Knowledge Bases to retrieve only relevant context at runtime**

**Why this is correct:**

1. **RAG (Retrieval-Augmented Generation)** with Knowledge Bases lets the model retrieve only the relevant passages from a knowledge base at runtime, rather than processing the entire corpus on every query.
2. **Cost reduction**: Retrieving only relevant context shrinks each prompt, reducing the input tokens processed per query and therefore the per-request cost.
3. **Performance maintenance**: RAG maintains or improves answer quality by grounding the model in precise, relevant information instead of overwhelming it with unnecessary data.
4. **Amazon Bedrock integration**: Knowledge Bases for Amazon Bedrock is a managed feature designed specifically for RAG implementations.

**Why the other options are incorrect:**

- **A) Use fine-tuning for every patient query**: Fine-tuning is a slow, expensive training process; running it per query is impractical and does not scale for dynamic patient data.
- **C) Store entire medical records in each prompt**: This inflates token usage dramatically, raising costs and often degrading quality as prompts approach context-window limits.
- **D) Scale up GPU instances permanently**: This raises costs without addressing the root inefficiency, and permanent over-provisioning is the opposite of cost optimization.

**Key AWS concepts:**

- **Knowledge Bases for Amazon Bedrock**: Managed RAG capability with vector storage and retrieval
- **Cost optimization**: Reducing token usage and computational requirements
- **Performance efficiency**: Balancing cost with response quality and speed
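The cost contrast between options B and C can be sketched in plain Python. This is a toy model, not the Bedrock API: the sample records, the keyword-overlap retriever, and the one-token-per-word estimate are illustrative stand-ins for a real vector store and tokenizer.

```python
# Toy comparison: option C (stuff all records into the prompt) vs
# option B (RAG: retrieve only the relevant record at runtime).
# All data and scoring below are illustrative assumptions.

def estimate_tokens(text: str) -> int:
    """Rough token estimate: one token per whitespace-separated word."""
    return len(text.split())

def retrieve(query: str, records: list[str], top_k: int = 1) -> list[str]:
    """Score each record by keyword overlap with the query; keep the top_k."""
    q_words = set(query.lower().split())
    scored = sorted(
        records,
        key=lambda r: len(q_words & set(r.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

records = [
    "patient 001 glucose levels elevated in march labs",
    "patient 002 mri scan clear no follow up required",
    "patient 003 prescribed statins for cholesterol management",
]
query = "what were patient 001 glucose levels"

# Option C: every record travels with every query.
full_prompt = query + " " + " ".join(records)

# Option B: only the most relevant record is added as context.
rag_prompt = query + " " + " ".join(retrieve(query, records, top_k=1))

print(estimate_tokens(full_prompt), "tokens vs", estimate_tokens(rag_prompt), "tokens")
```

The gap widens as the record store grows: option C's prompt scales with the whole corpus, while the RAG prompt stays roughly constant at query-plus-top-k size.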
Author: Ritesh Yadav
A hospital research team runs several generative-AI workloads using Bedrock. To reduce cost while maintaining performance, which strategy should they follow?
A. Use fine-tuning for every patient query
B. Adopt RAG with Knowledge Bases to retrieve only relevant context at runtime
C. Store entire medical records in each prompt
D. Scale up GPU instances permanently