Explanation
Option B (Adopt RAG with Knowledge Bases) is the correct answer because:
- RAG (Retrieval-Augmented Generation) with Knowledge Bases allows the system to retrieve only relevant medical context at runtime, reducing the amount of data processed in each query
- This approach minimizes token usage and computational costs while maintaining high-quality responses
- Knowledge Bases can store medical records efficiently and retrieve only the necessary information for each specific patient query
Why other options are incorrect:
- Option A (Fine-tuning for every patient query): This would be extremely expensive and inefficient, as fine-tuning requires significant computational resources and time for each query
- Option C (Store entire medical records in each prompt): This would dramatically increase token usage and costs, as large medical records would be processed with every query
- Option D (Scale up GPU instances permanently): This would increase costs without addressing the fundamental efficiency issue, as you'd be paying for unused capacity
RAG with Knowledge Bases provides the optimal balance of cost efficiency and performance for healthcare AI workloads by retrieving only relevant context when needed.