
Question 17: You are working with a Retrieval-Augmented Generation (RAG) application that uses a large language model (LLM) to generate responses. The cost of running this application is increasing due to heavy LLM inference usage. What is the most effective way to use Databricks features to control costs without compromising response quality?
A. Use model checkpointing to avoid retraining the LLM from scratch for each query
B. Employ prompt optimization techniques and cache common query results in Databricks
C. Use the Databricks autoscaling feature to scale compute clusters based on LLM load
D. Decrease the number of tokens used for generation by reducing the max tokens parameter in the LLM
Explanation:
Option B is the correct answer because it directly targets the cost driver, high-volume LLM inference, without degrading response quality. Prompt optimization trims unnecessary tokens from each request, and caching common query results means repeated or near-identical queries are answered without paying for inference again. Unlike option D, which risks truncating answers, neither technique limits what the model can say, so the application's effectiveness is preserved. A short sketch of the caching idea follows.
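As a minimal illustration of the caching half of option B, the sketch below memoizes LLM responses keyed by a hash of the normalized prompt. The query_llm function is a hypothetical stand-in for the application's real model-serving call, and the in-memory dict stands in for whatever persistent store (for example, a Delta table) a production Databricks deployment would use.

```python
import hashlib

def query_llm(prompt: str) -> str:
    # Hypothetical stand-in for the application's real model-serving
    # call; replace with the actual inference endpoint.
    return f"[model response to: {prompt}]"

# In-memory cache keyed by a hash of the normalized prompt. In a real
# deployment this would be a persistent store (e.g. a Delta table) so
# cached results survive restarts and are shared across workers.
_cache: dict[str, str] = {}

def _normalize(prompt: str) -> str:
    # Collapse whitespace and case so trivially different phrasings of
    # the same query share one cache entry.
    return " ".join(prompt.lower().split())

def cached_llm_call(prompt: str) -> str:
    key = hashlib.sha256(_normalize(prompt).encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = query_llm(prompt)  # pay for inference only once
    return _cache[key]

# Repeated queries hit the cache instead of the LLM:
first = cached_llm_call("What is our refund policy?")
second = cached_llm_call("what is our  refund policy?")  # cache hit
assert first == second
```

Hashing the normalized prompt keeps cache keys fixed-length; more elaborate setups use embedding similarity (so-called semantic caching) so paraphrased queries also hit the cache, but exact-match caching is the simplest way to stop paying twice for identical questions.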