Databricks Certified Generative AI Engineer - Associate Quiz - LeetQuiz

Question 20

You are working on a Retrieval-Augmented Generation (RAG) application that uses a large language model (LLM) on Databricks. The cost of inference has increased significantly due to high traffic. You want to use Databricks features to control the costs of running the LLM while maintaining reasonable performance for end users.

Which of the following methods would be the BEST way to control LLM costs in your RAG application on Databricks?