
Ultimate access to all questions.
You have deployed a RAG model for document retrieval and response generation in a customer service application. Over time, you want to monitor if the performance of your model degrades, particularly in terms of its ability to generate useful and accurate responses. Which of the following approaches would be most appropriate for using MLflow to monitor model drift over time?
Explanation:
D. Regularly log BLEU and ROUGE scores on a fixed set of evaluation queries and compare them over time
BLEU (Bilingual Evaluation Understudy) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) are standard metrics for evaluating the quality of generated text in natural language processing tasks.
This approach provides quantitative, reproducible metrics to detect model drift in response generation quality over time.