
Answer-first summary for fast verification
Answer: **D** — Regularly log BLEU and ROUGE scores on a fixed set of evaluation queries and compare them over time
## Explanation

**D. Regularly log BLEU and ROUGE scores on a fixed set of evaluation queries and compare them over time**

BLEU (Bilingual Evaluation Understudy) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) are standard metrics for evaluating the quality of generated text in natural language processing tasks.

### Why this approach works:

- **BLEU** measures the precision of n-grams in generated text compared to reference text
- **ROUGE** measures recall by comparing overlapping n-grams, word sequences, and word pairs
- Using a **fixed set of evaluation queries** ensures consistent comparison over time
- **Regular logging** in MLflow allows tracking performance trends and detecting degradation
- This directly addresses the concern about "generating useful and accurate responses"

### Why not the other options:

- **A**: Monitoring retrieval accuracy only assesses the document retrieval component, not the response generation quality
- **B**: Tracking query volume monitors usage patterns but doesn't measure performance quality
- **C**: Learning rate and number of training epochs are training hyperparameters, not production performance metrics

This approach provides quantitative, reproducible metrics for detecting drift in response generation quality over time.
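The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `EVAL_SET`, `evaluate_and_log`, and the unigram-only metric helpers are hypothetical names invented here (real deployments would typically use full BLEU/ROUGE implementations such as `sacrebleu` or `rouge-score`), while `mlflow.start_run` and `mlflow.log_metric` are the standard MLflow Tracking calls.

```python
from collections import Counter

def bleu1_precision(candidate: str, reference: str) -> float:
    """Unigram BLEU-style precision: clipped fraction of candidate tokens found in the reference."""
    cand, ref = candidate.split(), reference.split()
    overlap = Counter(cand) & Counter(ref)  # clipped counts
    return sum(overlap.values()) / max(len(cand), 1)

def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram ROUGE-style recall: fraction of reference tokens recovered by the candidate."""
    cand, ref = candidate.split(), reference.split()
    overlap = Counter(cand) & Counter(ref)
    return sum(overlap.values()) / max(len(ref), 1)

# Hypothetical fixed evaluation set: (query, reference answer) pairs.
# Keeping this set frozen is what makes scores comparable across runs.
EVAL_SET = [
    ("How do I reset my password?",
     "go to settings and click reset password"),
    ("What is the refund policy?",
     "refunds are issued within 14 days of purchase"),
]

def evaluate_and_log(generate, run_name="drift-check"):
    """Score the model on the fixed set and log mean metrics to MLflow (if installed)."""
    bleu_scores = [bleu1_precision(generate(q), ref) for q, ref in EVAL_SET]
    rouge_scores = [rouge1_recall(generate(q), ref) for q, ref in EVAL_SET]
    mean_bleu = sum(bleu_scores) / len(bleu_scores)
    mean_rouge = sum(rouge_scores) / len(rouge_scores)
    try:
        import mlflow
        with mlflow.start_run(run_name=run_name):
            mlflow.log_metric("mean_bleu1", mean_bleu)
            mlflow.log_metric("mean_rouge1", mean_rouge)
    except ImportError:
        pass  # MLflow not available; metrics are still returned for inspection
    return mean_bleu, mean_rouge
```

Running `evaluate_and_log` on a schedule (daily, or per deployment) produces a time series of `mean_bleu1` / `mean_rouge1` in the MLflow UI; a sustained drop relative to earlier runs is the drift signal the question asks about.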
Author: LeetQuiz
You have deployed a RAG model for document retrieval and response generation in a customer service application. Over time, you want to monitor if the performance of your model degrades, particularly in terms of its ability to generate useful and accurate responses. Which of the following approaches would be most appropriate for using MLflow to monitor model drift over time?
- **A.** Monitor the accuracy of the retrieval step over time
- **B.** Track the number of queries processed by the model daily
- **C.** Monitor the change in the learning rate and number of training epochs used in fine-tuning the model
- **D.** Regularly log BLEU and ROUGE scores on a fixed set of evaluation queries and compare them over time