
Answer-first summary for fast verification
Answer: Massive Multi-task Language Understanding (MMLU) score
The question asks which metric is NOT appropriate for monitoring a deployed LLM application in production. MMLU is a static benchmark used during model development and evaluation to compare model capabilities; it is not computed from live traffic and so is not an ongoing production-monitoring signal. In contrast, the number of customer inquiries processed per unit of time (throughput), the factual accuracy of responses (quality), and the time taken to generate a response (latency) are all measured at serving time and are standard metrics for a production customer service application. The community discussion confirms this, with 100% consensus on A and the explanation that MMLU belongs to pre-deployment evaluation, not production monitoring.
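To make the contrast concrete: the three production-appropriate metrics (B, C, D) can all be tracked from live traffic, while an MMLU score cannot. A minimal in-memory sketch in Python, assuming a simple request loop where each response's latency and a factual-accuracy judgment are available (the class and method names are illustrative, not from any monitoring library):

```python
import time
from dataclasses import dataclass, field


@dataclass
class ProductionMetrics:
    """Minimal tracker for throughput (B), accuracy (C), and latency (D)."""
    latencies: list = field(default_factory=list)  # seconds per response (D)
    accurate: int = 0                              # responses judged factually accurate (C)
    total: int = 0                                 # inquiries processed (B)
    started: float = field(default_factory=time.monotonic)

    def record(self, latency_s: float, is_accurate: bool) -> None:
        """Record one handled inquiry."""
        self.total += 1
        self.latencies.append(latency_s)
        if is_accurate:
            self.accurate += 1

    def throughput_per_min(self) -> float:
        """Inquiries processed per minute since startup (B)."""
        elapsed = max(time.monotonic() - self.started, 1e-9)
        return self.total / (elapsed / 60.0)

    def accuracy_rate(self) -> float:
        """Fraction of responses judged factually accurate (C)."""
        return self.accurate / self.total if self.total else 0.0

    def avg_latency_s(self) -> float:
        """Mean response-generation time in seconds (D)."""
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0
```

Note there is no method for an MMLU score: it is produced by running the model offline against a fixed benchmark dataset, not derived from production requests, which is exactly why option A is the odd one out.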
Author: LeetQuiz Editorial Team
A Generative AI Engineer has deployed an LLM application at a manufacturing company to assist with customer service inquiries. They need to identify the key enterprise metrics for monitoring the application in production.
Which of the following is NOT a metric they would implement for their customer service LLM application in production?
A. Massive Multi-task Language Understanding (MMLU) score
B. Number of customer inquiries processed per unit of time
C. Factual accuracy of the response
D. Time taken for LLM to generate a response