
Ultimate access to all questions.
You have developed a language understanding model for a virtual assistant that can handle various intents such as 'play_music', 'set_alarm', and 'get_weather'. You want to test the model's performance using a set of test data. Which of the following evaluation strategies would be most appropriate for this scenario?_
A
Manually review the model's predictions for each test sample and provide feedback on its accuracy.
B
Use a simple accuracy metric to calculate the percentage of correct predictions out of the total test samples.
C
Measure the model's performance using a combination of precision, recall, and F1-score metrics, and analyze the confusion matrix to identify areas for improvement.
D
Test the model's performance by deploying it in a production environment and collecting user feedback.