
Microsoft Certified Azure AI Engineer Associate - AI-102
You have developed a language understanding model for a virtual assistant that can handle various intents such as 'play_music', 'set_alarm', and 'get_weather'. You want to test the model's performance using a set of test data. Which of the following evaluation strategies would be most appropriate for this scenario?
Explanation:
Measuring the model's performance with precision, recall, and F1-score, together with an analysis of the confusion matrix, gives a per-intent view of how well the model classifies utterances and shows which intents it confuses with one another, which points to areas for improvement. Option A is time-consuming and may not provide a complete assessment of the model's performance. Option B is a basic evaluation strategy but does not provide insights into the model's strengths and weaknesses. Option D is useful for real-world testing but may not be feasible during the initial evaluation phase.
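
As a rough illustration of the metrics named above, the following Python sketch computes a confusion matrix and per-intent precision, recall, and F1-score for the three intents in the question. The test utterance labels (`y_true`, `y_pred`) and the helper names `confusion_matrix` and `per_intent_metrics` are made up for this example; they are not output from a real model or part of any Azure SDK.

```python
from collections import Counter

INTENTS = ["play_music", "set_alarm", "get_weather"]


def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where rows are true intents and columns are predicted intents."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_intent_metrics(matrix, labels):
    """Derive precision, recall, and F1-score for each intent from the matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                          # correctly predicted as this intent
        fn = sum(matrix[i]) - tp                   # this intent predicted as another
        fp = sum(row[i] for row in matrix) - tp    # other intents predicted as this one
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical test data: the labeled intent and the model's prediction per utterance.
y_true = ["play_music", "play_music", "set_alarm",
          "set_alarm", "get_weather", "get_weather"]
y_pred = ["play_music", "set_alarm", "set_alarm",
          "set_alarm", "get_weather", "play_music"]

matrix = confusion_matrix(y_true, y_pred, INTENTS)
metrics = per_intent_metrics(matrix, INTENTS)

for label, row in zip(INTENTS, matrix):
    print(f"{label:12s} {row}")
for label, m in metrics.items():
    print(f"{label:12s} P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
```

Reading the matrix row by row shows where the errors go. In this made-up data, one 'play_music' utterance is predicted as 'set_alarm', which a single overall accuracy figure would not reveal.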