You have developed a language understanding model for a virtual assistant that can handle various intents such as 'book_flight', 'check_weather', and 'play_music'. You want to test the model's performance using a set of test data. Which of the following approaches should you use to evaluate the model's accuracy?