
In your role at a major social network, you are tasked with improving the efficiency of human moderators by developing a machine learning model that flags potentially offensive comments for review. The platform processes millions of comments daily, and the model must balance identifying as many offensive comments as possible (high recall) against minimizing the number of non-offensive comments incorrectly flagged (high precision). Given the constraints on processing power and the need for timely moderation, which metric(s) would best evaluate the model's effectiveness? Choose the two most appropriate options.
A
Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute
B
Number of messages flagged by the model per minute, regardless of their nature
C
Number of messages flagged by the model as inappropriate per minute that are confirmed by humans
D
Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review
E
Both A and D
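
For reference, the precision and recall estimates named in the options can be computed from a human-reviewed sample roughly as follows. This is a minimal sketch, not a production monitoring pipeline. The function names `sample_for_review` and `estimate_precision_recall`, and the representation of each reviewed message as a `(model_flagged, human_says_offensive)` pair, are assumptions made for illustration.

```python
import random


def sample_for_review(messages, rate=0.001, rng=None):
    """Pick each message independently with probability `rate` (0.001 = 0.1%).

    Hypothetical helper: the sampled messages would then be sent to a human
    reviewer for labelling.
    """
    rng = rng or random.Random()
    return [m for m in messages if rng.random() < rate]


def estimate_precision_recall(reviewed):
    """Estimate precision and recall from human-reviewed messages.

    `reviewed` is an iterable of (model_flagged, human_says_offensive)
    boolean pairs. Returns (precision, recall); either value is None when
    its denominator is zero.
    """
    tp = fp = fn = 0
    for model_flagged, human_says_offensive in reviewed:
        if model_flagged and human_says_offensive:
            tp += 1  # flagged and truly offensive
        elif model_flagged and not human_says_offensive:
            fp += 1  # flagged but not offensive
        elif not model_flagged and human_says_offensive:
            fn += 1  # offensive but missed by the model
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall
```

Counting false negatives requires reviewed messages that the model did not flag. If every reviewed message was flagged by the model, `fn` is always 0, so the recall value returned here carries no information about missed comments.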