
Answer-first summary for fast verification
Answer: Use a combination of automated tools for initial outlier detection and manual review for validation, considering the dataset's distribution, the context of the data, and potential causes of outliers, to balance accuracy with scalability and compliance.
The best approach is to use a combination of automated tools and manual review (Option C). Automated tools can efficiently identify potential outliers in large datasets, addressing scalability concerns. Manual review is then necessary to validate these outliers, ensuring accuracy and compliance with project requirements. This balanced approach considers the dataset's distribution, the context of the data, and potential causes of outliers, making it the most effective method for handling outliers under the given constraints. Options A and D risk removing or ignoring valid data or outliers that could be significant, while Option B is impractical due to the dataset's size and the project's scalability requirements.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
As a Microsoft Fabric Analytics Engineer Associate working on a project, you are presented with a dataset that contains potential outliers. The project has strict compliance requirements and the dataset is large, making scalability a concern. You need to ensure the analysis is accurate while adhering to the project constraints. Which of the following approaches is the BEST to identify and resolve issues with data outliers in this scenario? (Choose one)
A
Automatically remove all data points that fall outside a specified range or threshold without further analysis, to ensure scalability and compliance.
B
Manually review each outlier in the dataset to determine its validity, despite the time and resource constraints this may impose.
C
Use a combination of automated tools for initial outlier detection and manual review for validation, considering the dataset's distribution, the context of the data, and potential causes of outliers, to balance accuracy with scalability and compliance.
D
Proceed with the analysis ignoring the outliers, assuming they have minimal impact on the overall results, to meet project deadlines.
No comments yet.