Google Professional Machine Learning Engineer


While preparing a large, high-dimensional dataset for machine learning, you are tasked with evaluating the impact of outliers on its distribution. You want to quantify the effect of outliers, not just identify them. The method must be computationally efficient and must give a statistical measure of how far data points lie from the mean. The features also have varying scales, so the method must be scale-invariant. Which of the following methods would you choose? Choose the best option.
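
To make the requirements concrete, one measure that fits the description (distance from the mean, independent of feature scale) is the standardized score, or z-score. The sketch below is illustrative only and is not taken from the question's answer options. The function names `z_scores` and `outlier_fraction` and the threshold of 3 standard deviations are assumptions chosen for the example.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Return each value's distance from the mean in units of standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant feature has no spread; report zero deviation for every point.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def outlier_fraction(values, threshold=3.0):
    """Fraction of points whose absolute z-score exceeds the (assumed) threshold."""
    zs = z_scores(values)
    return sum(abs(z) > threshold for z in zs) / len(zs)
```

Because each value is divided by its feature's own standard deviation, multiplying a feature by a constant (for example, converting meters to millimeters) leaves its z-scores unchanged. Applying `z_scores` column by column therefore makes features with different scales comparable. Each column needs only a couple of linear passes over its values.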