
Answer-first summary for fast verification
Answer: Replace using Probabilistic PCA
The question states that applying predictors for each column is not required, which rules out MICE (Multiple Imputation by Chained Equations): MICE models each column that has missing values as a function of the other columns and imputes from those per-column predictive models. Probabilistic PCA (PPCA) fits here because it estimates a low-dimensional approximation of the whole dataset and reconstructs the missing entries from it, with no per-column predictor, which suits a dataset with missing values in many columns. Community opinion is mixed, but the most-upvoted comments (3 upvotes) support PPCA, and the detailed reasoning points the same way given the constraint on predictors. Normalization and SMOTE are not missing-value imputation methods: normalization rescales features, and SMOTE oversamples the minority class.
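To make the idea of reconstructing missing entries from a low-dimensional approximation concrete, here is a minimal Python sketch. It is not the Clean Missing Data module's implementation and it is not full PPCA (it has no explicit noise model); it swaps in a simpler iterative low-rank PCA reconstruction. The function name `pca_impute`, its parameters, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def pca_impute(X, n_components=1, n_iter=200, tol=1e-6):
    """Fill NaNs by repeated low-rank PCA reconstruction (hypothetical helper).

    A simplified stand-in for PPCA: no per-column predictor is fitted;
    one shared low-dimensional model fills every column.
    Assumes no column is entirely missing.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from a column-mean fill.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        # Rank-k approximation of the centered, currently filled matrix.
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mu
        # Only missing cells are overwritten; observed cells keep their values.
        updated = np.where(missing, recon, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled

# Toy data (assumed): three perfectly correlated columns, two cells removed.
t = np.arange(1.0, 7.0)
X = np.column_stack([t, 2 * t, 3 * t])
X[2, 1] = np.nan  # true value 6
X[4, 2] = np.nan  # true value 15
filled = pca_impute(X, n_components=1)
print(filled[2, 1], filled[4, 2])
```

On this toy data the two filled cells should land close to the removed values, because one component captures the shared structure of all three columns. MICE, by contrast, would fit a separate regression for each column that has missing values.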
Author: LeetQuiz Editorial Team
You are creating a new experiment in Azure Machine Learning Studio. You have a small dataset with missing values in many columns. Applying predictors for each column is not required. You plan to use the Clean Missing Data module.
Which data cleaning method should you select?
A. Replace using Probabilistic PCA
B. Normalization
C. Synthetic Minority Oversampling Technique (SMOTE)
D. Replace using MICE