
Answer-first summary for fast verification
Answer: C (imputation using the mean or median for numerical features and the mode for categorical features) and E (deleting rows or columns with missing values).
**Correct Options: C. Imputation and E. Deleting rows or columns with missing values**

Imputation is the most appropriate method for addressing missing data in a large dataset with randomly distributed missing values: it preserves all data points and is cost-efficient. Mean or median imputation for numerical features and mode imputation for categorical features are scalable and maintain the dataset's integrity. Deleting rows or columns with missing values is also viable when the amount of missing data is small; it ensures data completeness, though at the risk of discarding useful information.

**Why the other options are incorrect:**
- **A. Data encryption**: important for data security, but it does not address missing values.
- **B. Data scaling**: adjusts the range of features but does not handle missing data.
- **D. Data normalization**: like scaling, it standardizes the scale of features without addressing missing values.
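The two correct strategies can be sketched in pandas. This is a minimal illustration; the DataFrame and its column names are hypothetical, not part of the question.

```python
import pandas as pd

# Hypothetical dataset with randomly distributed missing values.
df = pd.DataFrame({
    "age": [25, None, 40, 35, None],                # numerical
    "income": [50000, 62000, None, 58000, 61000],   # numerical
    "color": ["red", "blue", None, "blue", "red"],  # categorical
})

# Option C: impute numerical features with the median or mean,
# categorical features with the mode.
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())
imputed["income"] = imputed["income"].fillna(imputed["income"].mean())
imputed["color"] = imputed["color"].fillna(imputed["color"].mode()[0])
print(imputed.isna().sum().sum())  # no missing values remain

# Option E: drop rows that contain any missing value instead.
dropped = df.dropna()
print(len(dropped))  # only fully observed rows survive
```

Note that imputation keeps all five rows while `dropna()` keeps only the fully observed ones, which is why imputation is preferred when losing data is costly.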
Author: LeetQuiz Editorial Team
In the context of preparing a dataset for machine learning, you are conducting Exploratory Data Analysis (EDA) and encounter missing values across several features. The dataset is large, and the missing values are randomly distributed. Cost efficiency and scalability are key considerations for your project. Which of the following methods are MOST appropriate for addressing missing data under these constraints? Select the TWO best options.
A. Data encryption to secure the dataset before further processing
B. Data scaling to normalize the range of all features
C. Imputation using the mean or median for numerical features and mode for categorical features
D. Data normalization to adjust the scale of features without considering missing values
E. Deleting rows or columns with missing values to ensure data completeness