
Answer-first summary for fast verification
Answer: Impute using the most common value (or mode).
The most suitable imputation strategy for a categorical variable such as 'happiness_tier' is to use the mode, which is the most frequent value in the dataset. This approach is preferred because: - **Categorical variables** represent classifications that don't have a numerical meaning, making methods like mean or median inappropriate. - **Mode** ensures that the imputed value is a valid category, maintaining the integrity of the data. - **No imputation** can lead to issues in analyses that require complete datasets, while **mean/median** imputation could introduce nonsensical values into categorical data. Thus, imputing missing values in categorical columns with the mode is the most effective strategy.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
When dealing with missing values in a dataset that includes a categorical variable like 'happiness_tier', what is the most appropriate imputation strategy?
A
No imputation is necessary for categorical variables.
B
Impute using the median.
C
Impute using the mean.
D
Impute using the most common value (or mode).
No comments yet.