
Answer-first summary for fast verification
Answer: happiness_tier
The correct answer is `happiness_tier` because it is a categorical variable, making the mode (most common value) the appropriate choice for imputation. Numeric features such as `spend` and `units` are typically imputed using the mean or median. Imputing `customer_id` is not logical since it serves as a unique identifier for each customer.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
A data scientist is working with a feature set that includes the following schema: customer_id STRING, spend DOUBLE, units INTEGER, happiness_tier STRING. The customer_id column serves as the primary key. Each column in the feature set contains some missing values. The scientist plans to replace these missing values by imputing a common value for each feature. Which columns from the feature set are best suited for imputation using the most common value (mode) of the column?
A
customer_id and happiness_tier
B
spend
C
units
D
happiness_tier
E
customer_id
No comments yet.