
Explanation:
The correct answer is D, happiness_tier. It is the only categorical feature in the schema, so the mode (most common value) is the appropriate fill value. The numeric features spend and units are typically imputed with the mean or median instead. customer_id should not be imputed at all: as the primary key, each value is unique, so the column has no meaningful most common value, and filling gaps with a repeated value would break that uniqueness.
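The reasoning above can be illustrated with a minimal sketch in plain Python. The sample rows, the `impute` helper, and the choice of the median for the numeric columns are assumptions made for illustration; they are not part of the original question, and a real feature set would more likely be handled with a DataFrame library.

```python
from statistics import median, mode

# Hypothetical rows matching the schema in the question; None marks a missing value.
rows = [
    {"customer_id": "c1", "spend": 10.0, "units": 1, "happiness_tier": "high"},
    {"customer_id": "c2", "spend": None, "units": 3, "happiness_tier": "low"},
    {"customer_id": "c3", "spend": 30.0, "units": None, "happiness_tier": None},
    {"customer_id": "c4", "spend": 50.0, "units": 5, "happiness_tier": "high"},
]


def impute(rows, column, strategy):
    """Return a copy of rows with missing values in `column` replaced by
    strategy(observed values), e.g. the mode or the median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = strategy(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Categorical feature: fill with the most common value.
imputed = impute(rows, "happiness_tier", mode)
# Numeric features: fill with the median (the mean would also be a common choice).
imputed = impute(imputed, "spend", median)
imputed = impute(imputed, "units", median)
# customer_id is deliberately left untouched: a unique key has no useful mode.
```

In this sketch only happiness_tier is filled with the mode, while customer_id is never passed to `impute`, mirroring the distinction the explanation draws.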
A data scientist is working with a feature set that includes the following schema: customer_id STRING, spend DOUBLE, units INTEGER, happiness_tier STRING. The customer_id column serves as the primary key. Each column in the feature set contains some missing values. The scientist plans to replace these missing values by imputing a common value for each feature. Which columns from the feature set are best suited for imputation using the most common value (mode) of the column?
A. customer_id and happiness_tier
B. spend
C. units
D. happiness_tier
E. customer_id