Consider a dataset with a numerical feature 'Age' having missing values. Write a code snippet to perform median imputation on this feature using Python and the scikit-learn library. Explain how this process handles missing values.
Explanation:
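The correct code snippet to perform median imputation on the 'Age' feature is:

    from sklearn.impute import SimpleImputer
    imputer = SimpleImputer(strategy='median')
    imputer.fit(df[['Age']])
    df['Age'] = imputer.transform(df[['Age']]).ravel()

Here 'fit' computes the median of the non-missing 'Age' values, and 'transform' replaces every missing value (NaN by default) with that median. The double brackets in 'df[['Age']]' are required because SimpleImputer expects two-dimensional input; 'transform' returns a two-dimensional array by default, so '.ravel()' flattens it before it is assigned back to the column. The 'strategy='median'' parameter is what selects the median as the fill value. The median is less sensitive to outliers than the mean, so a few extreme ages do not skew the imputed value. Note, however, that filling every gap with the same constant reduces the variance of the feature and creates a spike at the median, so the shape of the distribution is not fully preserved when many values are missing.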
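To make the explanation concrete, here is a minimal runnable sketch of the same approach. The small DataFrame and its values are made up for illustration and are not part of the question; the example assumes pandas, NumPy, and scikit-learn are installed and that scikit-learn's default array output is in use.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: two missing ages and one outlier (95).
df = pd.DataFrame({"Age": [22.0, 25.0, np.nan, 31.0, 95.0, np.nan]})

# For comparison: the mean of the observed ages is pulled up by the outlier.
observed_mean = df["Age"].mean()

imputer = SimpleImputer(strategy="median")

# fit_transform learns the median from the non-missing values and fills
# the NaN entries in one step. With default settings the result is a 2D
# array of shape (n_samples, 1), so ravel() flattens it for the column.
df["Age"] = imputer.fit_transform(df[["Age"]]).ravel()

# The learned fill value is stored on the fitted imputer.
learned_median = imputer.statistics_[0]

print("median used for imputation:", learned_median)
print("mean of observed values:", observed_mean)
print(df)
```

In this toy data the observed ages are 22, 25, 31, and 95, so the median is 28 while the mean is 43.25, which shows why the median is the more robust choice when outliers are present. In a real workflow the imputer would typically be fitted on the training split only and then applied to the test split with 'transform', so that the test data does not influence the fill value.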