
In a dataset with a numerical feature 'Temperature', you have noticed that some values are missing. You have decided to use k-Nearest Neighbors (k-NN) imputation to fill in the missing values. Explain the process of k-NN imputation and discuss the potential benefits and limitations of this approach.
A
k-NN imputation involves finding the k-nearest neighbors of each missing value based on other features in the dataset and imputing the missing value with the mean of the neighbors' values. This approach can capture the local relationships between 'Temperature' and other features, but may be computationally expensive for large datasets.
B
k-NN imputation involves finding the k-nearest neighbors of each missing value based on the 'Temperature' feature itself and imputing the missing value with the mean of the neighbors' values. This approach can capture global relationships between 'Temperature' values, but may not work well for datasets with a large number of missing values.
C
k-NN imputation involves finding the k-nearest neighbors of each missing value based on a randomly selected set of features and imputing the missing value with the mean of the neighbors' values. This approach can provide a simple imputation method, but may not capture meaningful relationships between features.
D
k-NN imputation is not suitable for numerical features like 'Temperature', as it is designed for categorical features only.
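Option A describes the standard approach: distances are computed over the other observed features, and the missing 'Temperature' is filled with the mean of its k nearest neighbors. A minimal pure-Python sketch of that idea, using a hypothetical toy dataset of (humidity, pressure, temperature) rows where `None` marks a missing reading (all names and values here are illustrative, not from the question):

```python
import math

# Hypothetical toy data: (humidity, pressure, temperature);
# None marks a missing Temperature reading.
rows = [
    (0.30, 1012.0, 21.0),
    (0.32, 1011.0, 22.0),
    (0.80, 1002.0, None),
    (0.78, 1003.0, 15.0),
    (0.79, 1004.0, 16.0),
]

def knn_impute_temperature(rows, k=2):
    """Fill missing Temperature values with the mean Temperature of the
    k nearest complete rows, measuring distance over the other features."""
    complete = [r for r in rows if r[2] is not None]
    filled = []
    for r in rows:
        if r[2] is not None:
            filled.append(r)
            continue
        # Euclidean distance over the observed features (humidity, pressure).
        neighbors = sorted(complete, key=lambda c: math.dist(r[:2], c[:2]))[:k]
        imputed = sum(c[2] for c in neighbors) / k
        filled.append((r[0], r[1], imputed))
    return filled
```

Note that the nearest neighbors here are the two rows with similar humidity and pressure, so the imputed value reflects local structure rather than the global mean; this locality is the benefit option A cites, while the pairwise distance computation is the cost that makes it expensive on large datasets.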