Databricks Certified Machine Learning - Associate

In a dataset with a numerical feature 'House Size', you have noticed that some values are missing. You have decided to use forward fill or backward fill methods to fill in the missing values. Explain the process of forward fill and backward fill methods and discuss the potential benefits and limitations of each approach.

The forward fill method fills each missing value with the most recent observed value before it, while the backward fill method fills each missing value with the next observed value after it. These approaches can be useful for time series data or when there is a clear trend in the data, but may not work well for datasets with random missingness or when the missing values are not related to the neighboring observed values.

36.4%

Forward fill method involves filling in the missing values with the next observed value in the dataset, while backward fill method involves filling in the missing values with the previous observed value in the dataset. These approaches can capture the local relationships between 'House Size' values, but may introduce bias if the missing values are not related to the observed values.

Forward fill method involves filling in the missing values with a fixed value, such as the mean or median of 'House Size', while backward fill method involves filling in the missing values with a random value. These approaches can provide simple imputation methods, but may not capture the relationships between 'House Size' and other features.

1.8%
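
To make the two methods concrete, here is a minimal sketch using pandas, which is an assumption since the question does not name a library. The 'House Size' values are hypothetical. Forward fill (`Series.ffill`) carries the most recent observed value forward into a gap, and backward fill (`Series.bfill`) pulls the next observed value back into it.

```python
import numpy as np
import pandas as pd

# Hypothetical 'House Size' values; row order is assumed to be meaningful
# (for example, listings sorted by date).
house_size = pd.Series(
    [1500.0, np.nan, np.nan, 1800.0, np.nan, 2100.0],
    name="House Size",
)

# Forward fill: each gap takes the most recent observed value before it.
forward_filled = house_size.ffill()

# Backward fill: each gap takes the next observed value after it.
backward_filled = house_size.bfill()

# Limitation: a gap at the start cannot be forward filled, and a gap at
# the end cannot be backward filled, so those values stay missing.
edge_gaps = pd.Series([np.nan, 1200.0, np.nan], name="House Size")
edge_forward = edge_gaps.ffill()
edge_backward = edge_gaps.bfill()

print(pd.DataFrame({
    "original": house_size,
    "ffill": forward_filled,
    "bfill": backward_filled,
}))
```

Both methods depend on row order, so they suit ordered data such as time series. On rows with no meaningful order, the filled value is effectively arbitrary.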