
Answer-first summary for fast verification
Answer: Predict the missing values by applying linear regression for an accurate data representation, leveraging the existing data to estimate missing values without introducing significant bias., Use a combination of imputation for missing values and feature engineering to enhance the 'distance from the nearest school' feature's variance, ensuring both data completeness and improved model performance.
Option D is correct because employing linear regression to estimate missing values ensures the data's accuracy without introducing bias, aligning with the need for a cost-effective and scalable solution. Linear regression, a supervised learning method, fits a linear equation to the data, enabling the prediction of unknown values. This technique is particularly useful for maintaining the model's integrity when dealing with low-variance features that are missing data. Option E is also correct as it suggests a comprehensive approach that not only addresses the missing data issue but also enhances the feature's variance, potentially improving the model's predictive performance. However, the primary focus should be on accurately handling the missing data, making D the best single option.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
No comments yet.
You are tasked with developing a machine learning model to predict house prices for a real estate company. During the data preprocessing phase, you discover that the 'distance from the nearest school' feature, which is considered a crucial predictor, has a significant number of missing values and exhibits low variance. The company emphasizes the importance of utilizing every data row to maximize the model's predictive accuracy. Additionally, the solution must be cost-effective and scalable to accommodate future data. Given these constraints, what is the optimal strategy to handle the missing data in this scenario? Choose the best option.
A
Replace the missing values with zeros to maintain dataset completeness, despite the potential introduction of bias.
B
Eliminate any rows that contain missing entries to ensure data integrity, at the risk of reducing the dataset size and losing valuable information.
C
Merge this feature with another column that is fully populated to avoid data loss, though this may dilute the predictive power of the original feature.
D
Predict the missing values by applying linear regression for an accurate data representation, leveraging the existing data to estimate missing values without introducing significant bias.
E
Use a combination of imputation for missing values and feature engineering to enhance the 'distance from the nearest school' feature's variance, ensuring both data completeness and improved model performance.