
As a junior Data Scientist at a consulting firm, you're tasked with improving a machine learning model's performance. The initial analysis reveals that the dataset contains numerous missing values (NaN) across various fields, significantly impacting the model's accuracy. Your team lead emphasizes the importance of handling these missing values effectively during the data acquisition phase to ensure the model's reliability and performance. Considering the constraints of maintaining data integrity, minimizing bias, and ensuring scalability, which three strategies should you implement to address the missing values? (Choose three)
A
Implement a secondary machine learning model specifically designed to predict and fill in missing values based on the available data.
B
For numerical fields with missing values, replace the NaN entries with the mean or median of the available data in those fields.
C
Remove all records from the dataset that contain any missing values, regardless of the field or the amount of missing data.
D
For categorical fields, replace missing values with the most frequently occurring category in those fields.
E
Use a random value from the existing data to fill in missing entries, ensuring a uniform distribution of replaced values.
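
To make the simple imputation options concrete, here is a minimal sketch of what options B and D describe, using only the Python standard library. The helper names (`is_missing`, `impute_numeric`, `impute_categorical`) and the sample data are hypothetical, chosen for illustration. The sketch uses the median rather than the mean, and it assumes each column has at least one observed value.

```python
import math
from statistics import median, mode


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_numeric(values):
    """Option B: fill missing numbers with the median of the observed values."""
    observed = [v for v in values if not is_missing(v)]
    fill = median(observed)  # raises StatisticsError if nothing was observed
    return [fill if is_missing(v) else v for v in values]


def impute_categorical(values):
    """Option D: fill missing labels with the most frequent observed label."""
    observed = [v for v in values if not is_missing(v)]
    fill = mode(observed)  # raises StatisticsError if nothing was observed
    return [fill if is_missing(v) else v for v in values]


# Hypothetical sample columns with missing entries.
ages = [34.0, float("nan"), 29.0, 41.0, None]
cities = ["Paris", None, "Lyon", "Paris", "Paris"]

filled_ages = impute_numeric(ages)
filled_cities = impute_categorical(cities)

print(filled_ages)
print(filled_cities)
```

With pandas, the same idea is usually written with `Series.fillna`, passing the column's `median()` or the first value of its `mode()`. To avoid leaking information, compute the fill values on the training split only and reuse them on the validation and test data.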