
You are a data engineer analyzing customer feedback for a retail company. The dataset is large and contains missing values in the 'customer satisfaction score' column, which is critical for your analysis. The project has tight deadlines and budget constraints, and you need to preserve the integrity of your analysis while staying within those constraints. Considering these factors, which of the following is the BEST approach to handling the missing values in the 'customer satisfaction score' column? (Choose one option)
A
Replace all missing values with the median satisfaction score of the dataset, as it is a quick and cost-effective method that does not require additional data processing.
B
Analyze the pattern of missing values to determine if they are missing at random. If they are, use multiple imputation to estimate the missing values, ensuring the analysis reflects the uncertainty of the imputed values.
C
Exclude all records with missing satisfaction scores from the analysis to ensure only complete data is used, despite the potential reduction in dataset size.
D
Assign a default satisfaction score, such as the average score, to all missing values to maintain the dataset size and meet project deadlines.
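
To make the options concrete, here is a minimal Python sketch of three of the strategies on a toy list of scores. Everything in it is an assumption for illustration: the function names, the toy data, and the use of `None` for a missing score are hypothetical, and the multiple-imputation variant shown is the approximate Bayesian bootstrap pooled with Rubin's rules, one simple choice among several. It also assumes the scores are missing completely at random, and it does not perform the missingness check that option B calls for first.

```python
import random
import statistics


def median_impute(scores):
    """Option A: fill every gap with the median of the observed scores."""
    observed = [s for s in scores if s is not None]
    fill = statistics.median(observed)
    return [fill if s is None else s for s in scores]


def listwise_delete(scores):
    """Option C: keep only the records that have a score."""
    return [s for s in scores if s is not None]


def multiple_impute_mean(scores, m=20, seed=0):
    """Option B (sketch): approximate Bayesian bootstrap + Rubin's rules.

    Assumes scores are missing completely at random.
    Returns (pooled_mean, total_variance) for the mean score.
    """
    rng = random.Random(seed)
    observed = [s for s in scores if s is not None]
    n_total = len(scores)
    n_missing = n_total - len(observed)

    estimates, variances = [], []
    for _ in range(m):
        # Resample the observed scores first, so each imputed dataset also
        # reflects uncertainty about the score distribution itself.
        donor_pool = rng.choices(observed, k=len(observed))
        completed = observed + rng.choices(donor_pool, k=n_missing)
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / n_total)

    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average sampling variance
    between = statistics.variance(estimates)  # spread across imputations
    total = within + (1 + 1 / m) * between    # Rubin's total variance
    return pooled, total


# Toy data (hypothetical): None marks a missing satisfaction score.
scores = [4, 5, None, 3, 4, None, 2, 5, 4, None, 3, 4]

filled = median_impute(scores)
kept = listwise_delete(scores)
pooled_mean, total_variance = multiple_impute_mean(scores)
```

On this toy data, median imputation keeps all 12 records but gives a smaller sample variance than the 9 observed scores alone, because the filled-in values all sit at the centre of the distribution. The multiple-imputation sketch instead carries the extra uncertainty forward through the between-imputation term, at the cost of repeating the analysis `m` times.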