
As a junior data scientist, you built a linear regression model with sklearn that achieved a high R-squared (coefficient of determination), suggesting a good fit. After deployment, however, the model's predictions were significantly off. Your mentor attributed this to the Anscombe's quartet problem, which illustrates how datasets with nearly identical summary statistics can have very different underlying distributions. Beyond Anscombe's quartet, what other critical issues in data analysis and model evaluation does this scenario highlight? Choose the two most relevant options.
A
The presence of outliers that can disproportionately influence the model's predictions
B
The existence of non-linear relationships between the independent and dependent variables that a linear model cannot capture
C
High multicollinearity among predictor variables, leading to unreliable coefficient estimates
D
Data entry errors that introduce noise and bias into the dataset
E
All of the above
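
For readers unfamiliar with the Anscombe's quartet problem mentioned in the question, the following is a minimal sketch of why a high R-squared alone can mislead. It fits an ordinary least-squares line to each of the four published Anscombe datasets and reports the slope, intercept, and R-squared. The names `QUARTET`, `fit_line`, and `results` are illustrative, and the fit is written in plain Python rather than sklearn to keep the example dependency-free; for a one-feature linear model with an intercept, this R-squared should match what sklearn's `LinearRegression.score` reports.

```python
# Anscombe's quartet (Anscombe, 1973): four small datasets with
# near-identical summary statistics but very different shapes.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values for datasets I-III

QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


results = {name: fit_line(x, y) for name, (x, y) in QUARTET.items()}

for name, (slope, intercept, r2) in results.items():
    print(f"Dataset {name}: slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.2f}")
```

All four datasets yield essentially the same fitted line and the same R-squared, even though dataset II is curved, dataset III has a single outlier off an otherwise straight line, and dataset IV has one high-leverage point that determines the slope. Plotting the data and the residuals, rather than relying on R-squared alone, is what exposes these differences.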