
Answer-first summary for fast verification
Answer: A and B. The presence of outliers that can disproportionately influence the model's predictions (A), and the existence of non-linear relationships between the independent and dependent variables that a linear model cannot capture (B).
The Anscombe Quartet underscores the importance of visualizing data before modeling: plots reveal non-linear relationships (B) and outliers (A) that summary statistics alone do not show. Options C and D are genuine data-analysis issues, but the Quartet does not directly illustrate them. Option E is therefore incorrect, since not every listed issue is exemplified by the Quartet. The scenario emphasizes the need for thorough data exploration rather than relying solely on summary metrics such as R-squared.
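To make the point concrete, here is a minimal sketch in plain Python (no sklearn, so it runs anywhere) that fits an ordinary least-squares line to each of Anscombe's four published datasets. The helper names `fit_line` and `max_abs_residual` are illustrative assumptions, not part of any library. The fitted slope, intercept, and R-squared come out nearly identical across the four datasets, while the largest residual differs, which hints at the outlier in dataset III.

```python
# Anscombe's quartet (Anscombe, 1973). Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared). With a single predictor and an
    intercept, R-squared equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def max_abs_residual(x, y, slope, intercept):
    """Largest absolute gap between an observed y and the fitted line."""
    return max(abs(yi - (intercept + slope * xi)) for xi, yi in zip(x, y))


for name, (x, y) in ANSCOMBE.items():
    slope, intercept, r2 = fit_line(x, y)
    worst = max_abs_residual(x, y, slope, intercept)
    print(f"{name:>3}: slope={slope:.3f} intercept={intercept:.3f} "
          f"R^2={r2:.3f} max|residual|={worst:.2f}")
```

The summary numbers alone cannot separate the four cases. A scatter plot of each dataset, or a plot of residuals against x, is what exposes the curve in dataset II and the outliers in datasets III and IV.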
Author: LeetQuiz Editorial Team
As a junior Data Scientist, you developed a linear regression model using sklearn that showed a high R-squared value (the coefficient of determination), suggesting a good fit. However, after deployment, the model's predictions were significantly off. Your mentor attributed this to the Anscombe Quartet problem, which illustrates how datasets with nearly identical summary statistics can have vastly different distributions. Beyond the Anscombe Quartet, what other critical issues in data analysis and model evaluation does this scenario highlight? Choose the two most relevant options.
A. The presence of outliers that can disproportionately influence the model's predictions
B. The existence of non-linear relationships between the independent and dependent variables that a linear model cannot capture
C. High multicollinearity among predictor variables, leading to unreliable coefficient estimates
D. Data entry errors that introduce noise and bias into the dataset
E. All of the above