
Answer-first summary for fast verification
Answer: C. The data scientist neglected to exponentiate the predictions of the second model to revert them to the original scale before calculating RMSE.
When a model uses 'log(price)' as the label variable, its predictions are on a logarithmic scale. To compare them with actual prices using RMSE, the predictions must first be exponentiated (the inverse of the logarithm) to return them to the original price scale. Skipping this step makes each residual roughly equal to the full actual price, because a log-scale prediction is a small number (for example, ln(200,000) ≈ 12.2). The result is a misleadingly high RMSE for the second model.

Options A and E are incorrect because the two RMSE values were computed on inconsistent scales, so their difference says nothing about which model is more accurate. Option B is irrelevant because the first model's predictions are already on the price scale. Option D is false: RMSE is a standard regression metric, although interpreting it requires attention to the scale of the label variable.
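As a hedged illustration, the sketch below uses only Python's standard library and made-up prices (not data from the question) to show how skipping the back-transform inflates RMSE. It assumes the second model was trained on the natural logarithm, so `math.exp` is the inverse; if a base-10 log had been used, the inverse would be `10 ** p` instead. The `rmse` helper and all values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical actual prices (illustrative values only).
actual_prices = [200_000.0, 350_000.0, 125_000.0, 410_000.0]

# Hypothetical output of a model trained on log(price): these are
# natural-log values (about 11.7 to 12.9), not prices.
log_predictions = [12.19, 12.78, 11.75, 12.92]

# Wrong: compares log-scale predictions directly with raw prices.
wrong = rmse(actual_prices, log_predictions)

# Right: exponentiate (inverse of the natural log) to return to the price scale.
price_predictions = [math.exp(p) for p in log_predictions]
right = rmse(actual_prices, price_predictions)

print(f"RMSE without back-transform: {wrong:,.0f}")
print(f"RMSE after math.exp:         {right:,.0f}")
```

With these invented numbers, the un-transformed RMSE is on the order of the prices themselves, while the back-transformed RMSE reflects the model's actual error of a few thousand.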
Author: LeetQuiz Editorial Team
A data scientist developed two regression models: one using 'price' as the label variable and another using 'log(price)'. When both models are evaluated with RMSE by comparing their predicted values to the actual prices, the RMSE of the second model is significantly higher than that of the first. What is the most valid explanation for this discrepancy? Choose the single best answer.
A
The second model is significantly more accurate than the first.
B
The data scientist did not apply the log transformation to the predictions of the first model before calculating RMSE.
C
The data scientist neglected to exponentiate the predictions of the second model to revert them to the original scale before RMSE calculation.
D
RMSE is not a suitable metric for evaluating regression models.
E
The first model is considerably more accurate than the second.