Ultimate access to all questions.
A data scientist has developed a linear regression model using log(price) as the target variable. After making predictions, both the predicted and actual values are stored in a Spark DataFrame named preds_df. The data scientist evaluates the model using the following code: regression_evaluator.setMetricName("rmse").evaluate(preds_df)
. What modification is necessary to the RMSE evaluation approach to ensure it's comparable with the original price scale? Choose the single best answer.
Explanation:
The correct answer is C. Since the model predicts log(price), the predictions are on a log scale. To compare these predictions directly with the original price values, we must first convert the predictions back to the original scale using the exponentiation function (e.g., np.exp
in Python). Only then should we calculate the RMSE to assess the model's accuracy in terms of actual prices. This approach ensures that the RMSE reflects the model's performance in the context of the original price scale, making it meaningful for practical interpretation.