
Explanation:
The correct approach takes two steps. First, create an instance of MulticlassClassificationEvaluator with the appropriate parameters (predictionCol="prediction", labelCol="actual", and metricName="accuracy"). Then call its .evaluate method on the predictions DataFrame and assign the result to accuracy. These two steps are the lines listed under C and D, taken together. Note that the question names the DataFrame preds_dt, while the option code mistakenly references preds_df. With metricName="accuracy", the evaluator returns the proportion of rows whose prediction equals the actual label. Option A is incomplete: it assigns the evaluator object itself to accuracy and never calls .evaluate, so no metric is computed. Option B uses the wrong evaluator: RegressionEvaluator is for regression tasks and supports metrics such as rmse, mse, r2, and mae, not accuracy. Option E is incorrect because Summarizer computes summary statistics over vector columns and is not used to calculate prediction accuracy.
A data scientist has developed a three-class decision tree classifier using Spark ML and stored the predictions in a Spark DataFrame named preds_dt. The DataFrame has the schema: prediction DOUBLE, actual DOUBLE. Which code segment correctly calculates the model's accuracy from preds_dt and assigns it to the accuracy variable? Choose the best answer.
A
accuracy = MulticlassClassificationEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
B
accuracy = RegressionEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
C
classification_evaluator = MulticlassClassificationEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
D
accuracy = classification_evaluator.evaluate(preds_df)
E
accuracy = Summarizer(predictionCol="prediction", labelCol="actual", metricName="accuracy")
F
None
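To make the metric concrete: with metricName="accuracy", the value is the number of rows where prediction equals actual, divided by the total number of rows. The plain-Python function below (accuracy_from_pairs is a hypothetical helper, not part of Spark ML) applies that definition to a few invented three-class rows, which can be handy for sanity-checking an evaluator result by hand on small data.

```python
from typing import Sequence, Tuple


def accuracy_from_pairs(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (prediction, actual) pairs whose predicted class equals the actual class."""
    if not pairs:
        raise ValueError("accuracy is undefined for an empty set of predictions")
    correct = sum(1 for prediction, actual in pairs if prediction == actual)
    return correct / len(pairs)


# Hypothetical (prediction, actual) rows for a three-class model (labels 0.0, 1.0, 2.0).
rows = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (1.0, 1.0), (0.0, 1.0)]
print(accuracy_from_pairs(rows))  # 3 correct out of 5 -> 0.6
```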