
Answer-first summary for fast verification
Answer: C followed by D, that is, `classification_evaluator = MulticlassClassificationEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")`, then `accuracy = classification_evaluator.evaluate(preds_df)`
The correct approach takes two steps. First, create a `MulticlassClassificationEvaluator` with `predictionCol="prediction"`, `labelCol="actual"`, and `metricName="accuracy"` (option C). Second, call its `.evaluate` method on the predictions DataFrame and assign the result to `accuracy` (option D). The question names the DataFrame `preds_dt`, although option D mistakenly references `preds_df`. The metric is the proportion of rows whose prediction matches the label. Option A is incomplete: it assigns the evaluator object itself to `accuracy` and never calls `.evaluate`. Option B uses the wrong evaluator: `RegressionEvaluator` is for regression tasks and does not offer an accuracy metric. Option E is incorrect because `Summarizer` computes summary statistics over vector columns and is not an evaluator for prediction accuracy.
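To make "proportion of correctly predicted instances" concrete, here is a minimal plain-Python sketch of the same calculation. It is not Spark's implementation and does not use Spark at all. The sample rows and the helper name `accuracy_of` are made up for illustration, standing in for the `prediction` and `actual` columns.

```python
# Plain-Python sketch of the accuracy metric (not Spark's implementation).
# Each tuple is a hypothetical (prediction, actual) row for a three-class model.
rows = [
    (0.0, 0.0),
    (1.0, 1.0),
    (2.0, 1.0),  # misclassified
    (2.0, 2.0),
    (0.0, 2.0),  # misclassified
]


def accuracy_of(pairs):
    """Return the fraction of pairs whose prediction equals the label."""
    if not pairs:
        raise ValueError("cannot compute accuracy of an empty collection")
    correct = sum(1 for prediction, actual in pairs if prediction == actual)
    return correct / len(pairs)


# 3 of the 5 rows match, so this is 3 / 5.
accuracy = accuracy_of(rows)
print(accuracy)
```

With the evaluator from options C and D, Spark should report the same kind of ratio for the rows of the predictions DataFrame.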
Author: LeetQuiz Editorial Team
A data scientist has developed a three-class decision tree classifier using Spark ML and stored the predictions in a Spark DataFrame named `preds_dt`. The DataFrame has the schema: `prediction DOUBLE, actual DOUBLE`. Which code segment correctly calculates the model's accuracy from `preds_dt` and assigns it to the `accuracy` variable? Choose the best answer.
A
accuracy = MulticlassClassificationEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
B
accuracy = RegressionEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
C
classification_evaluator = MulticlassClassificationEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
D
accuracy = classification_evaluator.evaluate(preds_df)
E
accuracy = Summarizer(predictionCol="prediction", labelCol="actual", metricName="accuracy")
F
None