
Answer-first summary for fast verification
Answer: E
`classification_evaluator = BinaryClassificationEvaluator(rawPredictionCol="prediction", labelCol="actual", metricName="accuracy")`
`accuracy = classification_evaluator.evaluate(preds_df)`
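The keyed answer is option E. It is the only option that both builds a classification evaluator with valid column parameters and then calls `evaluate(preds_df)`, assigning the resulting number to `accuracy`. `BinaryClassificationEvaluator` is designed for two-class problems and reads its predictions from `rawPredictionCol`; it has no `predictionCol` parameter, which rules out option C (which also never calls `evaluate`). `labelCol` names the column holding the true labels, here `actual`. Options A and D assign an evaluator or summarizer object to `accuracy` instead of a metric value, and neither `RegressionEvaluator` nor `Summarizer` is meant for classification accuracy. Option B, as written, assigns the evaluator to `accuracy` and then calls `evaluate` on an undefined `classification_evaluator`. One caveat: in the PySpark API, `BinaryClassificationEvaluator` documents only `areaUnderROC` and `areaUnderPR` as values for `metricName`, while `accuracy` is a metric of `MulticlassClassificationEvaluator`. Option E therefore reflects the question's answer key and is not guaranteed to run unchanged against a real Spark installation.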
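To make the metric itself concrete, here is a minimal sketch in plain Python, independent of Spark, of what accuracy measures on data shaped like `preds_df`: the fraction of rows whose `prediction` equals `actual`. The `rows` list and the `accuracy_from_rows` helper are hypothetical names invented for illustration; they are not part of the question or of Spark ML.

```python
# Hypothetical sample rows mirroring the preds_df schema: (prediction, actual).
# These values are made up for illustration only.
rows = [
    (1.0, 1.0),
    (0.0, 0.0),
    (1.0, 0.0),  # misclassified row
    (0.0, 0.0),
    (1.0, 1.0),
]


def accuracy_from_rows(rows):
    """Return the fraction of rows whose prediction equals the actual label."""
    if not rows:
        raise ValueError("cannot compute accuracy of an empty dataset")
    correct = sum(1 for prediction, actual in rows if prediction == actual)
    return correct / len(rows)


accuracy = accuracy_from_rows(rows)
print(accuracy)
```

An evaluator's `evaluate` call returns a single number of this kind, which is why only options that end by calling `evaluate(preds_df)` can leave a metric value in `accuracy`.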
Author: LeetQuiz Editorial Team
A data scientist has developed a two-class decision tree classifier using Spark ML and computed the predictions in a Spark DataFrame `preds_df` with the schema: `prediction DOUBLE, actual DOUBLE`. Which of the following code blocks correctly computes the model's accuracy from `preds_df` and assigns it to the `accuracy` variable?
A
accuracy = RegressionEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
B
accuracy = MulticlassClassificationEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
accuracy = classification_evaluator.evaluate(preds_df)
C
classification_evaluator = BinaryClassificationEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
D
accuracy = Summarizer(predictionCol="prediction", labelCol="actual", metricName="accuracy")
E
classification_evaluator = BinaryClassificationEvaluator(rawPredictionCol="prediction", labelCol="actual", metricName="accuracy")
accuracy = classification_evaluator.evaluate(preds_df)