
Explanation:
The code block that correctly computes the accuracy of the two-class decision tree classifier from preds_df and assigns it to the accuracy variable is the one built on MulticlassClassificationEvaluator (option B). BinaryClassificationEvaluator is designed for binary classification problems, but its metricName parameter accepts only "areaUnderROC" and "areaUnderPR", so options C and E are invalid once metricName is set to "accuracy"; option C also never calls evaluate. MulticlassClassificationEvaluator supports "accuracy" and works for two-class problems as well. predictionCol names the column holding the predicted labels, labelCol names the column holding the true labels, and evaluate(preds_df) returns the metric as a float. Options A and D are wrong because RegressionEvaluator has no accuracy metric and Summarizer computes vector column statistics rather than evaluating predictions; neither option calls evaluate.
A data scientist has developed a two-class decision tree classifier using Spark ML and computed the predictions in a Spark DataFrame preds_df with the schema: prediction DOUBLE, actual DOUBLE. Which of the following code blocks correctly computes the model's accuracy from preds_df and assigns it to the accuracy variable?
A
accuracy = RegressionEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
B
classification_evaluator = MulticlassClassificationEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
accuracy = classification_evaluator.evaluate(preds_df)
C
classification_evaluator = BinaryClassificationEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy")
D
accuracy = Summarizer(predictionCol="prediction", labelCol="actual", metricName="accuracy")
E
classification_evaluator = BinaryClassificationEvaluator(rawPredictionCol="prediction", labelCol="actual", metricName="accuracy")
accuracy = classification_evaluator.evaluate(preds_df)