In a Spark MLlib project, you are working with a large dataset and need to build a decision tree model. Which of the following techniques can be used to improve the scalability and performance of the decision tree algorithm in Spark?
A
Use a single decision tree model and increase the maximum depth of the tree to capture more complex patterns.
B
Use a distributed version of the decision tree algorithm, where each node in the cluster builds a separate decision tree.
C
Increase the minimum number of instances required to split a node in the decision tree to reduce overfitting.
D
Use a combination of feature selection techniques and pruning methods to simplify the decision tree and reduce its complexity.
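
To make option C concrete, here is a minimal plain-Python sketch of how a minimum-instances rule restricts which splits a tree may consider. Spark MLlib exposes a comparable setting as `minInstancesPerNode`, but this sketch is not Spark's implementation: Spark bins feature values and aggregates split statistics across partitions. The helper names `gini` and `best_split` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels, min_instances_per_node=1):
    """Return (threshold, weighted_gini) for the best split on one feature.

    A candidate split is rejected when either child would receive fewer
    than min_instances_per_node rows. Returns None if no split is allowed.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        if i < min_instances_per_node or n - i < min_instances_per_node:
            continue  # one child would be too small
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (i * gini(left) + (n - i) * gini(right)) / n
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Toy data (illustrative only): a single positive row at the low end.
values = [1, 2, 3, 4, 5, 6]
labels = [1, 0, 0, 0, 0, 0]

print(best_split(values, labels, min_instances_per_node=1))  # threshold 1.5
print(best_split(values, labels, min_instances_per_node=2))  # threshold 2.5
print(best_split(values, labels, min_instances_per_node=4))  # None
```

With a minimum of 1, the split isolates the single positive row in its own child; raising the minimum to 2 forces a coarser split, and a minimum of 4 leaves no valid split on six rows. The rule therefore limits how far the tree can grow, which is a separate matter from how the training work is distributed across a cluster.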