When a data scientist uses a random forest regressor as the final stage of a Spark ML Pipeline and runs cross-validation, what is a potential downside of placing the entire pipeline inside the cross-validation process?
A
The process might not be able to parallelize tuning because of the pipeline's distributed nature.
B
There's a risk of leaking data preparation details from validation sets to training sets for each model.
C
The runtime could significantly increase as every pipeline stage requires refitting or retransforming for each model iteration.
D
It may fail to evaluate all unique hyperparameter value combinations in the parameter grid.
E
Data leakage from the training set to the test set could occur during evaluation.
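
To make the scenario concrete, the following is a rough sketch in plain Python (not Spark code) that counts how often stages get fit under two arrangements: the whole pipeline passed to the cross-validator as its estimator, versus the cross-validator placed as the last stage of the pipeline. The helper name `count_stage_fits` and the example sizes (3 preparation stages, 6 grid combinations, 3 folds) are assumptions made for illustration. The count also assumes no caching between fits and one final refit of the best setting on the full data.

```python
def count_stage_fits(num_prep_stages, grid_size, num_folds, pipeline_inside_cv):
    """Hypothetical helper: count stage fits for a grid search with k-fold CV."""
    # One model per (hyperparameter combination, fold), plus one final refit
    # of the best combination on the full training data.
    model_fits = grid_size * num_folds + 1
    if pipeline_inside_cv:
        # The whole pipeline is the estimator, so every model fit
        # also refits every preparation stage.
        prep_fits = num_prep_stages * model_fits
    else:
        # The cross-validator is the final pipeline stage, so the
        # preparation stages are fit once, before the search starts.
        prep_fits = num_prep_stages
    return {"prep_fits": prep_fits, "regressor_fits": model_fits}


# Assumed example sizes: 3 preparation stages, 6 grid combinations, 3 folds.
inside = count_stage_fits(3, 6, 3, pipeline_inside_cv=True)
outside = count_stage_fits(3, 6, 3, pipeline_inside_cv=False)
print("pipeline inside CV:", inside)
print("CV inside pipeline:", outside)
```

Under these assumptions the regressor is fit the same number of times in both arrangements, and only the preparation work differs. The cheaper arrangement fits the preparation stages on data that includes the validation folds, which is the trade-off the answer options about leakage refer to.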