
Answer-first summary for fast verification
Answer: The runtime could significantly increase as every pipeline stage requires refitting or retransforming for each model iteration.
Incorporating the entire pipeline as an estimator in cross-validation means that each stage, including data preprocessing, is executed for every fold. This leads to a longer runtime because each stage of the pipeline must be refitted or retransformed for every model iteration, ensuring that the process is thorough but time-consuming.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
When a data scientist integrates a random forest regressor pipeline as the final stage in a Spark ML Pipeline and initiates cross-validation, what is a potential downside of constructing the pipeline within the cross-validation process?
A
The process might not be able to parallelize tuning because of the pipeline's distributed nature.
B
There's a risk of leaking data preparation details from validation sets to training sets for each model.
C
The runtime could significantly increase as every pipeline stage requires refitting or retransforming for each model iteration.
D
It may fail to evaluate all unique hyperparameter value combinations in the parameter grid.
E
Data leakage from the training set to the test set could occur during evaluation.
No comments yet.