A data analyst has built an ML pipeline with Spark ML on a fixed input dataset, but the pipeline's processing time is too high. To improve efficiency, the analyst increased the number of workers in the cluster. After the reconfiguration, however, the training set's row count differed from before. Which strategy guarantees a consistent training and test set across model iterations?
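The row-count drift arises because Spark's `randomSplit` is partition-dependent: even with a fixed seed, changing the cluster layout can change how rows are assigned. Two common remedies are to persist the split datasets to storage once and reload them for every iteration, or to derive the split deterministically from each row's content rather than from partitioning. The sketch below illustrates the second idea in plain Python (no Spark dependency), assuming each record carries a stable unique key such as a user ID; the function name and the 80/20 ratio are illustrative choices, not from the original question.

```python
import hashlib

def split_bucket(key: str, train_pct: int = 80) -> str:
    """Assign a record to 'train' or 'test' by hashing its stable key.

    The assignment depends only on the key, never on partitioning,
    worker count, or row order, so it is reproducible across cluster
    reconfigurations.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # uniform-ish value in [0, 100)
    return "train" if bucket < train_pct else "test"

# The same keys always land in the same split, run after run.
rows = [f"user_{i}" for i in range(1000)]
split_a = {r: split_bucket(r) for r in rows}
split_b = {r: split_bucket(r) for r in rows}  # e.g. re-run on a resized cluster
assert split_a == split_b
```

In Spark the same pattern can be expressed as a column expression over the key column, or the analyst can simply write the train/test DataFrames out (for example to Delta tables) right after the first split and read those tables back for every subsequent model iteration.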