
Answer-first summary for fast verification
Answer: B — Use the `randomSplit` method from the Spark DataFrame API to split the data, ensuring that the seed is set for reproducibility.
The correct approach to split data in Spark ML is the `randomSplit` method on the Spark DataFrame API, which is designed to work with distributed data; set a seed so the split is reproducible across runs. Option A is incorrect because `train_test_split` comes from scikit-learn and operates on in-memory arrays, not distributed Spark DataFrames. Option C is incorrect because the DataFrame API has no row-splitting `split` method (`pyspark.sql.functions.split` exists, but it splits string columns into arrays), so it cannot be used to partition a dataset, seeded or otherwise. Option D is incorrect because there is no `randomSplitWithWeights` method in the DataFrame API; `randomSplit` itself takes the list of weights, and neither call performs stratified sampling by class.
Author: LeetQuiz Editorial Team
In the context of Spark ML, explain how to split data into training and test sets and identify the key gotchas one might encounter during this process. Provide a code snippet demonstrating the correct way to split data and explain how to handle the potential issues that may arise.
A
Use the train_test_split function from the sklearn.model_selection module to split the data.
B
Use the randomSplit method from the Spark DataFrame API to split the data, ensuring that the seed is set for reproducibility.
C
Use the split method from the Spark DataFrame API to split the data, but be aware of the potential for data leakage.
D
Use the randomSplitWithWeights method from the Spark DataFrame API to split the data, taking into account the class weights for imbalanced datasets.