
Explanation:
Stratified Sampling: This is a sampling technique designed to ensure that the sample maintains the same class proportions as the original dataset. It's particularly important in classification problems where an imbalance between classes can significantly affect the model's performance.
PySpark sampleBy Method: This is a built-in PySpark DataFrame method that performs stratified sampling on a specified column, given a sampling fraction for each stratum (class value).
AutoML's Use of sampleBy: For classification problems, AutoML uses the sampleBy method to draw a stratified sample that preserves the target label distribution, so every class stays represented in the training set. This keeps minority classes from being underrepresented by chance in the sample and helps the model generalize to unseen data.
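The points above can be sketched in PySpark as follows. This is a minimal illustration, not AutoML's internal code: the helper names label_fractions and stratified_sample, the "label" column name, and the 10% fraction are all assumptions made for the example. The only PySpark calls used are DataFrame.select, distinct, collect, and sampleBy(col, fractions, seed).

```python
def label_fractions(labels, fraction):
    """Map every distinct label to the same sampling fraction.

    Using one fraction for all strata keeps the class proportions of the
    sample close to those of the full dataset.
    """
    return {label: fraction for label in set(labels)}


def stratified_sample(df, label_col, fraction, seed=42):
    """Draw a stratified sample from a PySpark DataFrame (hypothetical helper).

    `df` is assumed to be a pyspark.sql.DataFrame with a column `label_col`.
    """
    # Collect the distinct class values; this is cheap when there are few classes.
    rows = df.select(label_col).distinct().collect()
    labels = [row[label_col] for row in rows]

    # sampleBy samples each stratum independently with its own fraction.
    return df.sampleBy(label_col, fractions=label_fractions(labels, fraction), seed=seed)


# Usage, assuming a DataFrame `df` with a "label" column already exists:
# sample_df = stratified_sample(df, "label", fraction=0.1)
# sample_df.groupBy("label").count().show()
```

Note that sampleBy decides row by row with the given probability, so the per-class sample sizes are approximate rather than exact. Passing different fractions per class (for example, a smaller fraction for the majority class) would downsample instead of preserving the original proportions.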
Incorrect Options:
sampleBy for this purpose. It may employ other techniques or combine downsampling with other strategies.
Key Points:
sampleBy to ensure every class is represented in the training set for classification tasks.