
Answer-first summary for fast verification
Answer: For classification problems.
### Correct Answer: C) For classification problems.

**Explanation:**

**Stratified Sampling:** This is a sampling technique designed to ensure that the sample maintains the same class proportions as the original dataset. It is particularly important in classification problems, where an imbalance between classes can significantly affect the model's performance.

**PySpark sampleBy Method:** This is a built-in PySpark `DataFrame` method that performs stratified sampling on a specified column, using a sampling fraction supplied for each stratum.

**AutoML's Use of sampleBy:** AutoML uses the `sampleBy` method for classification problems so that the sampled training data preserves the target label distribution and each class stays represented in proportion to the full dataset. This keeps minority classes from being under-represented or dropped when a large dataset is sampled down, which supports the model's ability to generalize to unseen data.

**Incorrect Options:**

- **A) Regression Problems:** Regression deals with continuous target variables, so there are no classes to stratify on and stratified sampling isn't directly applicable.
- **B) Forecasting Problems:** Forecasting predicts future values from time-series data, where temporal order matters and stratified sampling isn't a primary concern.
- **D) Downsampling Major Class:** While AutoML might downsample the majority class in imbalanced datasets, it doesn't exclusively use `sampleBy` for this purpose. It may employ other techniques or combine downsampling with other strategies.

**Key Points:**

- Stratified sampling is crucial for developing robust classification models, especially on imbalanced datasets.
- AutoML automatically uses `sampleBy` to keep the class proportions of the sampled training set in line with the original data for classification tasks.
- This step helps reduce sampling-induced bias and supports accuracy in classification problems.
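To make the per-class sampling idea concrete, here is a minimal pure-Python sketch of the behavior that PySpark's `DataFrame.sampleBy(col, fractions, seed)` provides: each row is kept independently with the probability assigned to its stratum, and strata missing from `fractions` are treated as fraction zero. The helper `sample_by`, the toy dataset, and the chosen fractions are illustrative assumptions, not AutoML's actual implementation.

```python
import random
from collections import Counter


def sample_by(rows, key, fractions, seed=None):
    """Keep each row with probability fractions[row[key]].

    Hypothetical stand-in for PySpark's sampleBy: strata that are not
    listed in `fractions` are treated as fraction 0.0 and dropped.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fractions.get(row[key], 0.0)]


# Hypothetical imbalanced dataset: 900 rows of class 0, 100 rows of class 1.
rows = [{"id": i, "label": 0 if i < 900 else 1} for i in range(1000)]

# Using the same fraction for every class keeps the original class
# proportions approximately intact (about 9:1 here).
fractions = {0: 0.5, 1: 0.5}
sample = sample_by(rows, "label", fractions, seed=42)

counts = Counter(row["label"] for row in sample)
print(counts)
```

With a Spark DataFrame `df` that has a `label` column, the corresponding call would be along the lines of `df.sampleBy("label", fractions={0: 0.5, 1: 0.5}, seed=42)`. Because the sampling is per-row and random, the resulting class counts are approximate rather than exact.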
Author: LeetQuiz Editorial Team