
Answer-first summary for fast verification
Answer: Stratified Sampling
**Stratified Sampling** is the correct answer. It samples each class (stratum) independently at the same rate, so the sample mirrors the class distribution of the full dataset. This matters most for imbalanced data, where naive random sampling can under-represent minority classes. Spark supports it at scale via `DataFrame.sampleBy` and `RDD.sampleByKey`. The other options (Data Imputation, Outlier Detection, and Feature Scaling) are valuable preprocessing steps, but none of them addresses drawing a representative sample of a large dataset for model training.
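To make the idea concrete, here is a minimal plain-Python sketch of proportional stratified sampling. In Spark itself you would use `DataFrame.sampleBy` or `RDD.sampleByKey`, which apply the same per-class fractions in a distributed way; the function and variable names below are illustrative, not from any library:

```python
import random
from collections import Counter, defaultdict

def stratified_sample(rows, label_of, fraction, seed=42):
    """Return roughly `fraction` of `rows`, sampling each class
    separately so the sample's class ratios match the full dataset."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[label_of(row)].append(row)  # group rows by class label
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # keep at least one per class
        sample.extend(rng.sample(members, k))
    return sample

# Imbalanced toy dataset: 900 negatives, 100 positives (a 90/10 split)
data = [("neg", i) for i in range(900)] + [("pos", i) for i in range(100)]
sample = stratified_sample(data, label_of=lambda r: r[0], fraction=0.1)
print(Counter(r[0] for r in sample))  # 90 neg, 10 pos: the 90/10 ratio is preserved
```

A naive `random.sample` over the whole dataset would only preserve this ratio in expectation; the stratified version guarantees it for every draw, which is why it is the right tool for sampling imbalanced training data.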
Author: LeetQuiz Editorial Team
In the context of a machine learning project with large-scale datasets, which Databricks MLlib supported technique is most effective for sampling and processing data efficiently for model training?
A. Feature Scaling
B. Stratified Sampling
C. Outlier Detection
D. Data Imputation