Discuss the importance of data preprocessing in distributed machine learning and how Spark ML facilitates this process. Provide an example of a common preprocessing step and how it can be implemented in Spark ML.
A
Data preprocessing is unnecessary in distributed machine learning; Spark ML focuses on model training.
B
Data preprocessing is crucial for ensuring data quality and model performance; Spark ML provides built-in functions like StringIndexer for categorical data handling.
C
Data preprocessing is only needed for small datasets; Spark ML uses direct data ingestion for large datasets.
D
Data preprocessing is handled by external tools; Spark ML integrates with these tools for preprocessing.
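To make the preprocessing step named in option B concrete, here is a minimal plain-Python sketch of what StringIndexer does with its default ordering: each distinct label is mapped to a numeric index, and the most frequent label gets index 0.0. The function name `string_index` and the sample values are hypothetical and chosen only for illustration. This is a single-machine stand-in, not Spark code.

```python
from collections import Counter


def string_index(values):
    """Map each label to a float index, most frequent label first.

    Illustrative stand-in for the default ordering of Spark ML's
    StringIndexer (descending frequency); it is not distributed.
    """
    counts = Counter(values)
    # Sort by descending frequency; ties are broken alphabetically here.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    mapping = {label: float(i) for i, label in enumerate(ordered)}
    return [mapping[v] for v in values], mapping


# Hypothetical categorical column.
categories = ["a", "b", "c", "a", "a", "c"]
indexed, mapping = string_index(categories)
print(mapping)
print(indexed)
```

In PySpark itself, the corresponding step would look roughly like `StringIndexer(inputCol="category", outputCol="categoryIndex").fit(df).transform(df)`, where the column names and `df` are assumptions. The fit step learns the label-to-index mapping across the distributed DataFrame, and the transform step applies it.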