Given a scenario where you need to perform feature engineering on a dataset that is distributed across multiple nodes, describe how you would approach this task using Spark ML. What specific features of Spark ML would you utilize and why?
A. Use Spark ML's DataFrame API for feature extraction due to its simplicity.
B. Leverage Spark ML's feature transformers such as Bucketizer and VectorAssembler for efficient distributed processing.
C. Implement custom Python scripts for feature engineering to ensure flexibility.
D. Rely on manual data aggregation and then apply scikit-learn for feature engineering.