
Answer-first summary for fast verification
Answer: Pandas API on Spark
The Pandas API on Spark is the optimal choice because it leverages distributed memory and processing across a cluster, enabling big data scalability without extensive code modifications. It maintains much of the familiar pandas syntax, facilitating a smoother transition.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
No comments yet.
A data scientist has developed a feature engineering notebook using the pandas library. However, as the data volume increases, the notebook's runtime escalates and processing speed drops in proportion to the data size. Which tool should the data scientist consider to efficiently scale their notebook for big data with minimal refactoring?
A
Feature Store
B
Spark SQL
C
Pandas API on Spark
D
PySpark DataFrame API
E
Scala Dataset API