Consider a scenario where you need to apply a machine learning model to a Spark DataFrame in parallel. Which type of Pandas UDF would you use to achieve this and why?
A
Scalar Pandas UDF because it applies the model to each row individually.
B
Grouped Map Pandas UDF because it allows for group-wise model application.
C
Iterator Pandas UDF because it processes data in chunks, suitable for large datasets.
D
Grouped Aggregate Pandas UDF because it aggregates data before model application.
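
For reference, the chunk-wise pattern mentioned in option C can be sketched as follows. This is a minimal illustration, not a definitive implementation: `ToyModel`, `load_model`, `predict_batches`, and `as_spark_udf` are hypothetical names introduced here, and the "model" is a trivial linear function standing in for a real trained one. The part taken from PySpark is `pandas_udf` with `Iterator[pd.Series] -> Iterator[pd.Series]` type hints, which lets setup work such as model loading run once per iterator rather than once per batch.

```python
from typing import Iterator

import pandas as pd


class ToyModel:
    """Hypothetical stand-in for a real trained model."""

    def predict(self, x: pd.Series) -> pd.Series:
        # Vectorized over the whole batch, not row by row.
        return x * 2.0 + 1.0


def load_model() -> ToyModel:
    # Assumption: a real job would deserialize a trained model here.
    return ToyModel()


def predict_batches(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load the model once, then reuse it for every batch in the iterator.
    model = load_model()
    for batch in batches:
        yield model.predict(batch)


def as_spark_udf():
    # Imported lazily so the batch logic above can be tested without Spark.
    from pyspark.sql.functions import pandas_udf

    # The type hints on predict_batches select the iterator variant.
    return pandas_udf(predict_batches, returnType="double")
```

With a Spark session available and a DataFrame `df` that has a numeric column named `feature` (both assumed here), this could be applied as `df.withColumn("prediction", as_spark_udf()("feature"))`.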