
Answer-first summary for fast verification
Answer: Scalar Pandas UDF because it applies the model to each row individually.
A Scalar Pandas UDF is the appropriate choice for applying a machine learning model to each row of a Spark DataFrame in parallel. This type of UDF is designed to operate on individual rows, making it ideal for row-wise model application tasks.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
Consider a scenario where you need to apply a machine learning model to a Spark DataFrame in parallel. Which type of Pandas UDF would you use to achieve this and why?
A
Scalar Pandas UDF because it applies the model to each row individually.
B
Grouped Map Pandas UDF because it allows for group-wise model application.
C
Iterator Pandas UDF because it processes data in chunks, suitable for large datasets.
D
Grouped Aggregate Pandas UDF because it aggregates data before model application.
No comments yet.