
Answer-first summary for fast verification
Answer: When performing distributed model inference with libraries not originally designed for distributed computing, such as Scikit-learn, UDFs become necessary.
In Spark, User-Defined Functions (UDFs) let you plug custom logic into DataFrame transformation pipelines. Spark MLlib models are built for Spark and run distributed inference natively, so they do not need UDFs. Libraries such as Scikit-learn, however, have no native support for distributed computation, so a UDF is required to wrap the model's predict call and distribute inference across the cluster's workers.
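As a minimal sketch of the pattern, the model is trained once on the driver and its predict call is wrapped in a batch function that Spark can apply per partition (the toy data, column names, and `predict_batch` helper here are illustrative, not from any specific codebase):

```python
# Sketch: distributing scikit-learn inference via a Pandas UDF.
# The model, columns, and helper names below are illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Train a small model on the driver (toy, linearly separable data).
X = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0], "x2": [1.0, 0.0, 1.0, 0.0]})
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

def predict_batch(pdf: pd.DataFrame) -> pd.Series:
    """Run scikit-learn inference on one pandas batch of rows."""
    return pd.Series(model.predict(pdf))

# In Spark, the same function would be wrapped as a Pandas UDF so that
# each executor scores its own partition of the DataFrame:
#
#   from pyspark.sql.functions import pandas_udf
#   from pyspark.sql.types import LongType
#
#   @pandas_udf(LongType())
#   def predict_udf(x1: pd.Series, x2: pd.Series) -> pd.Series:
#       return predict_batch(pd.concat([x1, x2], axis=1))
#
#   df.withColumn("prediction", predict_udf("x1", "x2"))

preds = predict_batch(X)  # local check of the batch function
```

Because a Pandas UDF receives whole batches as pandas objects rather than one row at a time, the Scikit-learn model scores each partition vectorized, which is what makes this approach practical at scale.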
Author: LeetQuiz Editorial Team
When working with Spark, under what circumstances might you need to use a User-Defined Function (UDF) in a DataFrame transformation pipeline?
A
UDFs are essential for any DataFrame transformation in Spark.
B
For distributed model inference with Spark MLlib, UDFs are a necessity.
C
UDFs play a role solely in data preprocessing tasks within Spark.
D
When performing distributed model inference with libraries not originally designed for distributed computing, such as Scikit-learn, UDFs become necessary.