
Explanation:
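Iterator UDFs are preferred for large datasets because the function receives an iterator of batches and yields one result per batch. The data is processed in chunks, which helps manage memory usage, and expensive setup (for example, loading a model) can run once and be reused across all batches instead of being repeated for each one. This is particularly important in big data scenarios where the entire dataset does not fit into memory.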
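As a rough illustration of the chunked processing described above, here is a minimal sketch of an Iterator-of-Series function. The names load_offset, add_offset and as_spark_udf are hypothetical, and load_offset only stands in for whatever one-time setup a real job would need. The Spark wrapping is kept in a separate helper and assumes PySpark 3.0 or later, where the Iterator type hints are used to select the iterator variant of pandas_udf.

```python
from typing import Iterator

import pandas as pd


def load_offset() -> int:
    """Hypothetical stand-in for expensive one-time setup (e.g. loading a model)."""
    return 100


def add_offset(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Setup runs once per iterator, not once per batch.
    offset = load_offset()
    for batch in batches:
        # Each batch is a pandas Series; yield a result of the same length.
        yield batch + offset


def as_spark_udf():
    """Wrap add_offset as an Iterator-of-Series pandas UDF (assumes pyspark >= 3.0)."""
    from pyspark.sql.functions import pandas_udf

    return pandas_udf(add_offset, returnType="long")


# Local check without Spark: feed two batches through the generator.
batches = iter([pd.Series([1, 2]), pd.Series([3])])
results = [out.tolist() for out in add_offset(batches)]
print(results)  # [[101, 102], [103]]
```

On a cluster, the wrapped function would be applied to a column in the usual way, for example df.select(as_spark_udf()("value")), assuming a DataFrame df with a long column named value.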
In a big data environment, you are tasked with implementing a UDF that processes a very large dataset. Which type of Pandas UDF would you prefer and why?
A
Scalar UDFs, because they are simpler to implement.
B
Iterator UDFs, because they can handle large datasets more efficiently by processing data in chunks.
C
Grouped Map UDFs, because they are more intuitive.
D
Grouped Aggregate UDFs, because they provide better performance.