When developing complex data transformation logic in Spark on Databricks using custom User Defined Functions (UDFs), what are the best practices to ensure they are both performant and maintainable?
A. Broadcasting small datasets used within UDFs to optimize their access across the cluster nodes
B. Using Spark SQL's built-in functions wherever possible and reserving UDFs for operations that cannot be expressed with built-in functions
C. Writing UDFs in Python for ease of development, disregarding the performance implications compared to Scala or Java UDFs
D. Encapsulating all transformation logic within a single UDF to minimize the invocation overhead