
Answer-first summary for fast verification
Answer: D. The wrong SQL function is used to compute column result: it should be ASSESS_PERFORMANCE instead of assessPerformance.
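The error is a mismatch between the name the UDF is registered under, "ASSESS_PERFORMANCE", and the name used in the SQL query, "assessPerformance". Inside spark.sql(), a UDF must be called by the name passed to spark.udf.register(). assessPerformance is only the name of the Scala function and is not a registered SQL function, so the query fails with an undefined-function error. The problem is not capitalization (Spark SQL resolves function names case-insensitively); assessPerformance and ASSESS_PERFORMANCE are simply different names. Option D correctly identifies this issue. The other options are incorrect: (A) a column may be referenced more than once in a SELECT list, here once directly and once as the UDF argument; (B) registered UDFs can be applied inside SQL statements, which is the whole point of spark.udf.register(); (C) the arguments to spark.udf.register() are already in the right order, SQL function name first and function second; and (E) spark.sql() exists and can apply registered UDFs, so the DataFrame API is not the only way to use them.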
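The explanation above can be checked with a small, self-contained sketch. It assumes Spark 3.x with spark-sql on the classpath; the object name AssessPerformanceDemo, the body of assessPerformance(), and the satisfaction scores are hypothetical, since the question never shows them, and a temporary view stands in for the real stores table. It runs the corrected query, shows that the query from the question fails at analysis time, and applies the same UDF through the DataFrame API to illustrate why option E is wrong.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Hypothetical object name; assumes spark-sql (Spark 3.x) is on the classpath.
object AssessPerformanceDemo {
  // Hypothetical scoring rule: the question never shows the body of assessPerformance().
  def assessPerformance(satisfaction: Int): String =
    if (satisfaction >= 8) "good"
    else if (satisfaction >= 5) "average"
    else "poor"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("assess-performance-udf")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the stores table, with made-up satisfaction scores.
    Seq(3, 6, 9).toDF("customerSatisfaction").createOrReplaceTempView("stores")

    // Register the Scala method under the SQL name ASSESS_PERFORMANCE.
    val assessUdf = spark.udf.register("ASSESS_PERFORMANCE", assessPerformance _)

    // Correct: the SQL text calls the function by its registered name.
    spark.sql(
      "SELECT customerSatisfaction, ASSESS_PERFORMANCE(customerSatisfaction) AS result FROM stores"
    ).show()

    // Also resolves: Spark SQL looks up function names case-insensitively.
    spark.sql("SELECT assess_performance(customerSatisfaction) AS result FROM stores").show()

    // The bug from the question: assessPerformance is only a Scala identifier,
    // not a registered SQL function, so analysis fails inside spark.sql().
    val broken = Try(spark.sql(
      "SELECT customerSatisfaction, assessPerformance(customerSatisfaction) AS result FROM stores"
    ))
    println(s"Query using assessPerformance failed: ${broken.isFailure}")

    // The same UDF through the DataFrame API, via the handle returned by register().
    spark.table("stores")
      .select($"customerSatisfaction", assessUdf($"customerSatisfaction").as("result"))
      .show()

    spark.stop()
  }
}
```

Passing assessPerformance _ makes the eta-expansion to a Scala function explicit, which lets the compiler select the Function1 overload of spark.udf.register() unambiguously.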
Author: LeetQuiz Editorial Team
The following code block contains an error. It is intended to create and register a SQL UDF named "ASSESS_PERFORMANCE" using the Scala function assessPerformance() and apply it to the column customerSatisfaction in the table stores. Identify the error.
Code block:
spark.udf.register("ASSESS_PERFORMANCE", assessPerformance)
spark.sql("SELECT customerSatisfaction, assessPerformance(customerSatisfaction) AS result FROM stores")
A. The customerSatisfaction column cannot be called twice inside the SQL statement.
B. Registered UDFs cannot be applied inside of a SQL statement.
C. The order of the arguments to spark.udf.register() should be reversed.
D. The wrong SQL function is used to compute column result - it should be ASSESS_PERFORMANCE instead of assessPerformance.
E. There is no sql() operation - the DataFrame API must be used to apply the UDF assessPerformance().