
Answer-first summary for fast verification
Answer: A. The input type of customerSatisfaction is not specified in the udf() operation.
### Why Option A is the correct choice:

In the code block:

`val assessPerformanceUDF = udf((customerSatisfaction) => { ... })`

the parameter `customerSatisfaction` is missing a **type declaration**. In Scala, `udf()` is generic over its argument types, so the compiler has no expected type from which to infer the parameter of the anonymous function, and the DataFrame schema is only known at runtime. The code therefore fails at compile time, typically with a "missing parameter type" error. It should be written as:

`udf((customerSatisfaction: Int) => { ... })`

### Why other options are incorrect:

* **B:** In Scala, the return type is inferred from the function body (here `Int`), so it does not have to be specified.
* **C & E:** These are fundamentally wrong; `withColumn` is a standard way to apply UDFs in the DataFrame API.
* **D:** You can define UDFs inline (as shown); you don't *have* to define a separate function first.

**Summary for the Exam:** When you see a Scala UDF definition, check whether the input parameter of the anonymous function has a type (like `: Int` or `: String`). If it's missing, that is usually the "error" the exam wants you to find.
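The fix described above can be sketched as a small runnable program. This is only an illustration, not the exam's reference solution: it assumes Spark is on the classpath and a local `SparkSession` is acceptable, and the object name `AssessPerformanceExample`, the helper `assessPerformance`, and the three-row `storesDF` are hypothetical stand-ins for the real data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object AssessPerformanceExample {
  // Hypothetical helper: the scoring logic from the question as a plain Scala function.
  def assessPerformance(customerSatisfaction: Int): Int = customerSatisfaction match {
    case x if x < 20 => 1
    case x if x > 80 => 3
    case _           => 2
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this sketch.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-type-annotation-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for storesDF with an integer column.
    val storesDF = Seq(10, 50, 95).toDF("customerSatisfaction")

    // The parameter is annotated with ": Int", so the lambda compiles.
    val assessPerformanceUDF = udf((customerSatisfaction: Int) => {
      assessPerformance(customerSatisfaction)
    })

    // Applied through the DataFrame API with withColumn, as in the question.
    // Per row: 10 maps to 1, 50 maps to 2, 95 maps to 3.
    storesDF
      .withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
      .show()

    spark.stop()
  }
}
```

Removing `: Int` from the lambda reproduces the error from the question. Wrapping the named function directly with `udf(assessPerformance _)` should also work, which is why option D is not a requirement.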
Author: LeetQuiz Editorial Team
The code block below contains an error. It is intended to create a Scala UDF assessPerformanceUDF() and apply it to the integer column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
val assessPerformanceUDF = udf((customerSatisfaction) => {
  customerSatisfaction match {
    case x if x < 20 => 1
    case x if x > 80 => 3
    case _ => 2
  }
})
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
A
The input type of customerSatisfaction is not specified in the udf() operation.
B
The return type of assessPerformanceUDF() must be specified.
C
The withColumn() operation is not appropriate here - UDFs should be applied by iterating over rows instead.
D
The assessPerformanceUDF() must first be defined as a Scala function and then converted to a UDF.
E
UDFs can only be applied via SQL and not through the DataFrame API.