
Explanation:
In the code block:
val assessPerformanceUDF = udf((customerSatisfaction) => { ... })
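The parameter customerSatisfaction is missing a type annotation. udf() is an ordinary Scala method, so the anonymous function passed to it is type-checked at compile time, before Spark ever looks at the DataFrame schema. Because udf() is generic in its argument and return types, the Scala compiler has no expected type from which to infer the parameter's type, and compilation fails with a "missing parameter type" error. It should be written as: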
udf((customerSatisfaction: Int) => { ... })
withColumn() is the standard way to apply a UDF in the DataFrame API, so that part of the code is correct.
Summary for the exam: When you see a Scala UDF definition, check whether each input parameter of the anonymous function has a type (such as : Int or : String). If it is missing, that is usually the error the exam wants you to find.
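The fix described above can be sketched as a small standalone program. This is a minimal illustration, not the exam's reference solution: it assumes Spark SQL is on the classpath, and the object name AssessPerformanceExample, the local SparkSession, and the three sample rows standing in for storesDF are all hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object AssessPerformanceExample {
  // The explicit `: Int` gives the compiler the parameter type it cannot infer.
  // The return type (Int) is still inferred from the match branches.
  val assessPerformanceUDF = udf((customerSatisfaction: Int) => {
    customerSatisfaction match {
      case x if x < 20 => 1
      case x if x > 80 => 3
      case _           => 2
    }
  })

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a notebook usually provides `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for storesDF.
    val storesDF = Seq(10, 50, 90).toDF("customerSatisfaction")

    // Apply the UDF through the DataFrame API, as in the question.
    // Scores 10, 50, and 90 map to results 1, 2, and 3 respectively.
    storesDF
      .withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
      .show()

    spark.stop()
  }
}
```

Note that the anonymous function is passed to udf() directly and no return type is declared, which is consistent with options B and D being incorrect.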
The code block below contains an error. It is intended to create a Scala UDF assessPerformanceUDF() and apply it to the integer column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
val assessPerformanceUDF = udf((customerSatisfaction) => {
  customerSatisfaction match {
    case x if x < 20 => 1
    case x if x > 80 => 3
    case _ => 2
  }
})
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
A. The input type of customerSatisfaction is not specified in the udf() operation.
B. The return type of assessPerformanceUDF() must be specified.
C. The withColumn() operation is not appropriate here. UDFs should be applied by iterating over rows instead.
D. The assessPerformanceUDF() must first be defined as a Scala function and then converted to a UDF.
E. UDFs can only be applied via SQL and not through the DataFrame API.