
Databricks Certified Associate Developer for Apache Spark
The code block below contains an error. It is intended to create a Scala UDF assessPerformanceUDF()
and apply it to the integer column customerSatisfaction
in DataFrame storesDF
. Identify the error.
Code block:

val assessPerformanceUDF = udf((customerSatisfaction) =>
  customerSatisfaction match {
    case x if x < 20 => 1
    case x if x > 80 => 3
    case _ => 2
  }
)

storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
Explanation:
The error is that the UDF's input parameter type is not specified. When a Scala UDF is created with udf(), the anonymous function's parameter types must be declared explicitly (e.g., customerSatisfaction: Int); without the annotation, the Scala compiler cannot infer the parameter type, so the code fails to compile and Spark cannot map the column's data type to the function. The other candidate answers are incorrect: the return type does not need to be declared, because Spark infers it from the function body (here, Int), and a UDF does not have to be defined as a named Scala function first, since an inline anonymous function is valid.
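A corrected version might look like the sketch below, with the parameter type declared on the anonymous function. It assumes the usual Spark imports and an existing DataFrame named storesDF with an integer customerSatisfaction column, as described in the question:

```scala
import org.apache.spark.sql.functions.{col, udf}

// Declaring the input type (Int) lets the compiler type-check the function
// and lets Spark bind the column's data type to the UDF parameter.
val assessPerformanceUDF = udf((customerSatisfaction: Int) =>
  customerSatisfaction match {
    case x if x < 20 => 1  // low satisfaction
    case x if x > 80 => 3  // high satisfaction
    case _           => 2  // middle band
  }
)

// Apply the UDF to the integer column and store the result in a new column.
val scored = storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
```

Note that the return type (Int) is still inferred from the match expression; only the input type must be written out.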