Databricks Certified Associate Developer for Apache Spark


The code block below contains an error. It is intended to create a Scala UDF assessPerformanceUDF() and apply it to the integer column customerSatisfaction in DataFrame storesDF. Identify the error.

Code block:

val assessPerformanceUDF = udf((customerSatisfaction: Int) => 
  customerSatisfaction match {
    case x if x < 20 => 1
    case x if x > 80 => 3
    case _ => 2
  }
)
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))





Explanation:

The error is that the input parameter type of the function passed to udf() is not specified. udf() is generic in its argument types, so the Scala compiler has no expected type from which to infer the parameter of the lambda: writing (customerSatisfaction) => ... fails to compile with a "missing parameter type" error, and the parameter must be declared explicitly as (customerSatisfaction: Int) => .... This is the error described by option A. The match expression is not at fault, since its case arms correctly use =>. Option B (return type not specified) is incorrect because the Scala udf() derives the return type from the function's result type, here Int. Option D (the UDF must first be defined as a Scala function) is incorrect because an inline lambda is a valid argument to udf().
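
The corrected form can be sketched as follows. This is a minimal illustration rather than the exam's reference solution: it assumes Spark is on the classpath and runs against a local SparkSession, and the object name, the storeId column, and the sample rows standing in for storesDF are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object AssessPerformanceExample {
  // The parameter is annotated with Int, so the compiler knows the
  // function's type (Int => Int) without any help from udf().
  val assessPerformance = (customerSatisfaction: Int) =>
    customerSatisfaction match {
      case x if x < 20 => 1
      case x if x > 80 => 3
      case _           => 2
    }

  // udf() derives both the input type and the return type from Int => Int.
  val assessPerformanceUDF = udf(assessPerformance)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("assess-performance-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for storesDF.
    val storesDF = Seq(("s1", 10), ("s2", 50), ("s3", 95))
      .toDF("storeId", "customerSatisfaction")

    // Scores 10, 50, and 95 fall into categories 1, 2, and 3 respectively.
    storesDF
      .withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
      .show()

    spark.stop()
  }
}
```

Removing the ": Int" annotation and passing the lambda directly to udf() reproduces the compile error the question is about. Keeping the logic in a plain function value, as here, also lets it be checked without starting Spark.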