
Answer-first summary for fast verification
Answer: `df.select("customer_id", model(*columns).alias("predictions"))`
## Explanation

**Correct Answer: B**

**Why Option B is correct:**

1. `model = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn/prod")` creates a Spark UDF from the MLflow model.
2. The UDF can be used directly in DataFrame operations.
3. `model(*columns)` applies the UDF to the specified columns, using the unpacking operator `*` to pass each column name as a separate argument.
4. `.alias("predictions")` renames the resulting column to "predictions".
5. `df.select("customer_id", model(*columns).alias("predictions"))` selects both the `customer_id` column and the predictions column, producing the desired schema.

**Why the other options are incorrect:**

- **Option A**: `df.map()` is not how UDFs are applied to Spark DataFrames; `map()` is an RDD operation, not a DataFrame operation.
- **Option C**: `model.predict(df, columns)` is not a valid call for an MLflow Spark UDF. The object returned by `mlflow.pyfunc.spark_udf()` is a callable column function, not an object with a `.predict()` method.
- **Option D**: Wrapping the model in `pandas_udf()` is unnecessary, since `mlflow.pyfunc.spark_udf()` already returns a Spark UDF; this would be redundant UDF wrapping.
- **Option E**: `df.apply()` is not a valid PySpark DataFrame method. Functions are applied with `withColumn()` or by using UDFs inside `select()`.

**Key Concepts:**

- MLflow's `pyfunc.spark_udf()` creates a Spark user-defined function (UDF) that can be used directly in DataFrame operations.
- Multiple columns can be passed to a Spark UDF by unpacking a list of column names with the `*` operator.
- The `select()` method chooses specific columns from a DataFrame.
- The `alias()` method renames a column in the resulting DataFrame.
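The unpacking behavior in step 3 can be illustrated without Spark. This is a minimal sketch in plain Python: `mock_model` below is a hypothetical stand-in for the callable returned by `mlflow.pyfunc.spark_udf()`, not the real MLflow API, but the `*columns` argument-passing works the same way.

```python
# Hypothetical stand-in for the Spark UDF: like the real UDF, it is a
# callable that accepts one positional argument per feature column.
def mock_model(*cols):
    # In real Spark each argument would be a column name or Column
    # object; here we just record what was received.
    return f"udf({', '.join(cols)})"

columns = ["account_age", "time_since_last_seen", "app_rating"]

# model(*columns) unpacks the list into three positional arguments,
# equivalent to mock_model("account_age", "time_since_last_seen", "app_rating").
result = mock_model(*columns)
print(result)  # → udf(account_age, time_since_last_seen, app_rating)
```

Passing `mock_model(columns)` instead would hand the whole list over as a single argument, which is why the `*` matters when the UDF expects one argument per column.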
Author: Keng Suppaseth
The data science team has created and logged a production model using MLflow. The model accepts a list of column names and returns a new column of type DOUBLE.
The following code correctly imports the production model, loads the customers table containing the customer_id key column into a DataFrame, and defines the feature columns needed for the model.
```python
model = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn/prod")
df = spark.table("customers")
columns = ["account_age", "time_since_last_seen", "app_rating"]
```
Which code block will output a DataFrame with the schema "customer_id LONG, predictions DOUBLE"?
- **A**: `df.map(lambda x: model(x[columns])).select("customer_id, predictions")`
- **B**: `df.select("customer_id", model(*columns).alias("predictions"))`
- **C**: `model.predict(df, columns)`
- **D**: `df.select("customer_id", pandas_udf(model, columns).alias("predictions"))`
- **E**: `df.apply(model, columns).select("customer_id, predictions")`