© 2025 LeetQuiz All rights reserved.
Databricks Certified Data Engineer - Professional


The data science team has deployed a production model in MLflow that takes a list of column names as input and outputs a new column of type DOUBLE.

The following code correctly imports the production model, loads the customers table (which contains the customer_id key column) into a DataFrame, and defines the required feature columns:

model = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn/prod")
df = spark.table("customers")
columns = ["account_age", "time_since_last_seen", "app_rating"]

Which code block will produce a DataFrame with the schema customer_id LONG, predictions DOUBLE?

Explanation:

The correct answer is B. mlflow.pyfunc.spark_udf returns the model as a Spark UDF, so it is applied to DataFrame columns inside select(). The call model(*columns) unpacks the list so that each column name is passed to the UDF as a separate argument, and alias("predictions") names the resulting DOUBLE column. Selecting customer_id alongside the UDF result yields the required schema customer_id LONG, predictions DOUBLE. Options A, C, and D rely on map, predict, or apply, none of which is how a Spark UDF is invoked on a DataFrame in this context.
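
The unpacking step is the part of the explanation that is easiest to misread, so here is a minimal plain-Python sketch of what the star does to the argument list. The function fake_model is a hypothetical stand-in that only records the arguments it receives; it is not the MLflow UDF and makes no predictions.

```python
# Hypothetical stand-in for the Spark UDF returned by mlflow.pyfunc.spark_udf.
# It simply returns the positional arguments it was called with.
def fake_model(*args):
    return args

columns = ["account_age", "time_since_last_seen", "app_rating"]

# With unpacking, each column name becomes its own positional argument.
unpacked = fake_model(*columns)

# Without unpacking, the whole list arrives as one single argument.
packed = fake_model(columns)

print(len(unpacked))  # 3
print(len(packed))    # 1
```

In the real setting, the same unpacked call shape is what the explanation describes: an expression of the form df.select("customer_id", model(*columns).alias("predictions")).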
