The data science team has deployed a production model in MLflow that takes a list of column names as input and outputs a new column of type DOUBLE.
The following code correctly imports the production model, loads the customers table (containing the customer_id key column) into a DataFrame, and defines the required feature columns:
model = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn/prod")
df = spark.table("customers")
columns = ["account_age", "time_since_last_seen", "app_rating"]
Which code block will produce a DataFrame with the schema customer_id LONG, predictions DOUBLE?
Explanation:
The correct answer is B. The model is loaded as a Spark UDF by mlflow.pyfunc.spark_udf, so it can be applied to DataFrame columns inside select(). The call model(*columns) unpacks the list of column names into separate arguments to the UDF, and alias("predictions") names the output column. Selecting customer_id along with the UDF result produces the required two-column schema. Options A, C, and D rely on map, predict, and apply respectively, none of which is how a Spark UDF is applied to DataFrame columns in this context.
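The pattern described in the explanation can be sketched as a small helper. This is a minimal sketch, not the exam's answer text: the function name with_predictions and its key parameter are hypothetical, and it assumes model is the UDF returned by mlflow.pyfunc.spark_udf and df is the customers DataFrame from the question.

```python
def with_predictions(df, model, feature_columns, key="customer_id"):
    """Return the key column plus one scored column named 'predictions'.

    Hypothetical helper: `df` is assumed to be a Spark DataFrame and
    `model` a Spark UDF such as the one returned by mlflow.pyfunc.spark_udf.
    """
    # model(*feature_columns) is the same call as
    # model("account_age", "time_since_last_seen", "app_rating"):
    # each column name is passed to the UDF as its own positional argument.
    scored = model(*feature_columns).alias("predictions")
    # select() keeps only the listed columns, so the result has exactly two.
    return df.select(key, scored)
```

With the variables defined in the question, with_predictions(df, model, columns).printSchema() would be expected to list only customer_id and predictions. The LONG type comes from the table itself, and the DOUBLE type comes from the model's output as stated in the question.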