
Answer-first summary for fast verification
Answer: df.select("customer_id", model(*columns).alias("predictions"))
The correct answer is B. `mlflow.pyfunc.spark_udf` returns the model as a Spark UDF, which is applied to DataFrame columns inside `select()`. `model(*columns)` unpacks the list so that each column name is passed to the UDF as a separate argument, and `alias("predictions")` names the resulting DOUBLE column. Selecting `customer_id` alongside the UDF result produces the required schema. Options A and D call `map` and `apply` on the DataFrame, which are not PySpark DataFrame methods, and both pass `"customer_id, predictions"` to `select()` as a single string rather than as two column names. Option C calls `predict` on the UDF, which is not how a Spark UDF is invoked.
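The role of the asterisk in `model(*columns)` can be illustrated without a Spark cluster. The following is a minimal sketch in plain Python: `fake_udf` is a hypothetical stand-in invented for this illustration (it is not part of MLflow or PySpark) and only reports the arguments it receives, so it shows argument unpacking rather than actual model scoring.

```python
# Hypothetical stand-in for the callable returned by mlflow.pyfunc.spark_udf.
# It does not run a model; it only reports which arguments it was called with.
def fake_udf(*cols):
    return f"udf({', '.join(map(str, cols))})"


columns = ["account_age", "time_since_last_seen", "app_rating"]

# With the asterisk, each column name becomes its own positional argument.
unpacked = fake_udf(*columns)

# This is equivalent to writing the column names out by hand.
explicit = fake_udf("account_age", "time_since_last_seen", "app_rating")

# Without the asterisk, the whole list arrives as one single argument.
not_unpacked = fake_udf(columns)

print(unpacked)
print(not_unpacked)
```

In option B, the same unpacking hands the three feature columns to the UDF individually, and `select()` then evaluates the resulting column expression for every row.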
Author: LeetQuiz Editorial Team
The data science team has deployed a production model in MLflow that takes a list of column names as input and outputs a new column of type DOUBLE.
The following code correctly imports the production model, loads the customers table (which contains the customer_id key column) into a DataFrame, and defines the required feature columns:
model = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn/prod")
df = spark.table("customers")
columns = ["account_age", "time_since_last_seen", "app_rating"]
Which code block will produce a DataFrame with the schema customer_id LONG, predictions DOUBLE?
A
df.map(lambda x:model(x[columns])).select("customer_id, predictions")
B
df.select("customer_id", model(*columns).alias("predictions"))
C
model.predict(df, columns)
D
df.apply(model, columns).select("customer_id, predictions")