
The data science team has deployed a production model in MLflow that takes a list of column names as input and outputs a new column of type DOUBLE.

The following code correctly imports the production model, loads the customers table (which contains the customer_id key column) into a DataFrame, and defines the required feature columns:

model = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn/prod")
df = spark.table("customers")
columns = ["account_age", "time_since_last_seen", "app_rating"]

Which code block will produce a DataFrame with the schema customer_id LONG, predictions DOUBLE?


df.map(lambda x:model(x[columns])).select("customer_id, predictions")

18.4%

df.select("customer_id", model(columns).alias("predictions"))

53.1%

model.predict(df, columns)

12.9%

df.apply(model, columns).select("customer_id, predictions")

15.6%
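
For anyone who wants to try the select-plus-alias pattern from the most-voted option, here is a minimal sketch rather than a definitive implementation. The helper name add_predictions and its key_column parameter are hypothetical and not part of the question. Unpacking the list with * is also an assumption: it follows the calling convention of passing one column name per argument to the UDF, whereas the option above passes the list directly. As far as I know, spark_udf defaults to a result type of double, which is where the DOUBLE in the target schema would come from.

```python
def add_predictions(df, model_udf, feature_columns, key_column="customer_id"):
    """Return a DataFrame holding only key_column and a 'predictions' column.

    Hypothetical helper for illustration; df is expected to be a Spark
    DataFrame and model_udf a UDF such as the one returned by
    mlflow.pyfunc.spark_udf.
    """
    # Calling the UDF with column names yields a Column expression;
    # alias() gives the resulting column its name.
    prediction = model_udf(*feature_columns).alias("predictions")
    # select() keeps only the listed columns, so every other column is dropped.
    return df.select(key_column, prediction)


def main():
    # Needs a Spark session and access to the registered model, so the
    # imports live here and main() is not called automatically.
    import mlflow.pyfunc
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    model = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn/prod")
    df = spark.table("customers")
    columns = ["account_age", "time_since_last_seen", "app_rating"]

    add_predictions(df, model, columns).printSchema()
```

Calling main() on a cluster that can reach the model registry should print the two-column schema. The type reported for customer_id depends on how the customers table is defined.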

Databricks Certified Data Engineer - Professional
