A machine learning professional is designing a linear regression model using Spark ML to predict car prices. They have a Spark DataFrame (train_df) for training the model, which includes the schema: car_id STRING, price DOUBLE, stars DOUBLE, year_updated DOUBLE, seats DOUBLE. The professional uses the following code block:

lr = LinearRegression(
  featuresCol=["stars", "year_updated", "seats"],
  labelCol="price"
)
lr_model = lr.fit(train_df)

What changes are necessary for the professional to successfully implement their linear regression model?

Real Exam

No changes are needed

12.0%

Set the parallelism parameter in the Linear Regression operation to a value greater than 1

12.0%

Include the lr object as a stage in a Pipeline to fit the model

12.0%

Combine the stars, year_updated, and seats columns into a single vector column

44.0%

Call the transform method from the lr_model object on train_df

20.0%

Databricks Certified Machine Learning - Associate