
A machine learning professional is designing a linear regression model using Spark ML to predict car prices. They have a Spark DataFrame (train_df) for training the model, with the following schema: car_id STRING, price DOUBLE, stars DOUBLE, year_updated DOUBLE, seats DOUBLE. The professional uses the following code block:
lr = LinearRegression(
    featuresCol = ["stars", "year_updated", "seats"],
    labelCol = "price"
)
lr_model = lr.fit(train_df)
What changes are necessary for the professional to successfully implement their linear regression model?
A
No changes are needed
B
Set the parallelism parameter in the Linear Regression operation to a value greater than 1
C
Include the lr object as a stage in a Pipeline to fit the model
D
Combine the stars, year_updated, and seats columns into a single vector column
E
Call the transform method from the lr_model object on train_df