
Explanation:
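The correct answer is D. Spark ML estimators such as LinearRegression expect every input feature to live in a single column of Vector type, named by the featuresCol parameter (default 'features'). featuresCol accepts one column name, not a list of names, so the code in the question fails before any training happens. The professional must first combine the stars, year_updated, and seats columns into one vector column, for example with VectorAssembler, and then point featuresCol at that column. The other options do not address this: wrapping lr in a Pipeline or calling transform still leaves no vector column to train on.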
A machine learning professional is designing a linear regression model using Spark ML to predict car prices. They have a Spark DataFrame, train_df, for training the model, with the following schema: car_id STRING, price DOUBLE, stars DOUBLE, year_updated DOUBLE, seats DOUBLE. The professional uses the following code block:
lr = LinearRegression(
    featuresCol=["stars", "year_updated", "seats"],
    labelCol="price"
)
lr_model = lr.fit(train_df)
What changes are necessary for the professional to successfully implement their linear regression model?
A. No changes are needed
B. Set the parallelism parameter in the Linear Regression operation to a value greater than 1
C. Include the lr object as a stage in a Pipeline to fit the model
D. Combine the stars, year_updated, and seats columns into a single vector column
E. Call the transform method from the lr_model object on train_df