
Answer-first summary for fast verification
Answer: D. Combine the `stars`, `year_updated`, and `seats` columns into a single vector column
Spark ML estimators read all input features from a single vector column, named 'features' by default. The `featuresCol` parameter takes the name of that one column, not a list of column names, so the code in the question cannot work as written. The professional must first combine the `stars`, `year_updated`, and `seats` columns into one vector column, typically with `VectorAssembler`, and then pass that column's name as `featuresCol`.
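The fix described above can be sketched as follows. This is a minimal illustration rather than the question's official solution: the helper name `fit_price_model` and the constants `FEATURE_COLS` and `LABEL_COL` are hypothetical, and it assumes `train_df` has the schema given in the question.

```python
FEATURE_COLS = ["stars", "year_updated", "seats"]  # hypothetical constant
LABEL_COL = "price"                                # hypothetical constant


def fit_price_model(train_df, feature_cols=FEATURE_COLS, label_col=LABEL_COL):
    """Assemble the numeric columns into one vector column, then fit the model."""
    # Imported inside the function so the helper can be defined
    # even where Spark is not installed.
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    # VectorAssembler merges the listed columns into a single vector column.
    assembler = VectorAssembler(inputCols=list(feature_cols), outputCol="features")
    assembled_df = assembler.transform(train_df)

    # featuresCol is the name of that one vector column, not a list of names.
    lr = LinearRegression(featuresCol="features", labelCol=label_col)
    return lr.fit(assembled_df)
```

With a live Spark session, `lr_model = fit_price_model(train_df)` would replace the failing `lr.fit(train_df)` call from the question.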
Author: LeetQuiz Editorial Team
A machine learning professional is building a linear regression model with Spark ML to predict car prices. Their training DataFrame, train_df, has the schema: car_id STRING, price DOUBLE, stars DOUBLE, year_updated DOUBLE, seats DOUBLE. The professional uses the following code block:
lr = LinearRegression(
    featuresCol = ["stars", "year_updated", "seats"],
    labelCol = "price"
)
lr_model = lr.fit(train_df)
What changes are necessary for the professional to successfully implement their linear regression model?
A. No changes are needed
B. Set the parallelism parameter in the Linear Regression operation to a value greater than 1
C. Include the lr object as a stage in a Pipeline to fit the model
D. Combine the stars, year_updated, and seats columns into a single vector column
E. Call the transform method from the lr_model object on train_df