
Explanation:
The most efficient way to prepare the data is option C: use SQL in BigQuery to apply one-hot encoding, so that each city becomes its own binary (0/1) column. One-hot encoding is a standard way to handle categorical variables in linear regression models: it creates a dummy variable for each city, which lets the model use city as a predictor without implying any numeric order among cities. The other options fall short: assigning each city a number (A) imposes an artificial ordering, excluding the city column (B) discards a key predictor, and building a vocabulary in TensorFlow (D) brings in a separate tool when the encoding can be done in SQL where the data already lives.
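As a rough illustration of what converting each city to a binary column can look like, the sketch below uses Python to build a BigQuery SQL query with one `IF(...)` column per city. The helper name `one_hot_sql`, the table `my_project.my_dataset.customers`, the column `city`, and the two example cities are all hypothetical and not part of the question; this is one possible way to write the encoding, not the only one.

```python
import re


def one_hot_sql(table, column, categories):
    """Build a SELECT that adds one 0/1 column per category value.

    All names passed in are illustrative assumptions; adapt them to
    the real table and column.
    """
    select_parts = ["*"]
    for value in categories:
        # Derive a safe column suffix, e.g. "New York" -> "new_york".
        suffix = re.sub(r"[^0-9a-z]+", "_", value.lower()).strip("_")
        # Escape backslashes and single quotes for a SQL string literal.
        literal = value.replace("\\", "\\\\").replace("'", "\\'")
        select_parts.append(
            f"IF({column} = '{literal}', 1, 0) AS {column}_{suffix}"
        )
    return "SELECT\n  " + ",\n  ".join(select_parts) + f"\nFROM `{table}`"


# Hypothetical table and city list, used only for illustration.
query = one_hot_sql(
    "my_project.my_dataset.customers", "city", ["Austin", "New York"]
)
print(query)
```

The same generated query can be used for both training and serving, so the model sees identical columns in each case. In practice the list of cities would come from the data itself, for example from a `SELECT DISTINCT city` query.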
When developing a linear regression model in BigQuery ML to predict a customer's likelihood of purchasing your company's products, city names are a key predictive component. However, the data must be organized into columns for both training and serving the model. What is the most efficient method to prepare this data?
A
Use Cloud Data Fusion to assign a number to each city based on its region and represent it with that number in the model
B
Create a new view in BigQuery that excludes the city column
C
Use SQL in BigQuery to apply one-hot encoding to the state column and convert each city to a binary value column
D
Use TensorFlow to generate a categorical variable with a vocabulary list and a vocabulary file that can be uploaded to BigQuery ML