
Answer-first summary for fast verification
Answer: Use SQL in BigQuery to apply one-hot encoding to the state column and convert each city to a binary value column
The most efficient way to prepare the data for a linear regression model in BigQuery ML, when city names are a key predictive component, is option C: use SQL in BigQuery to one-hot encode the categorical column so that each city becomes its own binary (0/1) column. One-hot encoding is a standard way to handle categorical variables in linear regression. It creates one dummy variable per city, so the model can use city as a predictor without implying any numeric order among cities. Because the transformation is plain SQL run where the data already lives, it needs less code and fewer moving parts than the other options, which makes it the preferred choice for this scenario.
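As a rough illustration of what "each city becomes a binary value column" can look like, the sketch below builds a BigQuery SQL query from a list of city names. It is a minimal example under stated assumptions, not a definitive implementation: the helper names (column_name, build_one_hot_sql), the table name my_project.my_dataset.customers, and the column name city are all hypothetical, and the list of cities is assumed to be known up front.

```python
import re


def column_name(city: str) -> str:
    """Turn a city name into a column name, e.g. 'New York' -> 'city_new_york'.

    Characters outside 0-9 and a-z are collapsed into underscores, so two
    different city names could map to the same column name in edge cases.
    """
    slug = re.sub(r"[^0-9a-z]+", "_", city.lower()).strip("_")
    return f"city_{slug}"


def build_one_hot_sql(cities, source_table, city_column="city"):
    """Build a SELECT that adds one 0/1 indicator column per city.

    All names passed in are hypothetical placeholders for this sketch.
    """
    indicator_columns = []
    for city in cities:
        # Escape single quotes so the city name is a valid SQL string literal.
        literal = city.replace("'", "\\'")
        indicator_columns.append(
            f"  IF({city_column} = '{literal}', 1, 0) AS {column_name(city)}"
        )
    select_list = ",\n".join(indicator_columns)
    return (
        "SELECT\n"
        f"  * EXCEPT ({city_column}),\n"
        f"{select_list}\n"
        f"FROM `{source_table}`"
    )


if __name__ == "__main__":
    # Hypothetical table and city values, for illustration only.
    print(build_one_hot_sql(["Austin", "New York"], "my_project.my_dataset.customers"))
```

The generated query keeps every original column except the raw city column and appends one indicator column per city. Running the same query for training and for serving keeps the two inputs in the same column layout.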
Author: LeetQuiz Editorial Team
When developing a linear regression model in BigQuery ML to predict a customer's likelihood of purchasing your company's products, city names are a key predictive component. However, the data must be organized into columns for both training and serving the model. What is the most efficient method to prepare this data?
A
Use Cloud Data Fusion to assign a number to each city based on its region and represent it with that number in the model
B
Create a new view in BigQuery that excludes the city column
C
Use SQL in BigQuery to apply one-hot encoding to the state column and convert each city to a binary value column
D
Use TensorFlow to generate a categorical variable with a vocabulary list and a vocabulary file that can be uploaded to BigQuery ML