
Answer-first summary for fast verification
Answer: Use SQL in BigQuery to transform the state column using a one-hot encoding method, and make each city a column with binary values.
The correct answer is B. One-hot encoding is a standard technique for handling categorical data in machine learning. It transforms the city name variable into a series of binary columns, one per city: each row has a 1 in the column for its own city and a 0 in every other city column. This suits a linear regression model because the model can use city as a set of numeric, binary variables without implying any order among cities. BigQuery SQL can express one-hot encoding directly, which keeps the coding effort low while preparing the data in the columnar format the model needs. Option A discards a key predictive feature. Option C adds TensorFlow code and a vocabulary file, which works against the goal of minimal coding. Option D, although it may require little coding, loses granularity by grouping cities into regions, and the region numbers 1 to 5 can mislead the model by implying an ordering that does not exist.
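To make the one-hot idea concrete, here is a minimal sketch of the kind of SQL involved. It is not the question's official solution: the `customers` table, its `customer_id` and `city` columns, the sample rows, and the `one_hot_sql` helper are all hypothetical, and Python's built-in `sqlite3` stands in for BigQuery so the example runs locally. The generated `CASE WHEN ... THEN 1 ELSE 0 END` expressions use syntax that BigQuery standard SQL also accepts.

```python
import sqlite3

# Hypothetical sample data; the table and column names are assumptions.
rows = [(1, "Austin"), (2, "Boston"), (3, "Austin"), (4, "Denver")]

# SQLite is used here only as a local stand-in for BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)


def one_hot_sql(cities, table="customers", column="city"):
    """Build a SELECT that emits one 0/1 column per city (illustrative helper).

    Assumes simple, trusted city names: values are interpolated directly,
    so names with spaces or quotes would need sanitizing first.
    """
    cols = ",\n  ".join(
        f"CASE WHEN {column} = '{c}' THEN 1 ELSE 0 END AS city_{c.lower()}"
        for c in cities
    )
    return f"SELECT customer_id,\n  {cols}\nFROM {table}\nORDER BY customer_id"


# Collect the distinct cities, then generate and run the encoding query.
cities = [r[0] for r in conn.execute(
    "SELECT DISTINCT city FROM customers ORDER BY city")]
query = one_hot_sql(cities)
encoded = conn.execute(query).fetchall()

print(query)
for row in encoded:
    print(row)
```

Each output row carries the customer ID followed by one binary column per city, with exactly one of those columns set to 1. On a real table with many cities, generating the column list from the distinct values, as sketched here, avoids writing each expression by hand.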
Author: LeetQuiz Editorial Team
You are tasked with developing a linear regression model using BigQuery ML with the goal of predicting the likelihood that a customer will purchase your company's products. One of the key predictive features in your model is the city name of the customer. For successful training and deployment of this model, your data should be structured in a columnar format. Your objective is to prepare the data with minimal coding effort while maintaining the integrity of the predictive variables. What approach should you take?
A
Create a new view with BigQuery that does not include a column with city information.
B
Use SQL in BigQuery to transform the state column using a one-hot encoding method, and make each city a column with binary values.
C
Use TensorFlow to create a categorical variable with a vocabulary list. Create the vocabulary file and upload that as part of your model to BigQuery ML.
D
Use Cloud Data Fusion to assign each city to a region that is labeled as 1, 2, 3, 4, or 5, and then use that number to represent the city in the model.