
You are tasked with predicting a customer's likelihood of purchasing your company's products using a linear regression model on BigQuery ML. The dataset includes categorical variables such as city names, which need to be transformed into a numerical format suitable for the model. The solution must minimize coding effort, preserve all relevant variables, and ensure the model's performance is not adversely affected by the transformation. Given these constraints, which of the following methods is the most efficient and effective for structuring the data? Choose the best option.
A
Use Cloud Data Fusion to numerically label each city (e.g., 1, 2, 3) based on a predefined region categorization, and input these numerical labels into your model.
B
Implement TensorFlow to create a categorical variable with a vocabulary list, then integrate this vocabulary file into your BigQuery ML model.
C
Create a new BigQuery view that removes the city column entirely to simplify the dataset.
D
Apply Google Cloud Dataprep to perform one-hot encoding on the city column, transforming each city into a separate binary column.
E
Both A and D provide viable solutions for transforming categorical city data into a numerical format suitable for linear regression models in BigQuery ML.
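
To make the difference between the encodings named in the options concrete, here is a minimal sketch in plain Python. It is illustrative only: it does not use Cloud Data Fusion, Dataprep, TensorFlow, or BigQuery ML, and the function names `integer_labels` and `one_hot`, the sample city list, and the alphabetical label order are all assumptions made for the example (option A describes labels based on a region categorization, which is simplified here to an arbitrary ordering).

```python
def integer_labels(cities):
    """Assign each distinct city an integer (1, 2, 3, ...).

    Hypothetical helper: the integers follow alphabetical order here,
    which imposes an ordering the cities do not really have.
    """
    vocab = sorted(set(cities))
    index = {city: i + 1 for i, city in enumerate(vocab)}
    return [index[city] for city in cities]


def one_hot(cities):
    """Turn each city into a row of binary indicators, one column per city.

    Hypothetical helper: returns the column order (vocab) and the rows.
    """
    vocab = sorted(set(cities))
    rows = [[1 if city == v else 0 for v in vocab] for city in cities]
    return vocab, rows


# Made-up sample data for illustration.
cities = ["Paris", "Tokyo", "Lima", "Tokyo"]

labels = integer_labels(cities)
vocab, rows = one_hot(cities)

print(labels)  # one integer per input row
print(vocab)   # column order of the one-hot matrix
for row in rows:
    print(row)  # exactly one 1 per row
```

With integer labels, a linear model treats the city as a single numeric feature, so a city labeled 3 contributes three times the effect of a city labeled 1. With one-hot encoding, each city gets its own coefficient and no ordering is implied, at the cost of one column per distinct city.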