
Answer-first summary for fast verification
Answer: D. Apply Google Cloud Dataprep to perform one-hot encoding on the city column, transforming each city into a separate binary column.
Option D is the most efficient and effective method because one-hot encoding converts a categorical variable into numerical columns without imposing an arbitrary order on the categories, and such an order could mislead a linear model. This approach preserves all information about the customer's city and can be accomplished with minimal coding using Dataprep's visual interface. Option A does convert cities into numerical values, but the labels (1, 2, 3) imply an ordering and spacing between cities that a linear regression would treat as real, which could distort the model's predictions. Option B requires building and maintaining TensorFlow code, which is unnecessary complexity given the requirement to minimize coding effort. Option C discards the city column and so loses potentially valuable predictive information, violating the requirement to preserve all relevant variables. Option E is incorrect because, while both A and D transform the data into numbers, only D does so in a manner that fully preserves the data's integrity and is suitable for linear regression.
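To make the contrast between Options A and D concrete, here is a minimal sketch in plain Python. It is not how Dataprep or Cloud Data Fusion implement these transformations; the helper names `one_hot_encode` and `label_encode`, the sample city list, and the alphabetical ordering of categories are all assumptions made for illustration.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row holds one 0/1 indicator per category."""
    categories = sorted(set(values))  # assumed ordering: alphabetical
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one indicator is set per row
        rows.append(row)
    return categories, rows


def label_encode(values):
    """Map each category to an integer 1..k (hypothetical stand-in for Option A)."""
    categories = sorted(set(values))
    index = {category: i + 1 for i, category in enumerate(categories)}
    return [index[value] for value in values]


# Hypothetical sample data, not taken from the question.
cities = ["Paris", "Tokyo", "Lima", "Tokyo"]

categories, one_hot = one_hot_encode(cities)
labels = label_encode(cities)

print(categories)  # column order of the binary features
print(one_hot)     # one binary column per city, no implied order
print(labels)      # integers that a linear model would read as magnitudes
```

With integer labels, a linear regression fits a single coefficient to the label, so it is forced to treat the city labeled 3 as having three times the effect of the city labeled 1. With one-hot columns, each city gets its own coefficient and no ordering is implied.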
Author: LeetQuiz Editorial Team
You are tasked with predicting a customer's likelihood of purchasing your company's products using a linear regression model on BigQuery ML. The dataset includes categorical variables such as city names, which need to be transformed into a numerical format suitable for the model. The solution must minimize coding effort, preserve all relevant variables, and ensure the model's performance is not adversely affected by the transformation. Given these constraints, which of the following methods is the most efficient and effective for structuring the data? Choose the best option.
A
Use Cloud Data Fusion to numerically label each city (e.g., 1, 2, 3) based on a predefined region categorization, and input these numerical labels into your model.
B
Implement TensorFlow to create a categorical variable with a vocabulary list, then integrate this vocabulary file into your BigQuery ML model.
C
Create a new BigQuery view that removes the city column entirely to simplify the dataset.
D
Apply Google Cloud Dataprep to perform one-hot encoding on the city column, transforming each city into a separate binary column.
E
Both A and D provide viable solutions for transforming categorical city data into a numerical format suitable for linear regression models in BigQuery ML.