
Google Professional Machine Learning Engineer
You are tasked with predicting a customer's likelihood of purchasing your company's products using a linear regression model on BigQuery ML. The dataset includes categorical variables such as city names, which need to be transformed into a numerical format suitable for the model. The solution must minimize coding effort, preserve all relevant variables, and ensure the model's performance is not adversely affected by the transformation. Given these constraints, which of the following methods is the most efficient and effective for structuring the data? Choose the best option.
Explanation:
Option D is the best choice. One-hot encoding converts each categorical value into its own binary column, so the data becomes numeric without introducing an arbitrary order that could mislead the model. It preserves all information about the customer's city and requires minimal coding when done through Dataprep's visual interface. Option A also converts cities into numerical values, but it imposes an arbitrary order that a linear model would treat as a meaningful magnitude, which could distort the predictions. Option B introduces unnecessary complexity for this task, and Option C discards potentially valuable predictive information. Option E is incorrect because, although both A and D transform the data, only D does so in a way that fully preserves the data's integrity and is suitable for linear regression.
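To make the contrast between Options A and D concrete, here is a minimal sketch in plain Python. It is not how Dataprep or BigQuery ML performs the transformation; the function names `one_hot_encode` and `integer_encode` and the sample city list are hypothetical, chosen only to illustrate the two encodings.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row holds a single 1 marking its category."""
    categories = sorted(set(values))  # fixed column order for reproducibility
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def integer_encode(values):
    """Map each category to an integer code (the approach attributed to Option A)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[value] for value in values]


# Hypothetical sample data, for illustration only.
cities = ["Paris", "Tokyo", "Lima", "Paris"]

categories, one_hot_rows = one_hot_encode(cities)
integer_codes = integer_encode(cities)

print(categories)     # column order of the one-hot matrix
print(one_hot_rows)   # one binary column per city, no implied ranking
print(integer_codes)  # single column whose values imply Lima < Paris < Tokyo
```

With integer codes, a linear model fits one coefficient to the whole column, so it is forced to treat the alphabetical ranking as if it were a quantity. With one-hot columns, each city gets its own coefficient and no ordering is implied.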