LeetQuiz Logo
Privacy Policy•contact@leetquiz.com
© 2025 LeetQuiz All rights reserved.
Google Professional Data Engineer

Google Professional Data Engineer

Get started today

Ultimate access to all questions.


You are tasked with developing a linear regression model in BigQuery ML to predict the likelihood of a customer purchasing your company's products. The model uses city names as a key predictive factor. What is the most efficient way to structure your data into columns for training and deploying the model with minimal coding effort?

Real Exam



Explanation:

The correct approach is B, which involves using SQL in BigQuery to apply one-hot encoding to the city column. This method efficiently converts categorical city names into binary columns, enabling the linear regression model to utilize city information as a predictive feature without unnecessary complexity.

  • Why not A? Assigning cities to numerical regions may oversimplify the data, potentially diminishing the model's predictive accuracy by not fully capturing each city's unique characteristics.
  • Why not C? Removing city information altogether would eliminate a significant predictive variable, likely reducing the model's effectiveness.
  • Why not D? While TensorFlow offers powerful tools for handling categorical data, this approach introduces unnecessary complexity for the task at hand, as one-hot encoding can be directly accomplished within BigQuery using SQL.
Powered ByGPT-5