Google Professional Machine Learning Engineer

You are using BigQuery ML to train a linear regression model on a table that receives new data every minute. The model is retrained hourly using Cloud Scheduler and Vertex AI Pipelines. Feature preprocessing involves quantile bucketization and MinMax scaling on the last hour of data. To minimize storage and computational overhead, what is your recommended approach?

Preprocess and stage the data in BigQuery prior to feeding it to the model during training and inference.

Use the TRANSFORM clause in the CREATE MODEL statement in the SQL query to calculate the required statistics.

Create a component in the Vertex AI Pipelines directed acyclic graph (DAG) to calculate the required statistics, and pass the statistics on to subsequent components.

Create SQL queries to calculate and store the required statistics in separate BigQuery tables that are referenced in the CREATE MODEL statement.
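
For reference, the "required statistics" mentioned in the options are, for each feature, the minimum and maximum (for MinMax scaling) and the quantile boundaries (for bucketization), all computed over the last hour of data. The sketch below shows what those statistics look like in plain Python. It is independent of BigQuery ML: the function names and sample values are hypothetical, and the rank-based quantile rule is an assumption that may differ from the one BigQuery uses.

```python
from bisect import bisect_right


def min_max_stats(values):
    """Return the (min, max) pair needed for MinMax scaling."""
    return min(values), max(values)


def min_max_scale(x, lo, hi):
    """Map x into [0, 1] using previously computed min and max."""
    if hi == lo:
        return 0.0  # degenerate case: all values identical
    return (x - lo) / (hi - lo)


def quantile_boundaries(values, num_buckets):
    """Return num_buckets - 1 cut points using a simple rank-based rule.

    Assumption: this is an illustrative rule, not BigQuery's exact algorithm.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_buckets] for i in range(1, num_buckets)]


def bucketize(x, boundaries):
    """Return the 0-based bucket index of x given sorted cut points."""
    return bisect_right(boundaries, x)


# Hypothetical feature values from the last hour of data.
last_hour = [3.0, 7.0, 1.0, 9.0, 5.0, 11.0, 13.0, 15.0]

lo, hi = min_max_stats(last_hour)
cuts = quantile_boundaries(last_hour, 4)

scaled = [min_max_scale(x, lo, hi) for x in last_hour]
buckets = [bucketize(x, cuts) for x in last_hour]
```

The same `lo`, `hi`, and `cuts` must be reused when transforming inference inputs, which is why each option differs mainly in where these statistics are computed and stored.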