
Answer-first summary for fast verification
Answer: B. Use the TRANSFORM clause in the CREATE MODEL statement to calculate the required statistics.
The question asks you to minimize both storage and computational overhead. Option B (the TRANSFORM clause in CREATE MODEL) is the best fit because it builds preprocessing into model creation. BigQuery ML computes the statistics during training, saves them with the model, and reapplies them automatically when ML.PREDICT is called. No preprocessed copy of the data or separate statistics table is needed, and training and inference are guaranteed to use the same transformation.
By contrast, Option A stages preprocessed data, which increases storage costs. Option C adds computational and orchestration overhead by running an extra Vertex AI Pipelines component. Option D stores the statistics in separate BigQuery tables, which increases storage overhead. The community discussion strongly supports B, which received the most upvotes, and reflects a consensus that it best meets the minimization requirements.
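As a minimal sketch of what option B looks like in practice, the Python helpers below generate the BigQuery ML statements. The model, table, and column names (`feature_a`, `feature_b`, `label`, `event_ts`) are hypothetical placeholders, and the `run` helper assumes the `google-cloud-bigquery` package and valid credentials. `ML.QUANTILE_BUCKETIZE` and `ML.MIN_MAX_SCALER` are the BigQuery ML functions for the two preprocessing steps in the scenario.

```python
"""Sketch: generate BigQuery ML SQL that keeps preprocessing inside TRANSFORM.

Model, table, and column names are hypothetical placeholders.
"""


def build_create_model_sql(model: str, source_table: str, num_buckets: int = 4) -> str:
    """Return a CREATE MODEL statement whose TRANSFORM clause computes the
    quantile boundaries and min/max statistics during training."""
    return f"""
CREATE OR REPLACE MODEL `{model}`
TRANSFORM(
  -- Quantile boundaries are computed over the training rows and stored with the model.
  ML.QUANTILE_BUCKETIZE(feature_a, {num_buckets}) OVER () AS feature_a_bucket,
  -- Min and max are also computed at training time and reused by ML.PREDICT.
  ML.MIN_MAX_SCALER(feature_b) OVER () AS feature_b_scaled,
  label
)
OPTIONS(model_type = 'linear_reg', input_label_cols = ['label'])
AS
SELECT feature_a, feature_b, label
FROM `{source_table}`
-- Train on the last hour of data only, as the scenario requires.
WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
""".strip()


def build_predict_sql(model: str, source_table: str) -> str:
    """Return an ML.PREDICT query that passes raw columns; BigQuery ML
    reapplies the model's TRANSFORM clause automatically."""
    return f"""
SELECT *
FROM ML.PREDICT(
  MODEL `{model}`,
  (SELECT feature_a, feature_b FROM `{source_table}`)
)
""".strip()


def run(sql: str) -> None:
    """Submit a statement and wait for it; needs google-cloud-bigquery and credentials."""
    from google.cloud import bigquery  # imported lazily so the builders work without it

    bigquery.Client().query(sql).result()


if __name__ == "__main__":
    print(build_create_model_sql("my-project.my_dataset.hourly_model", "my-project.my_dataset.events"))
```

Because the statistics live inside the model, each hourly retraining run computes them once, and no staging tables or extra pipeline components are needed to carry them through to inference.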
Author: LeetQuiz Editorial Team
You are using BigQuery ML to train a linear regression model on a table that receives new data every minute. The model is retrained hourly using Cloud Scheduler and Vertex AI Pipelines. Feature preprocessing involves quantile bucketization and MinMax scaling on the last hour of data. To minimize storage and computational overhead, what is your recommended approach?
A. Preprocess and stage the data in BigQuery prior to feeding it to the model during training and inference.
B. Use the TRANSFORM clause in the CREATE MODEL statement in the SQL query to calculate the required statistics.
C. Create a component in the Vertex AI Pipelines directed acyclic graph (DAG) to calculate the required statistics, and pass the statistics on to subsequent components.
D. Create SQL queries to calculate and store the required statistics in separate BigQuery tables that are referenced in the CREATE MODEL statement.
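To make the two preprocessing steps in the question concrete, here is a plain-Python illustration (not BigQuery's implementation) of fitting quantile boundaries and min/max values once on a training window and reusing them at inference. The names `FittedStats`, `fit`, and `transform` are hypothetical; BigQuery ML labels buckets differently and may compute quantile boundaries with a different method.

```python
"""Illustration only: fit preprocessing statistics once, reuse them at inference."""
from bisect import bisect_right
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class FittedStats:
    boundaries: list[float]  # n-1 cut points splitting the training data into n buckets
    min_value: float         # minimum of the scaled column in the training window
    max_value: float         # maximum of the scaled column in the training window


def fit(bucket_col: list[float], scale_col: list[float], num_buckets: int) -> FittedStats:
    """Compute the statistics from one training window (e.g. the last hour)."""
    return FittedStats(
        boundaries=quantiles(bucket_col, n=num_buckets),
        min_value=min(scale_col),
        max_value=max(scale_col),
    )


def transform(stats: FittedStats, bucket_value: float, scale_value: float) -> tuple[int, float]:
    """Apply the stored statistics to one row; nothing is recomputed at inference."""
    bucket = bisect_right(stats.boundaries, bucket_value) + 1  # 1-based bucket index
    span = stats.max_value - stats.min_value
    # MinMax scaling; this sketch does not clip values outside the training range.
    scaled = (scale_value - stats.min_value) / span if span else 0.0
    return bucket, scaled


if __name__ == "__main__":
    stats = fit([1, 2, 3, 4, 5, 6, 7, 8], [10.0, 20.0, 30.0], num_buckets=4)
    print(stats.boundaries)           # [2.25, 4.5, 6.75]
    print(transform(stats, 5, 20.0))  # (3, 0.5)
```

The statistics are computed in one place and travel with the model, so inference neither recomputes them nor reads them from a separate table. That is the property that makes the TRANSFORM approach cheap in both storage and compute.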