
Answer: Write SQL queries to transform the data in-place in BigQuery.
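The correct option is B. Performing the transformations in place in BigQuery with SQL queries avoids exporting the data to a separate processing service, which reduces preprocessing time. BigQuery is also serverless: it allocates compute to each query automatically, so there is no Dataproc cluster to provision and manage (option A) and no separate Dataflow pipeline to build and run (option D), which can make it more cost-effective. Min-max scaling and bucketing map directly onto standard SQL (MIN and MAX aggregates for scaling, range or CASE expressions for bucketing), so the development effort is lower than writing and maintaining Spark or Apache Beam code. Finally, because the transformed features are written once to a BigQuery table, all of the TensorFlow models can read the same preprocessed data, whereas a preprocessing layer inside each model (option C) would repeat the work for every model.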
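As an illustration of option B, the helper below is a minimal sketch, in Python, that assembles one BigQuery `CREATE OR REPLACE TABLE ... AS` statement applying both transformations. The function name `build_preprocessing_sql`, the table names, column names, and bucket boundaries are all hypothetical. It computes each column's MIN and MAX once in a `stats` subquery and cross-joins that one-row result back to the data, rather than using `MIN(...) OVER ()`, because an analytic function with an empty `OVER ()` treats the whole table as a single partition, which can become a bottleneck on large tables. Bucketing uses BigQuery's `RANGE_BUCKET` function.

```python
from typing import List, Mapping, Sequence


def build_preprocessing_sql(
    source_table: str,
    dest_table: str,
    scale_columns: Sequence[str],
    bucket_boundaries: Mapping[str, Sequence[float]],
    passthrough_columns: Sequence[str] = (),
) -> str:
    """Return a GoogleSQL statement that min-max scales and buckets columns.

    All table names, column names, and boundaries are caller-supplied
    placeholders; nothing is validated against a real BigQuery schema.
    """
    if not scale_columns:
        raise ValueError("scale_columns must name at least one column")

    # Single aggregation pass: per-column MIN and MAX over the whole table.
    stats_items: List[str] = []
    for col in scale_columns:
        stats_items.append(f"MIN({col}) AS {col}_min")
        stats_items.append(f"MAX({col}) AS {col}_max")

    # Columns copied through unchanged (e.g. keys, dates, the label).
    select_items: List[str] = [f"t.{col}" for col in passthrough_columns]

    # Min-max scaling: (x - min) / (max - min). SAFE_DIVIDE returns NULL
    # instead of raising an error when a column is constant (max == min).
    for col in scale_columns:
        select_items.append(
            f"SAFE_DIVIDE(t.{col} - s.{col}_min, s.{col}_max - s.{col}_min)"
            f" AS {col}_scaled"
        )

    # Bucketing: RANGE_BUCKET returns the index of the bucket the value
    # falls into, given an array of boundaries sorted in ascending order.
    for col, bounds in bucket_boundaries.items():
        array_literal = ", ".join(str(b) for b in bounds)
        select_items.append(
            f"RANGE_BUCKET(t.{col}, [{array_literal}]) AS {col}_bucket"
        )

    stats_sql = ", ".join(stats_items)
    select_sql = ",\n  ".join(select_items)
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        "WITH stats AS (\n"
        f"  SELECT {stats_sql}\n"
        f"  FROM `{source_table}`\n"
        ")\n"
        "SELECT\n"
        f"  {select_sql}\n"
        f"FROM `{source_table}` AS t\n"
        "CROSS JOIN stats AS s"
    )


if __name__ == "__main__":
    # Hypothetical project, dataset, and column names for illustration only.
    sql = build_preprocessing_sql(
        source_table="my-project.sales.history",
        dest_table="my-project.sales.history_preprocessed",
        scale_columns=["units_sold", "unit_price"],
        bucket_boundaries={"days_since_launch": [30.0, 365.0, 1825.0]},
        passthrough_columns=["sale_date", "product_id"],
    )
    print(sql)
```

The returned string could then be submitted with the `google-cloud-bigquery` client, for example `bigquery.Client().query(sql).result()`, after checking it against the real schema. Column names are interpolated without quoting, so in this sketch they should come from trusted code rather than user input.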
Author: LeetQuiz Editorial Team
You are working as a data scientist for a food product company. The company has a significant amount of historical sales data stored in BigQuery, which you need to analyze. Your goal is to use Vertex AI's custom training service to train multiple TensorFlow models. These models will read data from BigQuery and predict future sales. Before you start experimenting with the models, you plan to implement a data preprocessing algorithm that includes min-max scaling and bucketing on a large number of features. Given constraints on minimizing preprocessing time, cost, and development effort, how should you configure this workflow?
A. Write the transformations in Spark using the spark-bigquery-connector, and use Dataproc to preprocess the data.
B. Write SQL queries to transform the data in-place in BigQuery.
C. Add the transformations as a preprocessing layer in the TensorFlow models.
D. Create a Dataflow pipeline that uses the BigQueryIO connector to ingest the data, process it, and write it back to BigQuery.
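For readers less familiar with the two transformations the question names, the sketch below shows their arithmetic in plain Python using only the standard library. Min-max scaling maps each value x to (x - min) / (max - min), so results fall in [0, 1]; bucketing replaces a value with the index of the interval it falls into. The function names `min_max_scale` and `bucketize` are hypothetical, and the boundary convention (a value equal to a boundary goes into the upper bucket) is chosen to match BigQuery's `RANGE_BUCKET` as documented. It is a reference for the math only; in option B the same computation runs inside BigQuery.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def min_max_scale(values: Sequence[float]) -> List[Optional[float]]:
    """Map each value x to (x - min) / (max - min), so results lie in [0, 1].

    Returns None for every element when all values are equal (max == min),
    similar to how BigQuery's SAFE_DIVIDE returns NULL on division by zero.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [None for _ in values]
    return [(x - lo) / span for x in values]


def bucketize(values: Sequence[float], boundaries: Sequence[float]) -> List[int]:
    """Return the bucket index of each value, given ascending boundaries.

    Index 0 means value < boundaries[0]; index len(boundaries) means
    value >= boundaries[-1]. bisect_right places a value equal to a
    boundary into the upper bucket.
    """
    return [bisect_right(boundaries, x) for x in values]


if __name__ == "__main__":
    print(min_max_scale([10.0, 20.0, 30.0]))  # [0.0, 0.5, 1.0]
    print(bucketize([5.0, 10.0, 25.0, 50.0], [10.0, 20.0, 40.0]))  # [0, 1, 2, 3]
```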