
You are optimizing a demand forecasting pipeline that currently uses Dataflow to preprocess raw data, including applying Z-score normalization to data stored in BigQuery. The pipeline writes the processed data back to BigQuery, and new training data is added weekly. Your goal is to improve the pipeline's efficiency by reducing computation time and minimizing manual intervention. Considering the need for scalability, cost-effectiveness, and compliance with data processing standards, which of the following approaches would you recommend? (Choose two options if option E is available.)
A
Utilize Google Kubernetes Engine (GKE) to deploy a custom normalization service, allowing for flexible scaling and management of normalization tasks.
B
Implement the normalization algorithm directly in SQL within BigQuery, leveraging its powerful query engine for efficient data processing.
C
Use TensorFlow’s feature column normalizer_fn within your model to handle normalization during the training phase, avoiding a separate preprocessing step.
D
Deploy Apache Spark on Dataproc with the BigQuery connector to perform normalization, taking advantage of Spark's distributed computing capabilities.
E
Combine the use of BigQuery's SQL for initial data filtering and aggregation with TensorFlow’s normalizer_fn for feature-specific normalization within the model.
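For context on the options above: both the in-SQL approach (option B) and the in-model approach (options C and E) reduce to the same Z-score arithmetic, (x - mean) / std, with the statistics precomputed over the training set. A minimal sketch in plain Python, where the feature name, statistics, and table name are illustrative and not part of the question (TensorFlow's normalizer_fn would receive a tensor rather than a float, but has this same shape):

```python
def make_zscore_normalizer(mean, std):
    """Return a function applying Z-score normalization, (x - mean) / std,
    using statistics precomputed on the training data. A function of this
    shape is what a TensorFlow numeric feature column accepts as its
    normalizer_fn (option C)."""
    def normalize(x):
        return (x - mean) / std
    return normalize

# Hypothetical precomputed statistics for a 'demand' feature.
normalize_demand = make_zscore_normalizer(mean=120.0, std=15.0)
print(normalize_demand(150.0))  # 2.0: two standard deviations above the mean
print(normalize_demand(120.0))  # 0.0: exactly at the mean

# The same arithmetic expressed as BigQuery Standard SQL (option B), using
# the AVG and STDDEV analytic functions; the table name is hypothetical.
ZSCORE_SQL = """
SELECT
  demand,
  (demand - AVG(demand) OVER ()) / STDDEV(demand) OVER () AS demand_z
FROM `project.dataset.weekly_sales`
"""
```

Pushing this computation into BigQuery SQL avoids moving data out of the warehouse, while a normalizer_fn keeps the transformation inside the model graph so serving inputs are normalized consistently.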