
Answer-first summary for fast verification
Answer: (B) Implement the normalization algorithm directly in SQL within BigQuery to simplify the process and reduce computational overhead, and (E) Combine BigQuery SQL for initial data normalization with TensorFlow's Feature Column API for real-time normalization during model inference to optimize both the preprocessing and inference phases.
Implementing the normalization algorithm directly in SQL within BigQuery (Option B) significantly reduces computation time and manual intervention by eliminating the need for external processing tools. Combining this with TensorFlow's Feature Column API for real-time normalization during model inference (Option E) further streamlines both the preprocessing and inference phases, keeping the pipeline scalable and cost-efficient while adhering to data governance policies.
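Option B's approach can be sketched as a single BigQuery statement that computes Z-scores with analytic (window) functions, avoiding any external processing tool. The SQL below is a minimal sketch with hypothetical table and column names (`project.dataset.sales_raw`, `demand`); the pure-Python function mirrors the same transformation for reference.

```python
# Illustrative sketch of option B: Z-score normalization expressed directly in
# BigQuery SQL. AVG(...) OVER () and STDDEV_POP(...) OVER () compute the column
# statistics in one pass; table and column names are hypothetical.
ZSCORE_SQL = """
CREATE OR REPLACE TABLE `project.dataset.sales_normalized` AS
SELECT
  *,
  SAFE_DIVIDE(demand - AVG(demand) OVER (),
              STDDEV_POP(demand) OVER ()) AS demand_z
FROM `project.dataset.sales_raw`;
"""

def z_score(values):
    """Pure-Python reference for the same transformation (population std,
    matching STDDEV_POP in the SQL above)."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]
```

A scheduled query running this statement weekly would fold the normalization step into BigQuery itself, which is what makes the option attractive for reducing manual intervention.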
Author: LeetQuiz Editorial Team
In your role as a Machine Learning Engineer at a retail company, you are responsible for a demand forecasting pipeline that preprocesses raw data using Dataflow before model training and prediction. The preprocessing includes Z-score normalization on data stored in BigQuery, which is then written back to BigQuery. With new training data being added weekly, you are tasked with enhancing the process to reduce computation time and minimize manual intervention. Considering the need for scalability, cost-efficiency, and compliance with data governance policies, which of the following solutions would BEST meet these requirements? Choose the two most effective options.
A
Utilize Google Kubernetes Engine for data normalization to leverage containerization and orchestration for scalability.
B
Implement the normalization algorithm directly in SQL within BigQuery to simplify the process and reduce computational overhead.
C
Leverage TensorFlow's Feature Column API with the normalizer_fn argument for normalization within the model training pipeline.
D
Employ Apache Spark via the Dataproc connector for BigQuery to normalize the data, taking advantage of Spark's distributed computing capabilities.
E
Combine the use of BigQuery SQL for initial data normalization and TensorFlow's Feature Column API for real-time normalization during model inference to optimize both preprocessing and inference phases.
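The inference-time half of options C and E relies on the `normalizer_fn` argument of `tf.feature_column.numeric_column`, which accepts a callable applied to the raw feature value. A minimal sketch, assuming the mean and standard deviation were computed offline (e.g., by the BigQuery job); the statistics shown are hypothetical:

```python
# Sketch of the serving-time normalization in option E: a closure over stored
# Z-score statistics, suitable for passing as normalizer_fn. The mean/std
# values here are hypothetical placeholders for offline-computed statistics.

def make_zscore_normalizer(mean, std):
    """Return a callable for tf.feature_column.numeric_column's normalizer_fn
    argument; it must accept and return a tensor-like value."""
    def normalizer_fn(x):
        return (x - mean) / std
    return normalizer_fn

demand_normalizer = make_zscore_normalizer(mean=120.0, std=15.0)

# With TensorFlow available, the closure plugs in as:
#   import tensorflow as tf
#   demand = tf.feature_column.numeric_column(
#       "demand", normalizer_fn=demand_normalizer)
```

Because the same statistics feed both the BigQuery preprocessing and the model's feature columns, training and serving apply an identical transformation, which is the skew-avoidance argument behind option E.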