
As a Machine Learning Engineer at a retail company, you are responsible for a demand forecasting pipeline that preprocesses raw data with Dataflow before model training and prediction. The preprocessing applies Z-score normalization to data stored in BigQuery and writes the results back to BigQuery. New training data arrives weekly, and you are tasked with reducing computation time and minimizing manual intervention. Considering the need for scalability, cost-efficiency, and compliance with data governance policies, which of the following solutions would BEST meet these requirements? Choose the two most effective options.
A
Utilize Google Kubernetes Engine for data normalization to leverage containerization and orchestration for scalability.
B
Implement the normalization algorithm directly in SQL within BigQuery to simplify the process and reduce computational overhead.
C
Leverage TensorFlow's Feature Column API with the normalizer_fn argument for normalization within the model training pipeline.
D
Employ Apache Spark via the Dataproc connector for BigQuery to normalize the data, taking advantage of Spark's distributed computing capabilities.
E
Combine the use of BigQuery SQL for initial data normalization and TensorFlow's Feature Column API for real-time normalization during model inference to optimize both preprocessing and inference phases.
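For context on what options B, C, and E are normalizing: Z-score normalization rescales each value by subtracting the column mean and dividing by the standard deviation, so the column ends up with mean 0 and standard deviation 1. A minimal Python sketch of this transformation (the function name and sample data are illustrative, not from the question; in option C this logic would be passed to a feature column via the normalizer_fn argument):

```python
from statistics import mean, pstdev

def zscore(values):
    """Z-score normalize a list of numbers: (x - mean) / stddev."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

# Hypothetical weekly demand figures
demand = [10.0, 20.0, 30.0]
normalized = zscore(demand)  # centered on 0, unit variance
```

Option B expresses the same arithmetic in SQL (e.g. with AVG and STDDEV window functions) so the work happens inside BigQuery rather than in a separate Dataflow or Spark job.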