
You are a Machine Learning Engineer at a rapidly growing e-commerce company. Your team has developed a Keras-based proof-of-concept model for product recommendation that currently lives in a Jupyter notebook. The notebook includes steps for data validation, model analysis, and training. Because the volume of training data is expected to grow, your manager has asked you to productionize the model. The solution must automate weekly retraining, scale efficiently as the data grows, and minimize cost by using Google Cloud's managed services. It should also be reproducible and easy to maintain. Which of the following approaches best meets these requirements? (Choose two options if E is available)
A
Convert the Jupyter notebook steps into Python scripts, encapsulate each in an Apache Airflow BashOperator, and run the resulting DAG in Cloud Composer. This approach leverages Airflow's scheduling capabilities but may require manual scaling and additional cost management.
B
Transform the Jupyter notebook steps into an Apache Spark job and schedule its execution on ephemeral Dataproc clusters using Cloud Scheduler. This approach is scalable but may introduce complexity in managing Spark jobs and cluster lifecycle.
C
Migrate the Jupyter notebook to a Notebooks instance on the largest N2 machine type and use Cloud Scheduler to schedule the notebook steps. This approach is straightforward but lacks scalability and cost-effectiveness for growing data volumes.
D
Develop the code as a TensorFlow Extended (TFX) pipeline, orchestrated with Vertex AI Pipelines. Utilize standard TFX components for data validation and model analysis, and Vertex AI Pipelines for model retraining. This approach provides a scalable, cost-effective, and managed solution for production ML workflows.
E
Implement the solution using both Cloud Composer for orchestration and Vertex AI Pipelines for model training and analysis. This hybrid approach leverages the strengths of both services but may increase complexity and cost.
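
Whichever orchestrator is chosen, the notebook's stages first have to be expressed as discrete steps with explicit dependencies. The following is a minimal, orchestrator-agnostic sketch of that idea using only the Python standard library (Python 3.9+). The step names (`ingest_data`, `validate_data`, `train_model`, `analyze_model`) and the `execution_order` helper are hypothetical and are not part of TFX, Vertex AI Pipelines, or Airflow. In a TFX pipeline, data validation, training, and model analysis would roughly correspond to the standard ExampleValidator, Trainer, and Evaluator components.

```python
from graphlib import TopologicalSorter

# Hypothetical step names: each one stands for a stage of the notebook.
# Keys are steps; values are the steps that must finish first.
PIPELINE_STEPS = {
    "ingest_data": set(),
    "validate_data": {"ingest_data"},   # data validation
    "train_model": {"validate_data"},   # Keras model training
    "analyze_model": {"train_model"},   # model analysis on the trained model
}


def execution_order(steps):
    """Return an order in which every step runs after its dependencies."""
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    for step in execution_order(PIPELINE_STEPS):
        print(step)
```

Declaring dependencies explicitly, rather than relying on the cell order of a notebook, is what allows a scheduler to rerun the same steps every week in a reproducible way.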