
Your team is working on a machine learning project in which data is continuously cleaned and stored in a Cloud Storage bucket. To keep your ML model up to date with the latest data, you plan to automate the retraining process using Kubeflow Pipelines on Google Kubernetes Engine (GKE) as part of a CI/CD workflow. The solution must be cost-effective, scalable, and responsive to new data arrivals without unnecessary delays or overhead. Given these requirements, how should you architect this workflow? Choose the two most appropriate options.
A. Develop a lightweight App Engine application that continuously polls the Cloud Storage bucket for new files and initiates the training job upon detection, ensuring an immediate response to new data.
B. Configure Cloud Scheduler to periodically check the Cloud Storage bucket for new files and, if any have arrived since the last check, trigger the training job on GKE. This approach ensures regular checks but may introduce delays (see the first sketch after the options).
C. Set up a Cloud Storage trigger that publishes a notification to a Pub/Sub topic whenever a new file is added, and use a Pub/Sub-triggered Cloud Function to launch the training job on GKE, providing an event-driven, scalable solution (see the second sketch after the options).
D. Use Dataflow to process and save files in Cloud Storage, with a subsequent step that automatically starts the training job on GKE. This method integrates data processing and model training but may add complexity.
E. Combine the event-driven approach (Pub/Sub and Cloud Functions) for immediate response with scheduled checks via Cloud Scheduler as a fallback mechanism, ensuring no data is missed.