You are collaborating with a data engineering team that has built a pipeline to clean datasets and store them in a Cloud Storage bucket. Your team has developed an ML model that must be retrained automatically whenever new data becomes available, as part of a CI/CD workflow. The workflow must trigger a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE) without unnecessary resource consumption or delay. Considering cost-effectiveness, scalability, and real-time processing, which of the following is the BEST way to architect this workflow? (Choose two options if E is available)

A. Develop a lightweight Python client using App Engine to continuously monitor Cloud Storage for new files and initiate the training job upon detection, ensuring minimal latency.

1.6%

B. Configure Cloud Scheduler to periodically check the Cloud Storage bucket for new files. If no new files are found since the last check, the job is terminated to save resources.

6.3%

C. Implement a Cloud Storage trigger to send a notification to a Pub/Sub topic when a new file is added. Use a Pub/Sub-triggered Cloud Function to start the training job on GKE, leveraging event-driven architecture for efficiency.

50.0%

D. Utilize Dataflow to process and store files in Cloud Storage, then automatically trigger the training job on GKE once the file is saved, ensuring data is processed in a streamlined manner.

12.5%

E. Combine both Cloud Scheduler for periodic checks and Cloud Storage triggers for immediate notifications, creating a hybrid approach that ensures no new data is missed while optimizing resource usage.

29.7%

Google Professional Machine Learning Engineer
