
You are collaborating with a data engineering team that has built a pipeline to cleanse datasets and store them in a Cloud Storage bucket. Your team has developed an ML model that must be retrained automatically, as part of a CI/CD workflow, whenever new data becomes available. The workflow must trigger a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE) without unnecessary resource consumption or delays. Considering cost-effectiveness, scalability, and real-time processing, which of the following is the BEST way to architect this workflow? (Choose two options if E is available)
A
Develop a lightweight Python client using App Engine to continuously monitor Cloud Storage for new files and initiate the training job upon detection, ensuring minimal latency.
B
Configure Cloud Scheduler to periodically check the Cloud Storage bucket for new files. If no new files are found since the last check, the job is terminated to save resources.
C
Implement a Cloud Storage trigger to send a notification to a Pub/Sub topic when a new file is added. Use a Pub/Sub-triggered Cloud Function to start the training job on GKE, leveraging event-driven architecture for efficiency.
D
Utilize Dataflow to process and store files in Cloud Storage, then automatically trigger the training job on GKE once the file is saved, ensuring data is processed in a streamlined manner.
E
Combine Cloud Scheduler for periodic checks with Cloud Storage triggers for immediate notifications, creating a hybrid approach that ensures no new data is missed while optimizing resource usage.
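For reference, the event-driven flow described in option C can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation. It assumes the bucket publishes notifications with a JSON payload (for example, one created with `gcloud storage buckets notifications create gs://BUCKET --topic=TOPIC --payload-format=json`), a 1st-gen Cloud Function using the `(event, context)` background signature, and the `kfp` SDK installed in the function. `KFP_HOST`, `PIPELINE_PACKAGE`, and the `input_uri` pipeline parameter are hypothetical names.

```python
import base64
import json
import os

# Hypothetical configuration read from the function's environment variables.
KFP_HOST = os.environ.get("KFP_HOST", "")
PIPELINE_PACKAGE = os.environ.get("PIPELINE_PACKAGE", "pipeline.yaml")


def parse_gcs_event(event):
    """Return (bucket, object_name) for an OBJECT_FINALIZE notification, else None."""
    attributes = event.get("attributes") or {}
    # Cloud Storage sets eventType on each notification; only newly written
    # objects (OBJECT_FINALIZE) should start a training run.
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    # With a JSON payload, the message data is the base64-encoded object resource.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return payload["bucket"], payload["name"]


def build_run_arguments(bucket, object_name):
    """Map the new file to pipeline parameters (the parameter name is hypothetical)."""
    return {"input_uri": f"gs://{bucket}/{object_name}"}


def trigger_training(event, context):
    """Cloud Function entry point: start one Kubeflow Pipelines run per new file."""
    parsed = parse_gcs_event(event)
    if parsed is None:
        return None  # Ignore deletes, metadata updates, and similar events.
    bucket, object_name = parsed
    import kfp  # Imported lazily so the parsing helpers work without the SDK.

    client = kfp.Client(host=KFP_HOST)
    result = client.create_run_from_pipeline_package(
        PIPELINE_PACKAGE,
        arguments=build_run_arguments(bucket, object_name),
        run_name=f"train-{object_name}",
    )
    return result.run_id
```

Filtering on `eventType` keeps deletes and metadata updates from launching runs, so compute is spent only when a new file actually arrives. No poller runs between uploads, which is the cost and latency argument the option makes.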