
Explanation:
The best solution is option C: a Cloud Storage notification publishes a message to a Pub/Sub topic whenever a new file arrives, and a Pub/Sub-triggered Cloud Function starts the training job on GKE. Because it is event-driven, this approach consumes no compute while the bucket is idle, scales with the volume of incoming files, and aligns with MLOps best practices. Option E is a hybrid that pairs these immediate event-driven triggers with periodic Cloud Scheduler checks, reducing the risk of missing new data while keeping resource usage low. For more detail, refer to Google Cloud's architecture for MLOps.
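The event-driven flow described in the explanation can be sketched as a Pub/Sub-triggered function, shown here as a minimal Python illustration rather than a definitive implementation. It assumes the bucket publishes Cloud Storage notifications with the `eventType`, `bucketId`, and `objectId` attributes and a base64-encoded JSON payload, and that the function uses the background-function `(event, context)` signature. The names `parse_gcs_notification`, `handle_event`, and `submit_run` are hypothetical; `submit_run` stands in for whatever call actually starts the Kubeflow Pipelines run on GKE.

```python
import base64
import json
from typing import Callable, Optional


def parse_gcs_notification(event: dict) -> Optional[dict]:
    """Return {"bucket", "name"} for a newly finalized object, else None."""
    attrs = event.get("attributes") or {}
    # React only to object creation; ignore deletes, archives, metadata updates.
    if attrs.get("eventType") != "OBJECT_FINALIZE":
        return None
    info = {"bucket": attrs.get("bucketId"), "name": attrs.get("objectId")}
    data = event.get("data")
    if data:
        # With a JSON payload, the message body carries the object metadata.
        payload = json.loads(base64.b64decode(data).decode("utf-8"))
        info["bucket"] = payload.get("bucket", info["bucket"])
        info["name"] = payload.get("name", info["name"])
    if not info["bucket"] or not info["name"]:
        return None
    return info


def _log_only(data_uri: str) -> None:
    # Placeholder (assumption): a real deployment would start the
    # Kubeflow Pipelines training run here instead of printing.
    print(f"would start training run for {data_uri}")


def handle_event(event: dict, context=None,
                 submit_run: Callable[[str], None] = _log_only) -> Optional[str]:
    """Pub/Sub entry point: start one training run per new file."""
    info = parse_gcs_notification(event)
    if info is None:
        return None  # Not a new-file event, so nothing is triggered.
    data_uri = f"gs://{info['bucket']}/{info['name']}"
    submit_run(data_uri)
    return data_uri
```

Passing `submit_run` as a parameter keeps the parsing logic testable without a cluster. Since Pub/Sub can deliver a message more than once, the real submission step would ideally be idempotent.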
You are collaborating with a data engineering team that has established a pipeline to cleanse and store datasets in a Cloud Storage bucket. Your team has developed an ML model that must be retrained automatically whenever new data becomes available, as part of a CI/CD workflow. The workflow must efficiently trigger a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE) without unnecessary resource consumption or delays. Considering cost-effectiveness, scalability, and real-time processing, which of the following solutions is BEST for architecting this workflow? (Choose two options if E is available)
A
Develop a lightweight Python client using App Engine to continuously monitor Cloud Storage for new files and initiate the training job upon detection, ensuring minimal latency.
B
Configure Cloud Scheduler to periodically check the Cloud Storage bucket for new files. If no new files are found since the last check, the job is terminated to save resources.
C
Implement a Cloud Storage trigger to send a notification to a Pub/Sub topic when a new file is added. Use a Pub/Sub-triggered Cloud Function to start the training job on GKE, leveraging event-driven architecture for efficiency.
D
Utilize Dataflow to process and store files in Cloud Storage, then automatically trigger the training job on GKE once the file is saved, ensuring data is processed in a streamlined manner.
E
Combine both Cloud Scheduler for periodic checks and Cloud Storage triggers for immediate notifications, creating a hybrid approach that ensures no new data is missed while optimizing resource usage.