
Answer-first summary for fast verification
Answer: Set up Vertex AI Pipelines to orchestrate the MLOps pipeline. Use the predefined Dataproc component for the PySpark-based workloads.
Option B is optimal because Vertex AI Pipelines is a fully managed service that minimizes infrastructure management effort, which aligns with the primary requirement. It integrates natively with Dataproc through predefined components for PySpark workloads, eliminating the need for custom component development while letting the team keep its existing PySpark preprocessing code. Option A requires writing a custom component for Dataproc, increasing development effort for something the predefined component already covers. Option C uses Kubeflow Pipelines on GKE, which is not fully managed and requires cluster operation and upkeep. Option D uses Cloud Composer, which, while managed, is a general-purpose orchestrator and adds operational overhead compared to the ML-native, more integrated Vertex AI Pipelines solution.
Author: LeetQuiz Editorial Team
You are building an MLOps pipeline to retrain tree-based models in production. The pipeline must include components for data ingestion, data processing, model training, model evaluation, and model deployment. Your organization primarily uses PySpark for data preprocessing, and you want to minimize infrastructure management. How should you architect this pipeline?
A
Set up a TensorFlow Extended (TFX) pipeline on Vertex AI Pipelines to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
B
Set up Vertex AI Pipelines to orchestrate the MLOps pipeline. Use the predefined Dataproc component for the PySpark-based workloads.
C
Set up Kubeflow Pipelines on Google Kubernetes Engine to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
D
Set up Cloud Composer to orchestrate the MLOps pipeline. Use Dataproc workflow templates for the PySpark-based workloads in Cloud Composer.