
Answer-first summary for fast verification
Answer: Implement an orchestration framework such as Kubeflow Pipelines or Vertex AI Pipelines.
The question requires building an ML pipeline that handles both streaming and batch data, performs data validation, preprocessing, model training, and deployment in a consistent, automated, and reproducible way, with component reusability and metadata tracking.

Option D (Kubeflow Pipelines or Vertex AI Pipelines) is the correct choice because these are purpose-built ML orchestration frameworks that specifically support metadata tracking, component reusability, end-to-end automation, reproducibility, and integration with Vertex AI services.

Option A (Cloud Composer) is a general workflow orchestrator built on Apache Airflow; it is not optimized for ML-specific needs such as metadata tracking.

Option B (Dataflow) handles distributed batch and streaming data processing but does not provide ML pipeline orchestration capabilities.

Option C (Cloud Build) is for CI/CD and building Docker images, a supporting task rather than the main orchestration solution for the ML pipeline.
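To make the reusability and metadata-tracking ideas behind option D concrete, here is a minimal conceptual sketch in plain Python. It is not the actual Kubeflow or Vertex AI SDK; the `Component` and `Pipeline` classes and the sample steps are illustrative assumptions that mirror how an orchestration framework wires reusable steps together and records per-run metadata for reproducibility.

```python
# Conceptual sketch (plain Python, NOT the Kubeflow/Vertex SDK) of reusable
# pipeline components plus per-run metadata capture.
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Component:
    """A reusable pipeline step; the same component can appear in many pipelines."""
    name: str
    fn: Callable[[Any], Any]


@dataclass
class Pipeline:
    components: list
    metadata: dict = field(default_factory=dict)

    def run(self, data):
        for comp in self.components:
            data = comp.fn(data)
            # Metadata tracking: record each step's output so a run can be
            # audited and reproduced later.
            self.metadata[comp.name] = repr(data)
        return data


# Hypothetical components for validation and preprocessing.
validate = Component("validate", lambda rows: [r for r in rows if r is not None])
scale = Component("preprocess", lambda rows: [r * 2 for r in rows])

pipeline = Pipeline([validate, scale])
result = pipeline.run([1, None, 3])
print(result)             # [2, 6]
print(pipeline.metadata)  # {'validate': '[1, 3]', 'preprocess': '[2, 6]'}
```

In Kubeflow Pipelines or Vertex AI Pipelines, the framework itself provides these abstractions (components, pipelines, and an ML Metadata store), so you get this bookkeeping without writing it yourself.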
Author: LeetQuiz Editorial Team
You are building an ML pipeline to process and analyze both streaming and batch datasets. The pipeline must handle data validation, preprocessing, model training, and model deployment in a consistent and automated way. You need to design an efficient and scalable solution that captures model training metadata and is easily reproducible. You also want to be able to reuse custom components for different parts of your pipeline. What should you do?
A. Use Cloud Composer for distributed processing of batch and streaming data in the pipeline.
B. Use Dataflow for distributed processing of batch and streaming data in the pipeline.
C. Use Cloud Build to build and push Docker images for each pipeline component.
D. Implement an orchestration framework such as Kubeflow Pipelines or Vertex AI Pipelines.