
You are tasked with developing a Kubeflow pipeline on Google Kubernetes Engine (GKE) for a machine learning project. The pipeline's initial step requires querying a large dataset stored in BigQuery, with the results to be used as input for subsequent steps. The project has tight deadlines, and you need to ensure the solution is cost-effective, scalable, and minimizes manual intervention. Given these constraints, what is the most efficient method to integrate BigQuery query execution into your Kubeflow pipeline? Choose the best option.
A. Manually execute the query via the BigQuery console, save the results into a new table, and configure your pipeline to read from this table. This approach requires manual steps each time the pipeline runs.
B. Write a custom Python script that uses the BigQuery API to execute the query and save the results to temporary storage, then modify your pipeline to include this script as its first step, adding complexity to pipeline maintenance.
C. Design a custom component using the Kubeflow Pipelines DSL that interacts directly with the BigQuery client library for Python (see the first sketch after the options). This method offers flexibility but requires significant development effort and expertise.
D. Use the Kubeflow Pipelines ComponentStore to find and integrate a pre-built BigQuery Query component from GitHub (see the second sketch after the options). This component is designed to execute queries against BigQuery and can be added to your pipeline with minimal setup.
E. Combine options B and C by developing a custom component that not only executes the BigQuery query but also processes the data before passing it to the next pipeline step. This approach offers customization at the cost of increased development time.
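
For context on option C, here is a minimal sketch of a custom component, assuming the KFP v2 SDK and the google-cloud-bigquery client library. The component name run_bq_query, its parameters, and the fully qualified "project.dataset.table" destination string are illustrative assumptions, not part of the question.

from kfp import dsl

@dsl.component(
    base_image="python:3.11",
    packages_to_install=["google-cloud-bigquery"],
)
def run_bq_query(query: str, project: str, destination_table: str) -> str:
    # Runs the query inside the component's container and materializes the
    # results in a destination table that downstream steps can read.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(destination=destination_table)
    job = client.query(query, job_config=job_config)
    job.result()  # Block until the query job completes.
    return destination_table

Writing and maintaining this yourself is exactly the "significant development effort" the option describes: you own the container image, credentials, retries, and the contract with downstream steps.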
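For option D, here is a sketch of loading the pre-built component, assuming the legacy KFP v1 SDK (where kfp.components.ComponentStore is available). The GitHub URL prefix and the "gcp/bigquery/query" path follow the kubeflow/pipelines repository layout but should be verified against the SDK version you pin; the pipeline name and parameters are illustrative.

import kfp
from kfp.components import ComponentStore

# Point the store at the components directory of the kubeflow/pipelines repo;
# load_component() resolves "gcp/bigquery/query" to that component's YAML spec.
store = ComponentStore(
    url_search_prefixes=[
        "https://raw.githubusercontent.com/kubeflow/pipelines/master/components/"
    ]
)
bigquery_query_op = store.load_component("gcp/bigquery/query")

@kfp.dsl.pipeline(name="bq-first-step")
def pipeline(query: str, project_id: str, dataset_id: str, table_id: str):
    # The pre-built component executes the query and writes the results to
    # dataset_id.table_id, so subsequent steps consume them with no custom code.
    bigquery_query_op(
        query=query,
        project_id=project_id,
        dataset_id=dataset_id,
        table_id=table_id,
    )

This is the "minimal setup" the option refers to: the query step is a maintained, reusable component rather than bespoke code, which matters under tight deadlines.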