You are tasked with developing a Kubeflow pipeline on Google Kubernetes Engine (GKE) for a machine learning project. The initial step involves querying a large dataset stored in BigQuery, with the results serving as input for subsequent pipeline steps. The project has strict requirements for efficiency, scalability, and minimal manual intervention. Considering these constraints, which approach should you adopt to integrate BigQuery query execution into your Kubeflow pipeline? Choose the best option from the following:
A. Develop a custom Python script utilizing the BigQuery API to execute the query. This script would need to be containerized and deployed as the first step in your Kubeflow pipeline, requiring additional setup for authentication and data handling (a minimal script is sketched after the options).
B. Manually execute the query via the BigQuery console, then export the results to a new BigQuery table. This table would then be referenced in the subsequent steps of your pipeline, introducing manual steps and potential delays.
C. Create a custom component using the Kubeflow Pipelines domain-specific language (DSL) that leverages the Python BigQuery client library. This approach offers flexibility but requires significant development effort to handle query execution and result processing (see the component sketch after the options).
D. Utilize the pre-built BigQuery Query Component available in the Kubeflow Pipelines GitHub repository. By referencing this component's URL in your pipeline, you can seamlessly execute queries against BigQuery without the need for custom development (see the loading sketch after the options).
E. Combine the use of the pre-built BigQuery Query Component for query execution with a custom component for data transformation, ensuring optimal performance and flexibility. This hybrid approach leverages existing solutions while addressing specific project needs.
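To make the options concrete, option A amounts to a standalone script like the following minimal sketch. The project, query, and table names are illustrative; on GKE the BigQuery client would typically authenticate via Workload Identity or a mounted service-account key, which is part of the "additional setup" the option mentions.

```python
# Minimal sketch of option A: a containerized script that runs a BigQuery
# query and writes the results to a destination table for downstream steps.
# Assumes google-cloud-bigquery is installed and credentials are available
# (e.g., via Workload Identity on GKE). All names below are illustrative.
from google.cloud import bigquery


def run_query(project_id: str, query: str, destination_table: str) -> None:
    client = bigquery.Client(project=project_id)
    job_config = bigquery.QueryJobConfig(
        destination=destination_table,       # "project.dataset.table"
        write_disposition="WRITE_TRUNCATE",  # overwrite on each pipeline run
    )
    job = client.query(query, job_config=job_config)
    job.result()  # block until the query job finishes


if __name__ == "__main__":
    run_query(
        project_id="my-project",  # illustrative
        query="SELECT * FROM `my-project.raw.events` LIMIT 1000",
        destination_table="my-project.staging.query_results",
    )
```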
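Option C wraps the same logic in a pipeline component. A lightweight version, assuming the KFP v1 SDK (the function name and base image are assumptions, not a prescribed implementation), could look like this:

```python
# Sketch of option C: a custom KFP component (KFP v1 SDK) that wraps the
# Python BigQuery client library. Component and image choices are illustrative.
from kfp.components import create_component_from_func


def bq_query(project_id: str, query: str, destination_table: str) -> str:
    """Run a BigQuery query and return the destination table ID."""
    # Imports live inside the function body so they are available in the
    # component's container at runtime.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    job_config = bigquery.QueryJobConfig(
        destination=destination_table,
        write_disposition="WRITE_TRUNCATE",
    )
    client.query(query, job_config=job_config).result()
    return destination_table


bq_query_op = create_component_from_func(
    bq_query,
    base_image="python:3.9",
    packages_to_install=["google-cloud-bigquery"],
)
```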
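Option D, by contrast, reduces to loading the published component by URL, roughly as in this sketch. The pinned release tag and the parameter names come from the component's YAML in the kubeflow/pipelines repository and may differ across releases, so treat them as assumptions to verify against the version you pin:

```python
# Sketch of option D: reference the pre-built BigQuery Query component
# directly by URL (KFP v1 SDK). Parameter names come from the component's
# YAML and may differ across releases.
import kfp
from kfp import dsl

bigquery_query_op = kfp.components.load_component_from_url(
    "https://raw.githubusercontent.com/kubeflow/pipelines/1.7.0/"
    "components/gcp/bigquery/query/component.yaml"
)


@dsl.pipeline(name="bq-query-pipeline")
def pipeline(project_id: str, query: str, dataset_id: str, table_id: str):
    # The component runs the query and materializes the results in a
    # BigQuery table, so downstream steps can read the destination directly.
    bigquery_query_op(
        query=query,
        project_id=project_id,
        dataset_id=dataset_id,
        table_id=table_id,
    )
```

Option E would simply compose this loaded component with a custom transformation component, in the same way `bq_query_op` is defined in the option C sketch above.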