
Answer-first summary for fast verification
Answer (Option D): 1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks using the Dataproc and BigQuery operators. 2. Create a separate DAG for each table that moves through the pipeline. 3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.
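The three steps above can be sketched as a per-table Airflow DAG. This is a minimal illustration, not the exam's reference solution: the DAG ID, project, region, bucket, and job definitions below are all placeholders, and the sketch assumes the `apache-airflow-providers-google` package is installed in the Composer environment.

```python
# Hypothetical per-table DAG sketch (all names/IDs are placeholders).
# Assumes apache-airflow plus the Google provider package.
import pendulum
from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="load_transform_orders",   # one DAG per table, e.g. "orders"
    schedule=None,                    # no schedule: triggered externally on file arrival
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
) as dag:
    # Step 1: initial transformation in Dataproc, writing into BigQuery.
    initial_transform = DataprocSubmitJobOperator(
        task_id="dataproc_initial_transform",
        project_id="my-project",      # placeholder
        region="us-central1",         # placeholder
        job={
            "placement": {"cluster_name": "etl-cluster"},
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/transform_orders.py"},
        },
    )

    # Step 2: table-specific follow-up transformation inside BigQuery.
    bq_transform = BigQueryInsertJobOperator(
        task_id="bq_table_transform",
        configuration={
            "query": {
                "query": "CALL my_dataset.transform_orders()",  # placeholder routine
                "useLegacySql": False,
            }
        },
    )

    initial_transform >> bq_transform  # sequential: Dataproc first, then BigQuery
```

Because `schedule=None`, this DAG only runs when something external (here, the Cloud Function) triggers it, which matches the "no fixed arrival time" requirement.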
Option D is correct. The transformations run in Dataproc and BigQuery, so Cloud Storage operators are unnecessary, which eliminates options A and B. Because new data arrives on no fixed schedule, the DAGs should be triggered by the arrival of new files rather than run on a timer. And since every table has its own transformation jobs, each table needs its own DAG. Option D satisfies all of these requirements: separate per-table DAGs, triggered on a Cloud Storage object event via a Cloud Function.
Author: LeetQuiz Editorial Team
Question: You need to orchestrate a series of sequential load and transformation jobs. Data files are added incrementally to a Cloud Storage bucket by an upstream process, with no predetermined arrival times. When new data arrives, a Dataproc job must run initial transformations and write the processed data to BigQuery. After that, additional transformation jobs, which differ for each table, must run within BigQuery; these jobs can take several hours to complete. Your task is to identify the most efficient and maintainable workflow that can process hundreds of tables while consistently delivering the freshest data to your end users. What strategy should you implement?
A
B
C
D