
Answer-first summary for fast verification
Answer: 1. Develop an Apache Airflow DAG in Cloud Composer using Dataproc and BigQuery operators for sequential tasks. 2. Create a unique DAG for each table in the pipeline. 3. Utilize a Cloud Storage object trigger to initiate a Cloud Function that starts the DAG.
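The one-DAG-per-table pattern from steps 1–2 can be sketched as follows. In a real Cloud Composer environment each entry would be an Airflow DAG whose tasks use `DataprocSubmitJobOperator` and `BigQueryInsertJobOperator` from the Google provider package; here the structure is modelled with plain dictionaries so the pattern is visible without an Airflow installation, and the table names are hypothetical.

```python
# Sketch (assumptions: table list and naming convention are illustrative).
# Each table gets its own DAG with two sequential tasks: a Dataproc load job,
# then a BigQuery transformation job. schedule=None means the DAG never runs
# on a timer; it only runs when triggered externally (by the Cloud Function).

TABLES = ["orders", "customers", "inventory"]  # hypothetical table list

def build_dag_definition(table):
    """Sequential tasks for one table: Dataproc load, then BigQuery transform."""
    return {
        "dag_id": f"load_and_transform_{table}",
        "schedule": None,  # externally triggered only
        "tasks": [
            # In Composer these would be real operator instances:
            {"task_id": f"dataproc_load_{table}",
             "operator": "DataprocSubmitJobOperator"},
            {"task_id": f"bq_transform_{table}",
             "operator": "BigQueryInsertJobOperator"},
        ],
    }

# One DAG definition per table, keyed by table name.
DAGS = {table: build_dag_definition(table) for table in TABLES}
```

Because each table's transformations live in their own DAG, a slow or failing job for one table never blocks the others, and new tables are added by extending the table list.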
The correct approach creates a separate DAG for each table, which keeps the differing transformation jobs maintainable, and uses a Cloud Storage object trigger to launch a Cloud Function that starts the corresponding DAG, so processing begins as soon as new data arrives. This design is efficient and reliable at the scale of hundreds of tables and keeps end users' data current. The other options fall short: some rely on fixed schedules that cannot account for the unpredictable arrival of new data, and others cram every table's transformation jobs into a single DAG, which is too complex to maintain.
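The event-driven trigger can be sketched as a Cloud Function that receives a Cloud Storage object-finalize event and maps the uploaded object to the matching per-table DAG. The path convention (`incoming/<table>/<file>`) and DAG naming scheme below are assumptions for illustration; the actual authenticated POST to the Composer-hosted Airflow REST API (`/api/v1/dags/{dag_id}/dagRuns`) is left as a comment because it requires project-specific credentials.

```python
import json

def dag_id_for_object(object_name):
    """Map an uploaded object to the DAG that processes its table.

    Assumed path convention: incoming/<table>/<filename>.
    """
    parts = object_name.split("/")
    if len(parts) < 2 or parts[0] != "incoming":
        raise ValueError(f"unexpected object path: {object_name}")
    return f"load_and_transform_{parts[1]}"

def build_dag_run_request(event):
    """Build a dagRuns payload from the GCS event dict (bucket/name keys)."""
    dag_id = dag_id_for_object(event["name"])
    payload = {"conf": {"bucket": event["bucket"], "object": event["name"]}}
    return dag_id, json.dumps(payload)

def gcs_trigger(event, context=None):
    """Entry-point shape for a GCS-triggered Cloud Function."""
    dag_id, body = build_dag_run_request(event)
    # An authenticated POST to
    #   https://<composer-airflow-host>/api/v1/dags/{dag_id}/dagRuns
    # with `body` would start the run; omitted here.
    return dag_id, body
```

Passing the bucket and object name through the DAG run `conf` lets the Dataproc load task process exactly the file that arrived, rather than re-scanning the bucket.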
Author: LeetQuiz Editorial Team
You are tasked with scheduling a series of sequential load and transformation jobs where data files are unpredictably uploaded to a Cloud Storage bucket by an upstream process. A Dataproc job then processes these files, storing the results in BigQuery, followed by various transformation jobs in BigQuery with differing durations. Your objective is to design a workflow that efficiently processes hundreds of tables, ensuring end users always have access to the most current data. How would you achieve this?