
Answer-first summary for fast verification
Answer: Create a streaming Dataflow job that reads continually from the Pub/Sub topic and performs the necessary aggregations using tumbling windows.
The correct answer is **D** because a streaming Dataflow job is the most suitable technology for processing data from a Pub/Sub topic and loading it into BigQuery while scaling with large volumes of events. Dataflow processes incoming data continuously in real time, which is essential for aggregating events across disjoint hourly intervals as the question requires. Tumbling windows in Dataflow partition the stream into fixed, non-overlapping time intervals, hourly intervals in this case.

- **Option A**: Although Dataflow is the right technology, scheduling a batch job to run hourly is not optimal for event streams: each run pulls whatever messages happen to be available at that moment, so the aggregations do not line up cleanly with disjoint hourly intervals.
- **Option B**: A Cloud Function triggered on every published message processes events one at a time, which does not align with the hourly aggregation requirement, and per-message invocations can become a scaling concern at high event volumes.
- **Option C**: An hourly scheduled Cloud Function is batch processing and does not meet the real-time processing requirement; it may also struggle to handle large volumes of events efficiently within a single function invocation.
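To make the tumbling-window idea concrete, here is a minimal pure-Python sketch (not actual Dataflow/Beam code) of how events are assigned to disjoint hourly windows: each timestamp is floored to its window's start, and events sharing a window start are aggregated together. The event tuples and the `aggregate_by_window` helper are illustrative stand-ins for messages pulled from Pub/Sub.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # hourly tumbling windows


def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Floor a Unix timestamp to the start of its tumbling window."""
    return ts - (ts % size)


def aggregate_by_window(events):
    """Count events per disjoint hourly window.

    `events` is an iterable of (unix_timestamp, payload) pairs,
    a stand-in for messages read from a Pub/Sub topic.
    """
    counts = defaultdict(int)
    for ts, _payload in events:
        counts[window_start(ts)] += 1
    return dict(counts)


events = [
    (3600, "a"),  # 01:00:00 -> window starting at 3600
    (3601, "b"),  # 01:00:01 -> same window
    (7200, "c"),  # 02:00:00 -> next disjoint window
]
print(aggregate_by_window(events))  # {3600: 2, 7200: 1}
```

Because tumbling windows never overlap, every event lands in exactly one window, which is precisely the "disjoint hourly intervals" property the question asks for; in a real Dataflow pipeline this assignment is expressed with Beam's fixed-window transform rather than written by hand.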
Author: LeetQuiz Editorial Team
You are designing a pipeline to send application events to a Pub/Sub topic. While event ordering is not critical, you need to aggregate events across separate hourly intervals before transferring the data to BigQuery for analysis. Which technology should you use to efficiently handle and transfer this data to BigQuery, ensuring it can scale with high volumes of events?
A
Schedule a batch Dataflow job to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.
B
Create a Cloud Function to perform the necessary data processing that executes using the Pub/Sub trigger every time a new message is published to the topic.
C
Schedule a Cloud Function to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.
D
Create a streaming Dataflow job that reads continually from the Pub/Sub topic and performs the necessary aggregations using tumbling windows.