
Answer-first summary for fast verification
Answer: Create a streaming Dataflow job that reads continually from the Pub/Sub topic and performs the necessary aggregations using tumbling windows.
The correct answer is D. A streaming Dataflow job reads continually from the Pub/Sub topic, scales automatically to large event volumes, and performs the aggregations in real time. Tumbling windows divide the stream into fixed-size, non-overlapping intervals, which is exactly the disjoint hourly bucketing this scenario requires. The other options fit poorly: a per-message Cloud Function trigger (A) offers no way to aggregate across an hour, and hourly pulls (B, C) do not align the messages available at pull time with distinct hourly event intervals.
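The sketch below shows what option D could look like in the Apache Beam Python SDK, which is what Dataflow streaming jobs are written in. It is a minimal illustration, not the question's official solution: the topic path, the `event_type` field, the output table, and the schema are all hypothetical placeholders.

```python
# Minimal sketch of option D: a streaming Beam/Dataflow pipeline that
# aggregates Pub/Sub events in hourly tumbling windows and writes to BigQuery.
# Topic, field names, table, and schema below are hypothetical placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

# streaming=True makes Dataflow run this as a long-lived streaming job.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Read continually; each message's Pub/Sub publish time becomes
        # its event timestamp by default.
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/app-events")
        | "ParseJson" >> beam.Map(json.loads)
        # FixedWindows is Beam's tumbling window: fixed-size and
        # non-overlapping. 3600 seconds gives disjoint hourly intervals.
        | "HourlyWindows" >> beam.WindowInto(FixedWindows(60 * 60))
        # Example aggregation: count events per type within each window.
        | "KeyByType" >> beam.Map(lambda event: (event["event_type"], 1))
        | "CountPerWindow" >> beam.CombinePerKey(sum)
        | "ToRow" >> beam.Map(
            lambda kv: {"event_type": kv[0], "event_count": kv[1]})
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.hourly_event_counts",
            schema="event_type:STRING,event_count:INTEGER",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```

Because the job is streaming, each window's aggregate is emitted once the watermark passes the end of its hour, so results land in BigQuery shortly after each interval closes rather than on a batch schedule.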
Author: LeetQuiz Editorial Team
You are designing a data pipeline that publishes application events to a Google Cloud Pub/Sub topic. Message ordering is not a priority, but the events must be aggregated over distinct hourly intervals before being loaded into BigQuery for analysis. Given the need to handle potentially large event volumes and to scale accordingly, which technology should you use to process the data and load the aggregates into BigQuery?
A. Create a Cloud Function that is triggered by the Pub/Sub topic and performs the necessary data processing each time a new message is published.
B. Schedule a Cloud Function to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.
C. Schedule a batch Dataflow job to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.
D. Create a streaming Dataflow job that reads continually from the Pub/Sub topic and performs the necessary aggregations using tumbling windows.