You are designing a pipeline to send application events to a Pub/Sub topic. Event ordering is not critical, but you need to aggregate events over separate hourly intervals before loading the data into BigQuery for analysis. Which technology should you use to process the data and load it into BigQuery, ensuring it can scale with high event volumes?
Explanation:
The correct answer is D because a streaming Dataflow job is the most suitable technology for processing events from a Pub/Sub topic and loading them into BigQuery at scale. Dataflow processes incoming data continuously in real time, which is essential for aggregating events across disjoint hourly intervals. Tumbling (fixed) windows in Dataflow group data into non-overlapping time intervals, so a one-hour tumbling window produces exactly the hourly aggregates the question calls for.
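
For illustration, here is a minimal sketch of such a pipeline using the Apache Beam Python SDK (the programming model Dataflow executes). The project, topic, and table names are placeholders, and counting events per type is an assumed aggregation, since the question does not specify one:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

# Streaming mode is required to read continuously from Pub/Sub.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Read raw event payloads from the Pub/Sub topic (placeholder name).
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/app-events")
        # Assume each message body is an event-type string; key it for counting.
        | "KeyByEventType" >> beam.Map(lambda msg: (msg.decode("utf-8"), 1))
        # Tumbling (fixed) one-hour windows: disjoint hourly intervals.
        | "HourlyWindows" >> beam.WindowInto(window.FixedWindows(60 * 60))
        # Aggregate within each window, e.g. count events per type.
        | "CountPerType" >> beam.CombinePerKey(sum)
        | "ToTableRow" >> beam.Map(
            lambda kv: {"event_type": kv[0], "event_count": kv[1]})
        # Stream the hourly aggregates into BigQuery (placeholder table).
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.hourly_event_counts",
            schema="event_type:STRING,event_count:INTEGER",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
    )
```

Because `FixedWindows(60 * 60)` produces non-overlapping one-hour windows, each aggregate is emitted once per interval when the window closes, and the pipeline scales with event volume because Dataflow autoscales the workers running it.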