Google Professional Data Engineer

You are designing a pipeline to send application events to a Pub/Sub topic. While event ordering is not critical, you need to aggregate events across separate hourly intervals before transferring the data to BigQuery for analysis. Which technology should you use to efficiently handle and transfer this data to BigQuery, ensuring it can scale with high volumes of events?




Explanation:

The correct answer is D: a streaming Dataflow job is the most suitable technology for processing data from a Pub/Sub topic and loading it into BigQuery while scaling with large volumes of events. A streaming Dataflow pipeline processes incoming data continuously, which is what aggregating events across disjoint hourly intervals requires, and tumbling (fixed) windows can group events into those one-hour intervals before the results are written to BigQuery. A minimal pipeline sketch follows the option notes below.

  • Option A: Although Dataflow is the right processing engine, scheduling a batch job to run hourly is not well suited to a continuous stream of events; it adds latency and does not naturally produce aggregates over disjoint hourly intervals the way a streaming pipeline with windowing does.
  • Option B: Triggering on each published message does not align with the hourly aggregation requirement, since per-message invocations provide no built-in windowing, and this approach scales poorly as event volume grows.
  • Option C: This option relies on batch processing, so it is not suited to the continuous processing the scenario requires and handles large volumes of events less efficiently than a streaming pipeline.
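
For illustration, here is a minimal sketch of what such a streaming pipeline could look like using the Apache Beam Python SDK (the SDK that Dataflow executes). The Pub/Sub topic, BigQuery table, schema, and the event_type field are illustrative assumptions, not details from the question.

```python
# Minimal sketch: read events from Pub/Sub, aggregate them in one-hour
# tumbling (fixed) windows, and write the per-window counts to BigQuery.
# Topic, table, schema, and the "event_type" field are assumptions.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows


def run():
    # Streaming mode is required for an unbounded Pub/Sub source.
    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            # Hypothetical topic name; replace with your project's topic.
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/app-events")
            | "ParseJson" >> beam.Map(json.loads)
            # One-hour tumbling windows: each event falls into exactly one
            # disjoint hourly interval.
            | "HourlyWindows" >> beam.WindowInto(FixedWindows(60 * 60))
            | "KeyByType" >> beam.Map(lambda e: (e.get("event_type", "unknown"), 1))
            | "CountPerType" >> beam.CombinePerKey(sum)
            | "ToRow" >> beam.Map(lambda kv: {"event_type": kv[0], "event_count": kv[1]})
            # Hypothetical table; results are appended as each window closes.
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:analytics.hourly_event_counts",
                schema="event_type:STRING,event_count:INTEGER",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```

Running this with the Dataflow runner (for example, passing --runner=DataflowRunner along with project, region, and temp_location options) executes it as a streaming Dataflow job that scales with the event volume.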