
Answer-first summary for fast verification
Answer: Create a streaming Dataflow job to continually read from the Pub/Sub topic and perform the necessary aggregations using tumbling windows
**Explanation:** Option A is the correct answer because:

- **Streaming Dataflow** is designed for real-time data processing at scale
- **Tumbling windows** provide exactly hourly aggregations with clear boundaries
- **Automatic scaling** handles large volumes of events efficiently
- **Continuous processing** ensures timely aggregation without delays
- **Native integration** with Pub/Sub and BigQuery

The other options have limitations:

- Option B: Batch processing introduces latency and may miss real-time events
- Option C: Cloud Functions have execution time limits and are not designed for large-scale data processing
- Option D: A Cloud Function triggered per message would be inefficient for aggregation and could hit rate limits
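To make the tumbling-window idea concrete, here is a minimal pure-Python sketch of hourly tumbling-window aggregation: each event timestamp is floored to the start of its one-hour window, and events are counted per window and type. This only illustrates the windowing concept; a real pipeline would use Dataflow/Apache Beam (e.g. `beam.WindowInto(beam.window.FixedWindows(3600))` on a Pub/Sub source). The event data below is hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # one-hour tumbling windows

def window_start(ts: float) -> float:
    """Floor an event timestamp (epoch seconds) to the start of its hourly window."""
    return ts - (ts % WINDOW_SECONDS)

def aggregate(events):
    """Count events per (hourly window start, event type)."""
    counts = defaultdict(int)
    for ts, event_type in events:
        counts[(window_start(ts), event_type)] += 1
    return dict(counts)

# Hypothetical event stream: (epoch seconds, event type)
events = [
    (1700000100.0, "click"),
    (1700001500.0, "click"),
    (1700003700.0, "view"),  # falls into the next hourly window
]
print(aggregate(events))
```

Because tumbling (fixed) windows never overlap, every event lands in exactly one window, which is what guarantees the clean hourly boundaries the explanation refers to.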
Author: LeetQuiz.
NO.12 You are designing a pipeline that publishes application events to a Pub/Sub topic. You need to aggregate events across hourly intervals before loading the results to BigQuery for analysis. Your solution must be scalable so it can process and load large volumes of events to BigQuery. What should you do?
A
Create a streaming Dataflow job to continually read from the Pub/Sub topic and perform the necessary aggregations using tumbling windows
B
Schedule a batch Dataflow job to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations
C
Schedule a Cloud Function to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations
D
Create a Cloud Function that performs the necessary data processing, executed via a Pub/Sub trigger every time a new message is published to the topic