Google Professional Data Engineer

Explanation:

The question asks how to aggregate data from an unbounded source every hour, based on the time it entered the pipeline, in Cloud Dataflow/Beam.

Windowing and triggering in Apache Beam/Dataflow rest on two notions of time:

Event time: the time when the event actually occurred, as recorded in the data itself (e.g., a log timestamp). Event time triggers and watermarks are used when you want to aggregate or window data based on when the event happened, regardless of when it was processed.

Processing time: the time when the data is processed by the pipeline (i.e., when it enters the system). Processing time triggers are used when you want to aggregate or window data based on when it arrives or is processed, not when it was originally generated.

Why D is correct: The question specifically says "based on the time it entered the pipeline," which refers to processing time. Processing time triggers let you emit windowed results based on the system clock as data flows through the pipeline. This differs from event time triggers, which use the timestamp embedded in the data (which may be delayed or out of order).

Other options:

A: An hourly watermark. Watermarks track event time progress, not processing time.

B: An event time trigger. This would aggregate based on the event's timestamp, not when it entered the pipeline.

C: The withAllowedLateness method. This controls how late data is handled for event time windows, not when to trigger results.
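The difference between the two notions of time can be illustrated with a small plain-Python sketch. This is not Beam code and does not use the Beam API. The record fields `event_time` and `arrival_time`, the helper names, and the sample timestamps are all hypothetical, chosen only to show how the same records fall into different hourly buckets depending on which clock is used.

```python
from collections import defaultdict
from datetime import datetime


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_hourly(records, time_field):
    """Count records per hour, bucketing by the given timestamp field."""
    counts = defaultdict(int)
    for record in records:
        counts[hour_bucket(record[time_field])] += 1
    return dict(counts)


# Hypothetical records: event_time is embedded in the data,
# arrival_time is when the record entered the pipeline.
records = [
    {"event_time": datetime(2024, 1, 1, 9, 55), "arrival_time": datetime(2024, 1, 1, 10, 5)},
    {"event_time": datetime(2024, 1, 1, 10, 10), "arrival_time": datetime(2024, 1, 1, 10, 12)},
    # A late record: it happened at 09:30 but arrived after 11:00.
    {"event_time": datetime(2024, 1, 1, 9, 30), "arrival_time": datetime(2024, 1, 1, 11, 1)},
]

by_event = aggregate_hourly(records, "event_time")      # event-time view
by_arrival = aggregate_hourly(records, "arrival_time")  # processing-time view

for label, result in (("event time", by_event), ("arrival time", by_arrival)):
    for hour, count in sorted(result.items()):
        print(f"{label}: {hour:%H:%M} -> {count}")
```

Bucketing by `event_time` places the late record in the 09:00 hour, while bucketing by `arrival_time` places it in the 11:00 hour, which is the behaviour the question asks for. In an actual Beam pipeline this role is played by a processing time trigger (for example, the `AfterProcessingTime` trigger in the Beam SDKs) rather than by hand-written bucketing.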

To aggregate data from an unbounded source every hour based on the time it entered the pipeline in Cloud Dataflow/Beam, which feature should you utilize?

Real Exam

An hourly watermark

18.9%

An event time trigger

30.2%

The withAllowedLateness method

1.9%

A processing time trigger

49.1%