Your web server dispatches click events to a Pub/Sub topic as individual messages. Each message includes an attribute called eventTimestamp, indicating when the click event occurred. A Dataflow streaming job is set up to read from this Pub/Sub topic using a subscription, perform certain transformations, and subsequently write the results to another Pub/Sub topic intended for the advertising department. The advertising department requires that each message is delivered within 30 seconds of the corresponding click event, but they are currently experiencing delays in message receipt. On inspecting your Dataflow job, you observe a system lag of approximately 5 seconds and data freshness of around 40 seconds. Further inspection of a few messages reveals a lag of no more than 1 second between their eventTimestamp and publishTime. What is causing the delay and what actions should you take to resolve it?
Explanation:
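The Dataflow job cannot keep up with the rate of incoming messages, so a backlog is building in the Pub/Sub subscription. Three observations point to this. First, the gap between eventTimestamp and publishTime is at most 1 second, so the web server is publishing promptly and is not the source of the delay. Second, the system lag of about 5 seconds shows that a message, once the job picks it up, is processed well within the 30-second limit. Third, the data freshness of about 40 seconds shows that the job's output is running roughly 40 seconds behind real time, which means messages spend most of that time waiting in the subscription before they are processed. The fix is therefore to optimize the Dataflow job or increase the number of workers so that it can drain the backlog. Hence, the correct answer is C.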
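The reasoning above can be sketched as a small decision rule. This is only an illustration: the `JobMetrics` class, the `diagnose` function, and the returned labels are hypothetical names and not part of any Dataflow or Pub/Sub API, and the thresholds simply reuse the 30-second requirement from the question.

```python
from dataclasses import dataclass


@dataclass
class JobMetrics:
    """Hypothetical container for the three numbers given in the question."""
    system_lag_s: float       # how long an item has been processing or awaiting processing
    data_freshness_s: float   # how far the job's output lags behind real time
    publish_delay_s: float    # publishTime minus eventTimestamp


def diagnose(m: JobMetrics, sla_s: float = 30.0) -> str:
    """Toy rule: return which stage most plausibly breaks the delivery deadline."""
    if m.publish_delay_s >= sla_s:
        # Messages already arrive late in Pub/Sub: the web server is the problem.
        return "publisher"
    if m.system_lag_s >= sla_s:
        # Individual messages take too long inside the job.
        return "slow_processing"
    if m.data_freshness_s > sla_s:
        # Processing is fast, but the output is stale: messages wait in a backlog.
        return "backlog"
    return "healthy"


# Values from the question: 5 s system lag, 40 s data freshness, 1 s publish delay.
print(diagnose(JobMetrics(system_lag_s=5, data_freshness_s=40, publish_delay_s=1)))
```

With the values from the question, the rule falls through the first two checks and lands on the backlog case, which matches the explanation.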