
Answer-first summary for fast verification
Answer: Joining on event time constraint: clickTime >= impressionTime AND clickTime <= impressionTime + interval 1 hour
The query slows down because the join condition has no time constraint. With only clickAdId = impressionAdId, any future click could still match any past impression, so the engine must keep every buffered row and the join state grows without bound. Option A adds a time range (clickTime >= impressionTime AND clickTime <= impressionTime + interval 1 hour) that works together with the watermark: once the watermark shows that a buffered row can no longer satisfy the range, that row can be dropped from state. This makes it the most efficient solution among the provided options. Option B's constraint is illogical, since clickTime + 3 hours < impressionTime - 2 hours only matches clicks that occurred more than five hours before their impression. Option D removes the watermarks, which are crucial for state cleanup in streaming applications. Option C requires an exact time match, which is impractical for streaming data where exact matches are rare, and it would not significantly reduce the state size.
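To make the state argument concrete, here is a small plain-Python toy model. It is not Spark code and not how Spark implements joins; the function name, the sample events, and the simplifying assumption that events arrive in event-time order (so the current event time can stand in for the watermark) are all hypothetical. It only illustrates why a bounded time range lets buffered impressions be evicted.

```python
from collections import defaultdict

HOUR = 3600  # seconds


def join_clicks_to_impressions(events, max_delay=None):
    """Toy inner join of clicks against buffered impressions.

    events: (kind, ad_id, event_time) tuples in event-time order,
            where kind is "impression" or "click".
    max_delay: if set, a click matches only when
            impression_time <= click_time <= impression_time + max_delay,
            and impressions past that range are evicted from the buffer.
    Returns (matches, peak_buffered_impressions).
    """
    buffered = defaultdict(list)  # ad_id -> impression times still held in state
    matches = []
    peak = 0
    for kind, ad_id, t in events:
        if max_delay is not None:
            # Events are in order, so t plays the role of the watermark:
            # an impression whose range ended before t can never match again.
            for key in list(buffered):
                buffered[key] = [imp for imp in buffered[key] if imp + max_delay >= t]
                if not buffered[key]:
                    del buffered[key]
        if kind == "impression":
            buffered[ad_id].append(t)
        else:
            for imp in buffered.get(ad_id, []):
                if max_delay is None or imp <= t <= imp + max_delay:
                    matches.append((ad_id, imp, t))
        peak = max(peak, sum(len(times) for times in buffered.values()))
    return matches, peak


# Hypothetical sample: one new ad per hour, each clicked 10 minutes later.
events = []
for h in range(10):
    events.append(("impression", f"ad{h}", h * HOUR))
    events.append(("click", f"ad{h}", h * HOUR + 600))

unbounded_matches, unbounded_peak = join_clicks_to_impressions(events)
bounded_matches, bounded_peak = join_clicks_to_impressions(events, max_delay=HOUR)
print(unbounded_peak, bounded_peak)  # prints "10 2"
```

Both runs find the same ten matches, but without the time range the buffer keeps every impression ever seen, while the one-hour range holds at most two at a time in this sample.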
Author: LeetQuiz Editorial Team
A data engineer needs to correlate advertisement impressions with user clicks by joining two streaming DataFrames. The Impressions stream has a watermark set on "event_time" for 10 minutes. The current implementation is:
impressions \
    .groupBy(
        window("event_time", "5 minutes"),
        "id") \
    .count() \
    .withWatermark("event_time", "2 hours") \
    .join(clicks, expr("clickAdId = impressionAdId"), "inner")
The query performance is degrading significantly. What solution would improve its performance?
A
Joining on event time constraint: clickTime >= impressionTime AND clickTime <= impressionTime + interval 1 hour
B
Joining on event time constraint: clickTime + 3 hours < impressionTime - 2 hours
C
Joining on event time constraint: clickTime == impressionTime using a leftOuter join
D
Joining on event time constraint: clickTime >= impressionTime - interval 3 hours and removing watermarks