
Answer-first summary for fast verification
Answer: B) window("event_time", "5 minutes").alias("time")
The correct answer is B because the task requires aggregating data into non-overlapping (tumbling) five-minute intervals. In Spark Structured Streaming, the `window` function is the standard way to group by time: `window("event_time", "5 minutes")` buckets each row into a tumbling 5-minute window, exactly as required. The other options fail: `to_interval` (A) is not a Spark SQL function; grouping by `event_time` directly (C) creates one group per distinct timestamp rather than per 5-minute interval; and `lag` (D) offsets rows within a window specification (its offset argument is a row count, not a duration), so it cannot bucket events by time.
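To make the behaviour of option B concrete, here is a minimal batch sketch; the sample rows, app name, and variable names are illustrative assumptions, and `window` behaves the same way on a streaming DataFrame:

from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, window

spark = SparkSession.builder.appName("tumbling-window-demo").getOrCreate()

# Illustrative sample matching the question's schema; the values are made up.
readings = spark.createDataFrame(
    [
        (1, "2024-01-01 00:01:00", 21.5, 40.0),
        (1, "2024-01-01 00:04:30", 22.0, 42.0),
        (1, "2024-01-01 00:06:10", 23.0, 39.0),
    ],
    "device_id INT, event_time STRING, temp FLOAT, humidity FLOAT",
).withColumn("event_time", col("event_time").cast("timestamp"))

# Tumbling windows: each row falls into exactly one 5-minute bucket,
# e.g. [00:00, 00:05) or [00:05, 00:10).
per_window = (
    readings
    .groupBy(window("event_time", "5 minutes").alias("time"), "device_id")
    .agg(avg("temp").alias("avg_temp"), avg("humidity").alias("avg_humidity"))
)
per_window.show(truncate=False)

For sliding (overlapping) windows, `window` also accepts a slide duration, e.g. `window("event_time", "10 minutes", "5 minutes")`; omitting it, as here, yields tumbling windows.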
Author: LeetQuiz Editorial Team
A junior data engineer is developing a streaming pipeline to calculate average humidity and temperature per device in 5-minute non-overlapping windows. Given a streaming DataFrame df with schema "device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT", which line of code correctly completes this aggregation?
df.withWatermark("event_time", "10 minutes")
    .groupBy(
        _____,
        "device_id"
    )
    .agg(
        avg("temp").alias("avg_temp"),
        avg("humidity").alias("avg_humidity")
    )
    .writeStream
    .format("delta")
    .saveAsTable("sensor_avg")
A. to_interval("event_time", "5 minutes").alias("time")
B. window("event_time", "5 minutes").alias("time")
C. "event_time"
D. lag("event_time", "10 minutes").alias("time")
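For completeness, here is how the finished pipeline could be assembled and started. This is a sketch, not the exam's reference code: it assumes Spark 3.1+, where the streaming writer exposes `toTable` (`saveAsTable` belongs to the batch writer), and the output mode and checkpoint path are illustrative additions that any real streaming query needs.

from pyspark.sql.functions import avg, window

# Completed pipeline under the assumptions above; `df` is the streaming
# DataFrame from the question. The checkpoint path is a placeholder.
query = (
    df.withWatermark("event_time", "10 minutes")
    .groupBy(window("event_time", "5 minutes").alias("time"), "device_id")
    .agg(
        avg("temp").alias("avg_temp"),
        avg("humidity").alias("avg_humidity"),
    )
    .writeStream
    .format("delta")
    .outputMode("append")  # a window's result emits once the watermark passes its end
    .option("checkpointLocation", "/tmp/checkpoints/sensor_avg")
    .toTable("sensor_avg")
)

With the 10-minute watermark and append mode, each 5-minute window is finalized only after the watermark passes its end, so events arriving up to 10 minutes late are still folded into the correct window.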