
Databricks Certified Data Engineer - Professional
A junior data engineer is developing a streaming data pipeline to perform grouped aggregations on DataFrame df. The pipeline must compute the average humidity and average temperature for each device in non-overlapping five-minute intervals. Each device records one event per minute.
Streaming DataFrame df has the schema:
"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"
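The requirement describes a tumbling (non-overlapping) window: every event falls into exactly one five-minute bucket, so with one event per minute each device contributes five readings per window. The plain-Python sketch below is not Spark code. It uses hypothetical helper names (`window_start`, `tumbling_averages`) and computes the same per-device, per-window averages on an in-memory list of `(device_id, event_time, temp, humidity)` tuples, assuming windows are aligned to the Unix epoch, as Spark's `window` function does by default.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # tumbling window length


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 5-minute tumbling window."""
    epoch = datetime(1970, 1, 1)
    offset = (ts - epoch) // WINDOW  # whole windows elapsed since the epoch (an int)
    return epoch + offset * WINDOW


def tumbling_averages(events):
    """Average temp and humidity per (device_id, window start).

    `events` is an iterable of (device_id, event_time, temp, humidity) tuples,
    mirroring the columns of df.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # key -> [temp_sum, humidity_sum, count]
    for device_id, event_time, temp, humidity in events:
        acc = sums[(device_id, window_start(event_time))]
        acc[0] += temp
        acc[1] += humidity
        acc[2] += 1
    # One output row per (device, window), like groupBy("device_id", window(...))
    return {key: (t / n, h / n) for key, (t, h, n) in sums.items()}
```

With ten one-minute events for a single device, this yields two rows for that device, one per five-minute window.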
The following code block contains syntax errors and typos. Correct them and fill in the missing logic to achieve the desired aggregation:
df.withWatermark("event_time", "10 minutes")
  .groupBy(
    "device_id",
    window("event_time", "5 minutes")
  )
  .agg(
    avg("temp").alias("avg_temp"),
    avg("humidity").alias("avg_humidity")
  )
  .writeStream
  .format("delta")
  .saveAsTable("sensor_avg")
Choose the correct option to complete the missing logic in the code block.
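One way to correct the block is sketched below, assuming PySpark 3.1 or later, where `DataStreamWriter.toTable` is available. The sketch makes these fixes:

- The chained calls are wrapped in parentheses so Python can parse them across lines.
- `avg` and `window` are imported from `pyspark.sql.functions`.
- `saveAsTable`, which belongs to the batch `DataFrameWriter` and does not exist on `DataStreamWriter`, is replaced with the streaming `toTable`.
- An explicit output mode and checkpoint location are set.

The function name `build_sensor_avg_query` and the checkpoint path are hypothetical; substitute your own.

```python
def build_sensor_avg_query(df, checkpoint_path="/tmp/checkpoints/sensor_avg"):
    """Start a streaming query writing 5-minute per-device averages to Delta.

    `df` is assumed to be a streaming DataFrame with the schema
    "device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT".
    `checkpoint_path` is a hypothetical location.
    """
    # Imported inside the function so this sketch can be loaded without PySpark installed.
    from pyspark.sql.functions import avg, window

    return (
        df.withWatermark("event_time", "10 minutes")  # accept events up to 10 minutes late
        .groupBy(
            "device_id",
            window("event_time", "5 minutes"),  # tumbling (non-overlapping) 5-minute windows
        )
        .agg(
            avg("temp").alias("avg_temp"),
            avg("humidity").alias("avg_humidity"),
        )
        .writeStream
        .format("delta")
        .outputMode("append")  # write each window once the watermark passes its end
        .option("checkpointLocation", checkpoint_path)  # lets the query recover progress and state
        .toTable("sensor_avg")  # streaming writer method; returns a StreamingQuery
    )
```

Calling `build_sensor_avg_query(df)` returns a `StreamingQuery`. In append mode, each window's row is written once the watermark passes the window's end. The watermark is the maximum observed `event_time` minus 10 minutes.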