Databricks Certified Data Engineer - Professional

A junior data engineer is developing a streaming data pipeline to perform grouped aggregations on DataFrame df. The pipeline must compute the average humidity and average temperature for each device in non-overlapping five-minute intervals, with events recorded every minute per device.

Streaming DataFrame df has the schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

The following code block contains syntax errors and typos. Correct them and fill in the missing logic to achieve the desired aggregation:

df.withWatermark("event_time", "10 minutes")  
   .groupBy(  
       "device_id",  
       window("event_time", "5 minutes")  
   )  
   .agg(  
       avg("temp").alias("avg_temp"),  
       avg("humidity").alias("avg_humidity")  
   )  
   .writeStream  
   .format("delta")  
   .saveAsTable("sensor_avg")  

Choose the correct option to complete the missing logic in the code block.
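To make "non-overlapping five-minute intervals" concrete, here is a minimal plain-Python sketch of tumbling-window bucketing. It is not Spark code and not the answer to the question; the helper names window_start and tumbling_averages are hypothetical, and it assumes windows are aligned to the Unix epoch, which is our understanding of Spark's default when no start offset is given.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: windows are aligned to the Unix epoch (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WINDOW = timedelta(minutes=5)


def window_start(event_time, width=WINDOW):
    """Floor a timezone-aware timestamp to the start of its tumbling window."""
    offset = (event_time - EPOCH) % width
    return event_time - offset


def tumbling_averages(events, width=WINDOW):
    """Average temp and humidity per (device_id, window start).

    events: iterable of (device_id, event_time, temp, humidity) tuples.
    Returns {(device_id, window_start): (avg_temp, avg_humidity)}.
    """
    # Running [temp_sum, humidity_sum, count] for each group.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for device_id, event_time, temp, humidity in events:
        acc = sums[(device_id, window_start(event_time, width))]
        acc[0] += temp
        acc[1] += humidity
        acc[2] += 1
    return {key: (t / n, h / n) for key, (t, h, n) in sums.items()}
```

With one event per minute, each device contributes five rows to every full window: an event at 12:04:59 falls in the window starting at 12:00, while an event at 12:05:00 starts the next one, so no event is counted twice.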
