
Answer-first summary for fast verification
Answer: `trigger(processingTime="5 minutes")`
The correct answer is `trigger(processingTime="5 minutes")`. The `trigger` method specifies when the system should process the next set of data in a streaming write, and the `processingTime` parameter defines the interval at which micro-batches are kicked off; passing the string "5 minutes" triggers the query every 5 minutes. The other options do not meet the requirement of periodic execution: `trigger(once=True)` processes a single micro-batch and then stops, and `trigger(availableNow=True)` processes all currently available data (possibly in multiple batches) and then stops. Note that `Trigger.AvailableNow` is a newer trigger type available only in recent versions of Databricks Runtime (DBR).
Author: LeetQuiz Editorial Team
A data engineer has set up a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table. The code block used is as follows:
spark.table("sales")
.withColumn("avg_price", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
._________
.table("new_sales")
If the goal is for the query to execute in micro-batches and be triggered every 5 minutes, which of the following lines of code should fill in the blank?
A
trigger(processingTime=5 minutes)
B
trigger(once=True)
C
trigger(processingTime="5 minutes")
D
trigger(availableNow=True)
E
processingTime(5)