
Answer-first summary for fast verification
Answer: `trigger(once=True)`
To understand the correct answer, let's look at trigger intervals in Structured Streaming. The `trigger` method of `DataStreamWriter` specifies when the next micro-batch should be processed, i.e., it controls the frequency of micro-batches. By default, Spark starts a new micro-batch as soon as the previous one finishes, processing whatever data has arrived since the last trigger. Passing `trigger(once=True)` instead runs exactly one micro-batch that processes all data available at the time the query starts, and then stops the query. That matches the goal in the question, so the correct line is `trigger(once=True)`.
Author: LeetQuiz Editorial Team
A data engineer has set up a Structured Streaming job to read from a table, aggregate the data, and then perform a streaming write into a new table. The code block used is as follows:
(spark.table("sales")
    .groupBy("store")
    .agg(sum("sales").alias("sum_sales"))
    .writeStream
    .option("checkpointLocation", checkpointPath)
    .outputMode("complete")
    .______
    .table("aggregatedSales"))
If the goal is to execute only a single micro-batch to process all available data, which line of code should fill in the blank?
A. trigger(continuous="once")
B. processingTime("once")
C. trigger(processingTime="once")
D. trigger(once=True)
E. processingTime(1)