
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:
(spark.table("sales")
.withColumn("avg_price", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.______
.table("new_sales"))
Which line of code should the data engineer use to fill in the blank if the data engineer only wants the query to execute a micro-batch to process data every 5 seconds?
A
trigger("5 seconds")
B
trigger(continuous="5 seconds")
C
trigger(once="5 seconds")
D
trigger(processingTime="5 seconds")
Explanation:
The correct answer is D. trigger(processingTime="5 seconds").
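Trigger Types in Structured Streaming:
In Apache Spark Structured Streaming, the trigger controls when a streaming query processes data. A processingTime trigger starts a micro-batch at a fixed interval. A once trigger processes the available data in a single batch and then stops. A continuous trigger selects the experimental continuous processing mode, where the interval sets how often progress is checkpointed rather than how often a micro-batch runs.
Syntax Analysis:
trigger("5 seconds") - Incorrect. In PySpark, trigger() accepts keyword arguments only, so a positional interval raises a TypeError.
trigger(continuous="5 seconds") - Accepted by PySpark, but it selects continuous processing with a 5-second checkpoint interval, not micro-batch execution.
trigger(once="5 seconds") - Incorrect. The once trigger takes no interval; the only accepted form is trigger(once=True), which runs a single batch and then stops.
trigger(processingTime="5 seconds") - Correct. It starts a micro-batch every 5 seconds.
Requirements Match:
The requirement is a micro-batch every 5 seconds, which is exactly the processingTime trigger behavior.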
For the requirement of a micro-batch "every 5 seconds", trigger(processingTime="5 seconds") is the only option that works as written. trigger("5 seconds") does not work in PySpark, because the trigger type must be passed as a keyword argument.
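How the four options are accepted or rejected can be sketched in plain Python. The function check_trigger below is a hypothetical stand-in, not PySpark's own code; it only assumes the keyword-only signature and the argument checks described for DataStreamWriter.trigger, and it leaves out the availableNow trigger found in later Spark versions.

```python
def check_trigger(*, processingTime=None, once=None, continuous=None):
    """Hypothetical stand-in for the argument checks; not PySpark's code."""
    given = {
        name: value
        for name, value in (
            ("processingTime", processingTime),
            ("once", once),
            ("continuous", continuous),
        )
        if value is not None
    }
    if len(given) != 1:
        raise ValueError("exactly one trigger type must be given")
    name, value = next(iter(given.items()))
    if name == "once":
        # The once trigger has no interval; only True is accepted.
        if value is not True:
            raise ValueError("once takes no interval; the only accepted value is True")
    elif not isinstance(value, str) or not value.strip():
        raise ValueError(f"{name} needs a non-empty interval string")
    return name, value


def describe(call):
    """Run a zero-argument callable and report its result or the error type."""
    try:
        return call()
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


results = {
    "A": describe(lambda: check_trigger("5 seconds")),                 # positional argument
    "B": describe(lambda: check_trigger(continuous="5 seconds")),      # accepted, but continuous mode
    "C": describe(lambda: check_trigger(once="5 seconds")),            # once must be True
    "D": describe(lambda: check_trigger(processingTime="5 seconds")),  # micro-batch interval
}

for option, outcome in results.items():
    print(option, outcome)
```

In this sketch, options A and C fail before any query starts, option B passes the checks but names a different execution mode, and only option D names a micro-batch interval.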
The same trigger is written differently depending on the language API:
trigger(Trigger.ProcessingTime("5 seconds")) (in Scala and Java)
trigger(processingTime='5 seconds') (in Python)
Among the given options, trigger(processingTime="5 seconds") is the correct answer.
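For reference, a minimal sketch of the completed query with the blank filled in. The function name start_new_sales_stream and its parameters are illustrative assumptions. The sketch also departs from the question's snippet in three ways that open-source Spark 3.1 or later would need: it reads with spark.readStream.table so the source is a streaming DataFrame, it uses append output mode because complete mode requires a streaming aggregation, and it writes with toTable. The derived column is computed with selectExpr so the block needs no imports.

```python
def start_new_sales_stream(spark, checkpoint_path):
    """Start a streaming write from `sales` into `new_sales`.

    `spark` is assumed to be an active SparkSession and `checkpoint_path`
    a writable checkpoint directory; both names are illustrative.
    """
    return (
        spark.readStream.table("sales")  # streaming read of the source table
        # Same derived column as the question, written as a SQL expression.
        .selectExpr("*", "sales / units AS avg_price")
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")  # no aggregation, so append rather than complete
        .trigger(processingTime="5 seconds")  # start a micro-batch every 5 seconds
        .toTable("new_sales")  # returns the running StreamingQuery
    )
```

With an active session, start_new_sales_stream(spark, checkpointPath) would return a StreamingQuery handle, and calling stop() on that handle ends the stream.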