
## Answer (for fast verification)

`trigger(processingTime="5 seconds")` (option D)
## Explanation

In Apache Spark Structured Streaming, the `trigger()` method on `DataStreamWriter` controls how often the streaming query executes micro-batches. In the PySpark API used in the question, the correct way to request micro-batches at a fixed time interval is:

```python
.trigger(processingTime="5 seconds")
```

Let's analyze each option:

**A. `trigger("5 seconds")`** - Incorrect. In PySpark 3.x the parameters of `trigger()` are keyword-only (`processingTime=`, `once=`, `continuous=`), so a bare positional string raises a `TypeError`; the call must name which trigger type is intended.

**B. `trigger(continuous="5 seconds")`** - Incorrect. This is valid PySpark syntax, but it enables continuous processing mode, an experimental low-latency mode that is distinct from micro-batch processing. In continuous mode, "5 seconds" is the asynchronous checkpoint interval, not a micro-batch interval, so it does not satisfy the requirement.

**C. `trigger(once="5 seconds")`** - Incorrect. The once trigger processes all available data in a single micro-batch and then stops; it does not run at regular intervals. It also takes no interval: in PySpark only `trigger(once=True)` is accepted, so passing a string raises an error.

**D. `trigger(processingTime="5 seconds")`** - **Correct.** This is the proper syntax for micro-batch processing at a fixed 5-second interval. Spark will start a micro-batch every 5 seconds, processing any new data that has arrived since the previous micro-batch.

### Key Points

- `processingTime="5 seconds"` sets the interval at which micro-batches are started; if a batch takes longer than the interval, the next one starts as soon as the previous one finishes.
- This is the standard way to control the frequency of micro-batch processing in Structured Streaming.
- The checkpoint location (`checkpointLocation`) provides fault tolerance by persisting the query's progress information.
- Output mode `"complete"` means the entire result table is written to the sink on every trigger.
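As a quick sanity check, here is a minimal, self-contained sketch of a 5-second micro-batch trigger. It is an illustration added to this write-up, not part of the original question: it uses the built-in `rate` source and the console sink so it runs without the `sales` table or a checkpoint path, and it uses `append` output mode because `complete` mode requires a streaming aggregation.

```python
# Minimal sketch: run a micro-batch every 5 seconds against the built-in
# "rate" source, which generates (timestamp, value) rows continuously.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("trigger-demo").getOrCreate()

stream = (
    spark.readStream
    .format("rate")                  # synthetic streaming source, no setup needed
    .option("rowsPerSecond", 10)
    .load()
    .withColumn("doubled", col("value") * 2)
)

query = (
    stream.writeStream
    .format("console")                    # print each micro-batch to stdout
    .outputMode("append")
    .trigger(processingTime="5 seconds")  # one micro-batch every 5 seconds
    .start()
)

query.awaitTermination(30)  # let roughly six micro-batches run, then return
query.stop()
```

The bounded `awaitTermination(30)` keeps the demo finite; a production query would typically call `awaitTermination()` with no timeout and rely on the checkpoint location for recovery.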
Author: Keng Suppaseth
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:
(spark.table("sales")
.withColumn("avg_price", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.______
.table("new_sales"))
(spark.table("sales")
.withColumn("avg_price", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.______
.table("new_sales"))
Which line of code should the data engineer use to fill in the blank if the data engineer only wants the query to execute a micro-batch to process data every 5 seconds?
**A.** `trigger("5 seconds")`

**B.** `trigger(continuous="5 seconds")`

**C.** `trigger(once="5 seconds")`

**D.** `trigger(processingTime="5 seconds")`