
Answer-first summary for fast verification
Answer: `trigger(once=True)`
## Explanation

In Apache Spark Structured Streaming, a query that should process all available data in a single micro-batch and then stop uses the `trigger(once=True)` option.

### Key Points:

1. **`trigger(once=True)`** is the PySpark syntax for a single micro-batch that processes all available data and then stops the query. The code block in the question is PySpark, so this keyword form applies.

2. **Other options explained:**
   - **Option B (`trigger(continuous="once")`)**: Incorrect. The `continuous` parameter selects continuous processing mode and expects a checkpoint interval string such as "1 second", not "once".
   - **Option C (`processingTime("once")`)**: Incorrect. `processingTime` is a keyword argument of `trigger()`, not a method on the stream writer, and it expects an interval string such as "1 second" or "5 minutes", not "once".
   - **Option D (`trigger(processingTime="once")`)**: Incorrect. The call shape is valid, but "once" is not a valid interval string.
   - **Option E (`processingTime(1)`)**: Incorrect. As with Option C, this is not a method on the stream writer. A fixed-interval trigger is written as `trigger(processingTime="1 second")`, and it would run micro-batches repeatedly rather than once.

3. **How it works:** When `trigger(once=True)` is used, the streaming query:
   - Processes all available data in the source
   - Executes exactly one micro-batch
   - Then automatically stops

   This makes it useful for batch-like processing using the streaming API.

4. **Note on Scala syntax:** The Scala/Java equivalent is `.trigger(Trigger.Once())`. The keyword form `once=True` exists only in the Python API.

Option A, `trigger(once=True)`, is therefore the correct answer among the given choices.

This trigger mode is particularly useful for:
- Processing all available data at once
- Running periodic batch jobs using streaming infrastructure
- Testing and development scenarios
Author: Keng Suppaseth
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:
```python
(spark.table("sales")
    .withColumn("avg_price", col("sales") / col("units"))
    .writeStream
    .option("checkpointLocation", checkpointPath)
    .outputMode("complete")
    .______
    .table("new_sales"))
```
If the data engineer only wants the query to execute a single micro-batch to process all of the available data, which of the following lines of code should the data engineer use to fill in the blank?
A. `trigger(once=True)`
B. `trigger(continuous="once")`
C. `processingTime("once")`
D. `trigger(processingTime="once")`
E. `processingTime(1)`