
Answer-first summary for fast verification
Answer: A. `trigger(once=True)`
## Explanation

In Apache Spark Structured Streaming, to execute a streaming query that processes all available data in a **single micro-batch** and then stops, use the `trigger(once=True)` option.

### Why option A is correct

- `trigger(once=True)` is designed for exactly this use case.
- It processes all available data in one micro-batch and then terminates the streaming query.
- It is commonly used for batch-like, run-and-stop processing while keeping the benefits of the streaming framework (checkpointing, incremental reads).

### Why the other options are incorrect

- **B. `trigger(continuous="once")`**: Invalid. The `continuous` trigger enables continuous processing mode and expects a checkpoint-interval string such as `"1 second"`, not `"once"`.
- **C. `processingTime("once")`**: Invalid syntax. There is no standalone `processingTime` method on `DataStreamWriter`; `processingTime` is a keyword argument to `trigger()`, and it expects a time-interval string such as `"1 second"` or `"5 minutes"`.
- **D. `trigger(processingTime="once")`**: Invalid. The `processingTime` trigger expects a time-interval string, and `"once"` is not a valid interval.
- **E. `processingTime(1)`**: Invalid syntax for the same reason as C; in addition, even as a `trigger()` keyword argument, `processingTime` requires a string interval, not an integer.

### Key points

- `trigger(once=True)` is the proper way to execute one-time batch processing with Structured Streaming.
- It is useful when you want to process all available data at once and then stop.
- The query processes all data that has accumulated since the last trigger and then terminates.
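As a sketch, this is how the trigger fits into a streaming write. It assumes an active `SparkSession` named `spark`, an existing `sales` table, and a `checkpointPath` variable, none of which are defined here:

```python
# Sketch: a streaming write that processes all available data in a
# single micro-batch and then stops. Assumes `spark`, a `sales` table,
# and `checkpointPath` already exist in the session.
from pyspark.sql.functions import col

(spark.table("sales")
    .withColumn("avg_price", col("sales") / col("units"))
    .writeStream
    .option("checkpointLocation", checkpointPath)
    .outputMode("complete")
    .trigger(once=True)  # one micro-batch over all available data, then stop
    .table("new_sales"))
```

Because the trigger fires only once, the query terminates on its own after the batch completes; there is no need to call `awaitTermination` with a timeout or stop the query manually.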
Author: LeetQuiz
## Question 26

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:
```python
(spark.table("sales")
    .withColumn("avg_price", col("sales") / col("units"))
    .writeStream
    .option("checkpointLocation", checkpointPath)
    .outputMode("complete")
    .______
    .table("new_sales"))
```
If the data engineer only wants the query to execute a single micro-batch to process all of the available data, which of the following lines of code should the data engineer use to fill in the blank?
A. `trigger(once=True)`

B. `trigger(continuous="once")`

C. `processingTime("once")`

D. `trigger(processingTime="once")`

E. `processingTime(1)`