
Question 26 A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:
(spark.table("sales")
.withColumn("avg_price", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.______
.table("new_sales"))
If the data engineer only wants the query to execute a single micro-batch to process all of the available data, which of the following lines of code should the data engineer use to fill in the blank?
Explanation:
In Apache Spark Structured Streaming, when you want a streaming query to process all available data in a single micro-batch and then stop, you should use the trigger(once=True) option.
trigger(once=True) is specifically designed for this use case. By contrast, the processingTime trigger expects a time-interval string such as "1 second" or "5 minutes", not the string "once", so trigger(processingTime="once") is invalid. trigger(once=True) is therefore the correct way to execute one-time batch processing with Structured Streaming.
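For reference, a sketch of the completed query with the blank filled in is below. This assumes a live SparkSession and that checkpointPath points to a writable checkpoint directory, as in the question; it is illustrative, not a standalone runnable script.

```python
from pyspark.sql.functions import col

# checkpointPath is a placeholder from the question; point it at a
# durable location (e.g. a path on DBFS or cloud storage) in practice.
(spark.table("sales")
    .withColumn("avg_price", col("sales") / col("units"))
    .writeStream
    .option("checkpointLocation", checkpointPath)
    .outputMode("complete")
    .trigger(once=True)          # run a single micro-batch, then stop
    .table("new_sales"))
```

With trigger(once=True), the query reads everything available at start-up, writes it to new_sales in one micro-batch, and terminates, which makes it suitable for scheduled, batch-style runs of streaming code.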