
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table. The code block used by the data engineer is below:
(spark.table("sales")
.withColumn("avg_price", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
_____
.table("new_sales"))
If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?
A. trigger("5 seconds")
B. trigger()
C. trigger(once="5 seconds")
D. trigger(processingTime="5 seconds")
E. trigger(continuous="5 seconds")
Correct Answer: D

Explanation:
In Apache Spark Structured Streaming, the trigger() method specifies how often the streaming query should process data. There are three main trigger types:

- processingTime: runs a micro-batch at a fixed interval (e.g., every 5 seconds)
- once / availableNow: processes all available data in a single run, then stops
- continuous: an experimental low-latency mode that processes data continuously rather than in micro-batches
For this question, the data engineer wants to execute a micro-batch every 5 seconds. The correct syntax for a processing time trigger is:
trigger(processingTime="5 seconds")
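To see the processingTime trigger in action, here is a minimal self-contained sketch. It uses Spark's built-in rate source and console sink so it runs anywhere; the column name doubled and the app name are illustrative, not from the question:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("trigger-demo").getOrCreate()

# The rate source generates a timestamped row counter, so no input table is needed
stream = (spark.readStream
    .format("rate")
    .option("rowsPerSecond", 10)
    .load()
    .withColumn("doubled", col("value") * 2))

query = (stream.writeStream
    .format("console")
    .outputMode("append")
    .trigger(processingTime="5 seconds")  # fire one micro-batch every 5 seconds
    .start())

query.awaitTermination(20)  # let a few micro-batches run, then return
query.stop()

Running this prints a new micro-batch to the console roughly every 5 seconds, which is exactly the behavior the question asks for.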
Let's analyze each option:

A. trigger("5 seconds") is invalid: trigger() does not accept a bare interval string as a positional argument.
B. trigger() is invalid: trigger() requires exactly one named argument specifying the trigger type.
C. trigger(once="5 seconds") is invalid: the once trigger takes a boolean (once=True) and processes all available data in a single batch; it does not accept an interval.
D. trigger(processingTime="5 seconds") is correct: it runs a micro-batch at a fixed 5-second interval.
E. trigger(continuous="5 seconds") uses the valid trigger(continuous="...") format, but this would process data continuously (the interval is only the checkpoint interval), not in micro-batches every 5 seconds.

Key Points:
- The processingTime trigger is used for fixed-interval micro-batch processing.
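To make the contrast between the options concrete, here is a short sketch of the three trigger forms applied to a DataStreamWriter. The name writer is a stand-in for any fully configured writer, such as the one in the question:

# Fixed-interval micro-batches: one batch every 5 seconds (the correct choice here)
writer.trigger(processingTime="5 seconds")

# One-shot processing: consume everything currently available, then stop
writer.trigger(availableNow=True)  # trigger(once=True) on older Spark versions

# Experimental continuous processing with a 5-second checkpoint interval
writer.trigger(continuous="5 seconds")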