A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table. The code block used by the data engineer is below:
(spark.table("sales")
.withColumn("avg_price", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
._____
.outputMode("complete")
.table("new_sales"))
If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?
A. trigger("5 seconds")
B. trigger()
C. trigger(once="5 seconds")
D. trigger(processingTime="5 seconds")
E. trigger(continuous="5 seconds")
Explanation:
In Apache Spark Structured Streaming, the trigger() method specifies how often the streaming query should process data. There are several trigger types:
- processingTime: runs a micro-batch at a fixed interval, e.g. trigger(processingTime="5 seconds").
- once / availableNow: processes all data available at start time and then stops; neither takes a time interval.
- continuous: enables experimental continuous processing with a given checkpoint interval, e.g. trigger(continuous="1 second"); this is not micro-batch processing.
- If no trigger is set, the query runs micro-batches as fast as possible, starting each one as soon as the previous one finishes.
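As a rough sketch of how two of these trigger modes look in PySpark (the source table "events", the sink tables, and the checkpoint paths below are placeholders, not part of the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trigger-modes").getOrCreate()
stream = spark.readStream.table("events")  # placeholder streaming source

# Micro-batch every 5 seconds (the behavior this question asks about)
(stream.writeStream
    .option("checkpointLocation", "/tmp/cp_interval")
    .trigger(processingTime="5 seconds")
    .toTable("out_interval"))

# Process everything available now, then stop (Spark 3.3+)
(stream.writeStream
    .option("checkpointLocation", "/tmp/cp_available_now")
    .trigger(availableNow=True)
    .toTable("out_available_now"))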
The correct answer is D. To execute a micro-batch every 5 seconds, the syntax is:
.trigger(processingTime="5 seconds")
Why the other options are incorrect:
A. trigger("5 seconds"): trigger() does not accept a bare interval string; the interval must be passed via the processingTime keyword.
B. trigger(): calling trigger() with no arguments raises an error; exactly one trigger setting must be supplied.
C. trigger(once="5 seconds"): the once trigger doesn't take a time parameter; it processes all available data once and then stops (correct usage is trigger(once=True)).
E. trigger(continuous="5 seconds"): this enables continuous processing with a 5-second checkpoint interval, which is not micro-batch processing.
Correct usage in the code:
(spark.table("sales")
.withColumn("avg_price", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.trigger(processingTime="5 seconds")
.outputMode("complete")
.table("new_sales"))
This configuration will execute a micro-batch every 5 seconds, processing any new data that has arrived since the last micro-batch.
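For completeness, here is a self-contained sketch of the job. Two assumptions are worth flagging: writeStream requires a streaming DataFrame, so the read below uses spark.readStream.table rather than spark.table, and because the query performs no streaming aggregation it relies on the default append output mode (complete mode requires an aggregation). The checkpoint path is a placeholder.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("sales-stream").getOrCreate()
checkpointPath = "/tmp/checkpoints/new_sales"  # placeholder path

query = (spark.readStream.table("sales")  # streaming read of the source table
    .withColumn("avg_price", col("sales") / col("units"))
    .writeStream
    .option("checkpointLocation", checkpointPath)
    .trigger(processingTime="5 seconds")  # micro-batch every 5 seconds
    .toTable("new_sales"))  # starts the query; default append output mode

query.awaitTermination()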