
**Answer-first summary for fast verification**

**Answer:** `trigger(availableNow=True)`
## Explanation

In Apache Spark Structured Streaming, the `trigger()` method controls when the streaming query executes micro-batches. The `availableNow=True` trigger option is specifically designed to process all data available at the start of the query, in as many micro-batches as required, and then terminate the query.

### Why option B is correct

- **`trigger(availableNow=True)`**: processes everything available when the query starts, in one or more micro-batches, then stops the query. It is ideal when you want to process all existing data without running a continuous streaming job.

### Why the other options are incorrect

- **A. `processingTime(1)`**: `processingTime` is not a standalone method on the writer; even expressed as `trigger(processingTime="1 second")`, it would create a continuous streaming job that fires on an interval rather than processing the available data once and stopping.
- **C. `trigger(parallelBatch=True)`**: not a valid trigger option in Spark Structured Streaming.
- **D. `trigger(processingTime="once")`**: invalid syntax; `processingTime` expects a time interval string such as `"1 second"`.
- **E. `trigger(continuous="once")`**: invalid; continuous mode expects a checkpoint interval string such as `"1 second"`.

### Key points

- `availableNow=True` is designed for batch-like processing of all currently available data.
- It splits the work into multiple micro-batches, so a large backlog does not have to fit into a single batch.
- The query stops automatically after processing all the data.
- This differs from the deprecated `trigger(once=True)`, which processed all available data in a single batch.
- It is particularly useful for incremental pipelines that periodically process all new data since the last run.
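To build intuition for the semantics described above, here is a minimal pure-Python sketch (not the Spark API; the function name `run_available_now` and the `max_batch_size` parameter are illustrative only) of how an availableNow-style trigger drains a source in bounded micro-batches and then stops:

```python
# Toy simulation of availableNow trigger semantics (plain Python, not Spark):
# snapshot the data available at query start, process it in bounded
# micro-batches, then terminate. All names here are hypothetical.

def run_available_now(source, max_batch_size):
    """Drain all records present at start in micro-batches, then stop."""
    snapshot = list(source)  # the data available "now", at query start
    batches = []
    for i in range(0, len(snapshot), max_batch_size):
        batches.append(snapshot[i:i + max_batch_size])  # one micro-batch
    return batches  # the "query" ends after the last micro-batch

batches = run_available_now(range(10), max_batch_size=4)
# All 10 records are processed, split across 3 micro-batches: 4 + 4 + 2.
```

The key property mirrored here is that the total volume of work is fixed at start time, while the batch size bounds how much is handled per micro-batch; a continuous `processingTime` trigger, by contrast, would keep polling for new data indefinitely.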
Author: Keng Suppaseth
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table. The code block used by the data engineer is below:
```python
(spark.readStream
    .table("sales")
    .withColumn("avg_price", col("sales") / col("units"))
    .writeStream
    .option("checkpointLocation", checkpointPath)
    .outputMode("complete")
    .______
    .table("new_sales")
)
```
If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?
- A. `processingTime(1)`
- B. `trigger(availableNow=True)`
- C. `trigger(parallelBatch=True)`
- D. `trigger(processingTime="once")`
- E. `trigger(continuous="once")`