
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:
(spark.readStream
    .table("sales")
    .withColumn("avg_price", col("sales") / col("units"))
    .writeStream
    .option("checkpointLocation", checkpointPath)
    .outputMode("complete")
    .______
    .table("new_sales")
)
If the data engineer wants the query to process all of the available data in as many batches as required and then stop, which of the following lines of code should the data engineer use to fill in the blank?
A. processingTime(1)
B. trigger(availableNow=True)
C. trigger(parallelBatch=True)
D. trigger(processingTime="once")
E. trigger(continuous="once")
Explanation:
The correct answer is B, trigger(availableNow=True).

The availableNow=True trigger processes all data that is available when the query starts, in one or more micro-batches, and then stops. It is ideal when you want to work through a backlog of source data in a streaming fashion but do not want the query to run indefinitely. By contrast, trigger(once=True) processes all available data in a single batch, which may not scale for large backlogs; trigger(processingTime=...) expects an interval string such as "5 seconds" and keeps the query running continuously; and parallelBatch and continuous="once" are not valid trigger arguments.
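For reference, a minimal sketch of the completed job with the blank filled in. This assumes a running SparkSession (`spark`), an existing `sales` source table, and a hypothetical writable `checkpointPath`; it mirrors the question's code rather than a production recipe:

```python
from pyspark.sql.functions import col

# Hypothetical checkpoint directory; any durable, writable path works.
checkpointPath = "/tmp/checkpoints/new_sales"

(spark.readStream
    .table("sales")
    .withColumn("avg_price", col("sales") / col("units"))
    .writeStream
    .option("checkpointLocation", checkpointPath)
    .outputMode("complete")
    .trigger(availableNow=True)  # process all currently available data, then stop
    .table("new_sales")
)
```

Because the checkpoint records progress, rerunning the same query later picks up only the data that arrived since the previous run, which makes availableNow=True a common pattern for incremental batch-style jobs scheduled on a cadence.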