Databricks Certified Data Engineer - Associate

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below.

spark.readStream \
    .table("sales") \
    .withColumn("avg_price", col("sales") / col("units")) \
    .writeStream \
    .option("checkpointLocation", checkpointPath) \
    .outputMode("complete") \
    ._______ \
    .table("new_sales")

If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?

KKeng

Last updated: February 18, 2026 at 12:26

processingTime(1)

trigger(availableNow=True)

trigger(parallelBatch=True)

Explanation:

In Apache Spark Structured Streaming, the trigger() method on the stream writer controls when the query runs its micro-batches. The availableNow=True trigger is designed for exactly this scenario: it processes all data available when the query starts, in as many batches as needed, and then stops.
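As a rough illustration, the completed query might look like the sketch below. The function name `start_new_sales_query` and its parameters are hypothetical. Two deliberate departures from the question's code are assumptions on my part: the sketch uses `toTable`, the writer method in open-source PySpark 3.1+, and it uses append output mode, because complete mode is normally accepted only when the query contains a streaming aggregation. The `availableNow` keyword requires Spark 3.3 or later.

```python
def start_new_sales_query(spark, checkpoint_path):
    """Start a streaming write that drains the current backlog, then stops.

    `spark` is an existing SparkSession; `checkpoint_path` is a writable
    location for the query's checkpoint. Both are supplied by the caller.
    """
    sales = spark.readStream.table("sales")

    # Same derived column as in the question, written with DataFrame column
    # access instead of col() so that no extra import is needed.
    with_avg_price = sales.withColumn(
        "avg_price", sales["sales"] / sales["units"]
    )

    return (
        with_avg_price.writeStream
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")        # assumption: no aggregation in this query
        .trigger(availableNow=True)  # process the backlog in batches, then stop
        .toTable("new_sales")        # starts the query, returns a StreamingQuery
    )
```

Calling `start_new_sales_query(spark, path).awaitTermination()` would block until the backlog has been written, after which the query ends on its own.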

Key Points:

trigger(availableNow=True): This trigger processes all data available in the source when the query starts, splitting it across multiple micro-batches rather than a single batch, and then stops the query. This is ideal when you want to drain everything currently available while keeping streaming semantics (checkpointed, incremental progress) and without overwhelming the cluster with one large batch.

Why not the other options:
- processingTime(1): As written, this is not a method on the stream writer; a processing-time trigger is set with trigger(processingTime="1 second"). Even in that form, the query would fire a micro-batch every second and keep running indefinitely, processing data as it arrives rather than stopping once the currently available data is done.
- trigger(parallelBatch=True): parallelBatch is not a valid trigger option in Structured Streaming.

Use Case: The availableNow trigger is particularly useful for:
- Processing backlogged data
- One-time data migration tasks
- Batch-like processing while maintaining streaming semantics
- Processing all available data without running continuously
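
The batching behavior described above can be pictured with a toy model in plain Python. This is not Spark's implementation: `run_available_now` and `max_rows_per_batch` are hypothetical names, with the batch cap standing in for source rate limits such as `maxFilesPerTrigger`.

```python
from collections import deque


def run_available_now(source, max_rows_per_batch):
    """Drain the rows present at start-up in bounded batches, then stop."""
    # Snapshot the backlog once; rows that arrive later wait for the next run.
    backlog = deque(source)
    batches = []
    while backlog:
        size = min(max_rows_per_batch, len(backlog))
        batches.append([backlog.popleft() for _ in range(size)])
    # The loop ends when the snapshot is empty, so the "query" terminates.
    return batches


batches = run_available_now([1, 2, 3, 4, 5], max_rows_per_batch=2)
print(batches)  # [[1, 2], [3, 4], [5]]
```

A processing-time trigger, by contrast, would correspond to an outer loop that never exits and polls the source again at every interval.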

This trigger ensures that all currently available data is processed efficiently while maintaining the benefits of the streaming execution model.

Powered by GPT-5.2
