
A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:
transactions_df = (spark.read
.schema(schema)
.format("delta")
.table("transactions")
)
Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?
A. Replace predict with a stream-friendly prediction function
B. Replace schema(schema) with option("maxFilesPerTrigger", 1)
C. Replace "transactions" with the path to the location of the Delta table
D. Replace format("delta") with format("stream")
E. Replace spark.read with spark.readStream
Explanation:
The correct answer is E. To read from a Delta table as a streaming source in Databricks, you use spark.readStream instead of spark.read; this is the fundamental difference between batch and streaming reads in Spark Structured Streaming.
Key points:
- spark.read is used for batch processing: it reads the data available at query time all at once.
- spark.readStream is used for streaming processing: it reads data incrementally as it arrives.
- There is no predict function in the code block, so option A does not apply.
- maxFilesPerTrigger can be used to control micro-batch size in streaming, but it is not required to make the code work as a stream source, so option B is not the answer (see the sketch after the corrected code below).
The correct streaming code would look like:
transactions_df = (spark.readStream
.schema(schema)
.format("delta")
.table("transactions")
)
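For context on option B, maxFilesPerTrigger is an optional rate-limiting setting rather than a requirement. A minimal sketch of how it could be combined with the streaming read is shown below; the value of 1 is only an illustration and is not part of the original question:
throttled_df = (spark.readStream
    .format("delta")
    .option("maxFilesPerTrigger", 1)  # limit each micro-batch to at most one newly committed file
    .table("transactions")
)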
This change enables the pipeline to process data incrementally as new data arrives in the Delta table, which is essential for streaming applications.
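To complete the picture, an incremental pipeline also needs a streaming write with a checkpoint. The sketch below is illustrative only: the target table name, checkpoint path, and availableNow trigger are assumptions, not part of the original question (availableNow requires a recent Spark/Databricks Runtime):
query = (transactions_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/transactions_bronze")  # hypothetical checkpoint path
    .trigger(availableNow=True)  # process all currently available data, then stop
    .toTable("transactions_bronze")  # hypothetical target table
)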