
A data engineer is using the following code block as part of a batch ingestion pipeline to read from a table:
transactions_df = (spark.read
    .schema(schema)
    .format("delta")
    .table("transactions")
)
Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?
A. Replace predict with a stream-friendly prediction function
B. Replace schema(schema) with option("maxFilesPerTrigger", 1)
C. Replace "transactions" with the path to the location of the Delta table
D. Replace format("delta") with format("stream")
E. Replace spark.read with spark.readStream
Explanation:
The correct answer is E. To read from a Delta table as a streaming source in Databricks, you need to use spark.readStream instead of spark.read. The spark.readStream API is specifically designed for streaming data sources and provides the necessary functionality for incremental processing.
spark.readStream vs spark.read:
spark.read is for batch processing (reading data once).
spark.readStream is for streaming processing (reading data incrementally).
Other options analysis:
A. There is no predict function in the code block, so this option is unrelated to the question.
B. maxFilesPerTrigger is an optional configuration for controlling micro-batch size, not a requirement for basic streaming (see the sketch after the corrected code below); it would not replace schema(schema) in any case.
C. A Delta table can be read as a stream by name (table("transactions")) without needing the path.
D. format("delta") is correct for Delta tables; there is no format("stream").
Correct streaming code:
transactions_df = (spark.readStream
    .schema(schema)
    .format("delta")
    .table("transactions")
)
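For context on option B: maxFilesPerTrigger is a tuning knob layered onto a streaming read, not a replacement for anything. A minimal sketch of how it could be added to the same read, assuming the transactions table exists as above (Delta supplies the schema from the table's transaction log, so no explicit schema is needed here):

transactions_df = (spark.readStream
    .format("delta")
    # Optional tuning: cap each micro-batch at 1 new file from the source table
    .option("maxFilesPerTrigger", 1)
    .table("transactions")
)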
Switching to spark.readStream enables the code to read from the Delta table as a streaming source, processing new data incrementally as it arrives.
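Note that a streaming read on its own only defines the source; nothing runs until the stream is written to a sink with a checkpoint. A minimal sketch of completing the pipeline, assuming a hypothetical target table transactions_bronze and a placeholder checkpoint path:

(transactions_df.writeStream
    .format("delta")
    # Checkpoint tracks streaming progress so the query can restart safely
    .option("checkpointLocation", "/tmp/checkpoints/transactions")  # placeholder path
    .outputMode("append")
    .toTable("transactions_bronze")  # hypothetical target table; starts the query
)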