
A data engineer is using the following code block as part of a batch ingestion pipeline to read from a table:
transactions_df = (spark.read
    .schema(schema)
    .format("delta")
    .table("transactions"))
Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?
A
Replace predict with a stream-friendly prediction function
B
Replace schema(schema) with option("maxFilesPerTrigger", 1)
C
Replace "transactions" with the path to the location of the Delta table
D
Replace format("delta") with format("stream")
E
Replace spark.read with spark.readStream
Correct Answer: E

Explanation:
When working with streaming data in Databricks, you need to use spark.readStream instead of spark.read to create a streaming DataFrame. The key differences are:
spark.readStream creates a streaming DataFrame that continuously reads data as it arrives, while spark.read creates a static DataFrame that reads all data at once.
The original code uses batch processing (spark.read), which is suitable for one-time data ingestion but not for streaming sources.
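As a quick illustration of that difference, here is a minimal sketch assuming the spark session and the registered transactions table from the question:

# Batch: reads the table's current contents once; the result is a static DataFrame.
batch_df = spark.read.table("transactions")
print(batch_df.isStreaming)   # False

# Streaming: picks up new data incrementally as it is committed to the table.
stream_df = spark.readStream.table("transactions")
print(stream_df.isStreaming)  # True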
The other options are incorrect:
A: There is no predict function in the code block, so this option does not apply.
B: maxFilesPerTrigger is a valid streaming read option, but it should be added as an extra .option(...) call alongside schema(schema), not substituted for the schema definition.
C: A Delta table registered in the metastore can be read as a stream source by name, so replacing "transactions" with the table's path is unnecessary.
D: format("stream") is not a valid format; format("delta") should be kept for Delta tables.

After replacing spark.read with spark.readStream, the code block looks like:
transactions_df = (spark.readStream
    .schema(schema)
    .format("delta")
    .table("transactions"))
This creates a streaming DataFrame that can process data incrementally as it arrives in the Delta table.
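To make the incremental processing concrete, a minimal sketch of how such a stream might be written out is shown below; the checkpoint path and target table name are illustrative assumptions, not part of the question:

# Incrementally write transactions_df (from the corrected code above) to a Delta sink.
query = (transactions_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/transactions")  # assumed checkpoint path
    .toTable("bronze_transactions"))  # assumed target table name

The checkpoint location lets the query resume from where it left off, which is what makes the incremental, exactly-once processing of new data in the transactions table possible.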