
A data engineer has developed a code block to perform a streaming read on a data source. The code block is below:
(spark
.read
.schema(schema)
.format("cloudFiles")
.option("cloudFiles.format", "json")
.load(dataSource)
)
The code block is returning an error.
Which of the following changes should be made to the code block to configure the block to successfully perform a streaming read?
A. The .read line should be replaced with .readStream.
B. A new .stream line should be added after the .read line.
C. The .format("cloudFiles") line should be replaced with .format("stream").
D. A new .stream line should be added after the spark line.
E. A new .stream line should be added after the .load(dataSource) line.
Correct Answer: A
Explanation:
In Apache Spark Structured Streaming, to perform a streaming read from a data source, you must use .readStream instead of .read. The .read method is used for batch processing, while .readStream is specifically designed for streaming operations.
.readStream vs .read:
.read: returns a DataFrameReader for batch processing.
.readStream: returns a DataStreamReader for streaming processing.
Correct code structure:
(spark
.readStream # Changed from .read
.schema(schema)
.format("cloudFiles")
.option("cloudFiles.format", "json")
.load(dataSource)
)
Why the Other Options Are Incorrect:
B, D, E: There is no .stream method in Spark's DataFrameReader API, so inserting a .stream line at any position is not valid Spark syntax.
C: Replacing .format("cloudFiles") with .format("stream") would break the format specification; "stream" is not a valid source format.
CloudFiles Format: The .format("cloudFiles") line is correct as written. It invokes Auto Loader for reading files from cloud storage, and when combined with .readStream it enables incremental file processing as new files arrive.
This change is essential because Spark Structured Streaming requires explicit declaration of streaming operations through the .readStream method to properly handle incremental data processing, state management, and trigger configurations.
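To make the corrected read concrete, here is a hedged end-to-end sketch that pairs the .readStream source with a .writeStream sink (a streaming DataFrame does nothing until a query is started). The schema, input path, checkpoint location, and output path are illustrative placeholders, not values from the original question, and the cloudFiles source itself requires a Databricks runtime rather than open-source Spark:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("autoloader-sketch").getOrCreate()

# Placeholder schema and input path for illustration only.
schema = StructType([StructField("id", StringType(), True)])
dataSource = "/mnt/landing/json/"

df = (spark
      .readStream                        # streaming read, not .read
      .schema(schema)
      .format("cloudFiles")              # Auto Loader source (Databricks)
      .option("cloudFiles.format", "json")
      .load(dataSource))

# The stream must be started with writeStream; a checkpoint location is
# required so ingestion progress survives restarts.
query = (df.writeStream
         .format("delta")
         .option("checkpointLocation", "/mnt/checkpoints/json_ingest")
         .outputMode("append")
         .start("/mnt/bronze/json_ingest"))
```

A common variant is to add .trigger(availableNow=True) before .start() to process all currently available files and then stop, giving batch-like behavior while keeping the streaming API.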