Databricks Certified Data Engineer - Associate

Question 25

A data engineer has developed a code block to perform a streaming read on a data source. The code block is below:

(spark
 .read
 .schema(schema)
 .format("cloudFiles")
 .option("cloudFiles.format", "json")
 .load(dataSource)
)

The code block is returning an error.

Which of the following changes should be made to the code block to configure the block to successfully perform a streaming read?

A. The .read line should be replaced with .readStream.

B. A new .stream line should be added after the .read line.

C. The .format("cloudFiles") line should be replaced with .format("stream").

D. A new .stream line should be added after the spark line.

E. A new .stream line should be added after the .load(dataSource) line.

Explanation:

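The correct change is to replace .read with .readStream. In Apache Spark Structured Streaming, spark.read returns a batch DataFrameReader, while spark.readStream returns a DataStreamReader. The cloudFiles (Auto Loader) source is a streaming source, so it has to be loaded through .readStream; loading it through .read is what causes the error.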

Key Points:

.read is used for batch processing
.readStream is used for streaming processing
The .format("cloudFiles") is correct for reading from cloud storage with Auto Loader
The .option("cloudFiles.format", "json") is correct for specifying JSON format
The .schema(schema) and .load(dataSource) are properly configured

Why other options are incorrect:

Option B: There is no .stream method in Spark's DataFrameReader API
Option C: .format("stream") is not a valid format - "cloudFiles" is the correct format for Auto Loader
Option D: There is no .stream method that can be added after spark
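Option E: The DataFrame returned by .load(dataSource) has no .stream method, and by that point the read has already been defined through the batch .read API, so appending .stream cannot turn it into a streaming read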

The corrected code should be:

(spark
 .readStream
 .schema(schema)
 .format("cloudFiles")
 .option("cloudFiles.format", "json")
 .load(dataSource)
)
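As a minimal sketch of how the corrected read might be wrapped and checked, the snippet below puts the same chain into a helper and uses the DataFrame's isStreaming property to confirm the result is a streaming DataFrame. The helper names build_json_stream and require_streaming are hypothetical and are not part of Spark or Databricks; spark, schema, and the data source path are assumed to be supplied by the caller, and the cloudFiles source is assumed to be available (it is specific to Databricks).

```python
def build_json_stream(spark, schema, data_source):
    """Return a streaming DataFrame that reads JSON files with Auto Loader.

    Hypothetical helper: `spark` is an existing SparkSession, `schema` a
    schema accepted by DataStreamReader.schema, `data_source` a path string.
    """
    return (spark
            .readStream                            # streaming entry point, not .read
            .schema(schema)
            .format("cloudFiles")                  # Auto Loader source (Databricks only)
            .option("cloudFiles.format", "json")   # underlying file format
            .load(data_source))


def require_streaming(df):
    """Return `df` unchanged, or raise if it came from a batch read."""
    if not df.isStreaming:
        raise ValueError("expected a streaming DataFrame; use spark.readStream")
    return df
```

In a Databricks notebook this might be called as require_streaming(build_json_stream(spark, schema, dataSource)). A DataFrame built with spark.read reports isStreaming as False, so the check would raise for the original batch-style code block if that read were to succeed.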