
Question 25
A data engineer has developed a code block to perform a streaming read on a data source. The code block is below:
(spark
.read
.schema(schema)
.format("cloudFiles")
.option("cloudFiles.format", "json")
.load(dataSource)
)
The code block is returning an error.
Which of the following changes should be made to the code block to configure the block to successfully perform a streaming read?
A
The .read line should be replaced with .readStream.
B
A new .stream line should be added after the .read line.
C
The .format("cloudFiles") line should be replaced with .format("stream").
D
A new .stream line should be added after the spark line.
E
A new .stream line should be added after the .load(dataSource) line.
Explanation:
The correct answer is A. In Apache Spark Structured Streaming, a streaming read (as opposed to a batch read) starts from spark.readStream, which returns a DataStreamReader, rather than from spark.read, which returns a batch DataFrameReader.
Key Points:
.read is used for batch processing.
.readStream is used for streaming processing.
.format("cloudFiles") is correct for reading from cloud storage with Auto Loader.
.option("cloudFiles.format", "json") is correct for specifying JSON as the file format.
.schema(schema) and .load(dataSource) are properly configured.
Why other options are incorrect:
B: There is no .stream method in Spark's DataFrameReader API.
C: .format("stream") is not a valid format; "cloudFiles" is the correct format for Auto Loader.
D: There is no .stream method that can be added after spark.
E: Adding .stream after .load(dataSource) is not valid; the DataFrame returned by .load has no .stream method.
The corrected code should be:
(spark
.readStream
.schema(schema)
.format("cloudFiles")
.option("cloudFiles.format", "json")
.load(dataSource)
)
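As a rough illustration, the corrected read can be wrapped in a small helper that also confirms the result is a streaming DataFrame. This is only a sketch: the function name read_json_stream and its parameters are hypothetical and not part of the question, and it assumes an active SparkSession on Databricks, since the "cloudFiles" source (Auto Loader) is not available in open-source Spark.

```python
def read_json_stream(spark, schema, data_source):
    """Return a streaming DataFrame over JSON files using Auto Loader.

    Hypothetical helper for illustration: `spark` is assumed to be an
    active SparkSession on Databricks, `schema` the expected schema, and
    `data_source` the input path.
    """
    df = (
        spark.readStream                      # DataStreamReader: streaming, not batch
        .schema(schema)
        .format("cloudFiles")                 # Auto Loader source (Databricks only)
        .option("cloudFiles.format", "json")  # format of the underlying files
        .load(data_source)
    )
    # DataFrame.isStreaming is True only for DataFrames built via readStream.
    if not df.isStreaming:
        raise ValueError("expected a streaming DataFrame")
    return df
```

Checking df.isStreaming is a quick way to tell the two cases apart: the same chain started from spark.read would be a batch read, which is what the original code block attempted.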