
Answer-first summary for fast verification
Answer: The `.read` line should be replaced with `.readStream`.
## Explanation

In Apache Spark Structured Streaming, a streaming read (as opposed to a batch read) uses `.readStream` instead of `.read`. Auto Loader (`cloudFiles`) is a streaming source, so calling it through the batch reader returns an error.

**Key Points:**

- `.read` is used for batch processing
- `.readStream` is used for streaming processing
- The `.format("cloudFiles")` is correct for reading from cloud storage with Auto Loader
- The `.option("cloudFiles.format", "json")` is correct for specifying JSON format
- The `.schema(schema)` and `.load(dataSource)` are properly configured

**Why other options are incorrect:**

- **Option B**: There is no `.stream` method in Spark's DataFrameReader API
- **Option C**: `.format("stream")` is not a valid format; `"cloudFiles"` is the correct format for Auto Loader
- **Option D**: There is no `.stream` method that can be added after `spark`
- **Option E**: Adding `.stream` after `.load(dataSource)` would not work: `.load()` already returns a DataFrame, which has no `.stream` method, and the read would still be a batch read

The corrected code should be:

```python
(spark
  .readStream
  .schema(schema)
  .format("cloudFiles")
  .option("cloudFiles.format", "json")
  .load(dataSource)
)
```
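As a minimal sketch, the corrected read can be wrapped in a small helper so the streaming entry point is fixed in one place. The function name `build_autoloader_stream` and its `file_format` parameter are hypothetical and not part of the original question; `spark` is assumed to be an existing SparkSession (predefined in Databricks notebooks), and the `cloudFiles` source is assumed to be available, which is only the case on Databricks.

```python
def build_autoloader_stream(spark, schema, data_source, file_format="json"):
    """Return a streaming DataFrame that reads `data_source` with Auto Loader.

    Hypothetical helper for illustration; `spark` is an existing SparkSession.
    """
    return (
        spark.readStream                           # streaming entry point, not .read
        .schema(schema)                            # explicit schema for the incoming files
        .format("cloudFiles")                      # Auto Loader source (Databricks only)
        .option("cloudFiles.format", file_format)  # format of the underlying files
        .load(data_source)
    )
```

On a Databricks cluster this would be called as `df = build_autoloader_stream(spark, schema, dataSource)`, and `df.isStreaming` can be used to confirm that the result is a streaming DataFrame rather than a batch one.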
Author: LeetQuiz
Question 25
A data engineer has developed a code block to perform a streaming read on a data source. The code block is below:
(spark
.read
.schema(schema)
.format("cloudFiles")
.option("cloudFiles.format", "json")
.load(dataSource)
)
The code block is returning an error.
Which of the following changes should be made to the code block to configure the block to successfully perform a streaming read?
A. The .read line should be replaced with .readStream.
B. A new .stream line should be added after the .read line.
C. The .format("cloudFiles") line should be replaced with .format("stream").
D. A new .stream line should be added after the spark line.
E. A new .stream line should be added after the .load(dataSource) line.