
Answer-first summary for fast verification
Answer: `.load()`
## Explanation

In Apache Spark Structured Streaming, `.load()` is the `DataStreamReader` method that returns a streaming DataFrame for the configured source. The options compare as follows:

- **`.load()`**: Finalizes the reader configuration for the specified source (Kafka in this case) and returns a streaming DataFrame that can be used for further transformations. It does not read any data by itself; Spark begins consuming from Kafka only once a streaming query is started through `.writeStream` and `.start()`.
- **`.print()`**: Not a `DataStreamReader` method. The similarly named `printSchema()` is a DataFrame method that prints the schema for debugging; it does not load data.
- **`.return()`**: Not a method in the Spark API; `return` is a language keyword.
- **`.merge()`**: Not a `DataStreamReader` method. Merging is associated with upserts into an existing table (for example Delta Lake's `MERGE INTO`), not with defining a streaming source.

The correct sequence for creating a streaming DataFrame from Kafka is:

1. `spark.readStream`: obtain a `DataStreamReader`
2. `.format("kafka")`: specify the data source format
3. `.option(...)`: set configuration options such as the bootstrap servers and the topic subscription
4. **`.load()`**: return the streaming DataFrame

The streaming DataFrame can then be transformed and written to a sink through its `.writeStream` attribute.
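The sequence above can be sketched in PySpark as follows. This is a minimal sketch rather than a definitive implementation: the helper names `kafka_stream` and `start_console_query` are hypothetical, `spark` is assumed to be an existing `SparkSession`, and the Kafka connector package (`spark-sql-kafka-0-10`) is assumed to be on the classpath.

```python
def kafka_stream(spark, bootstrap_servers, topic):
    """Define a Kafka source and return it as a streaming DataFrame.

    `spark` is assumed to be an existing SparkSession. No data is read
    here: .load() only finalizes the source definition.
    """
    return (
        spark.readStream                  # DataStreamReader
        .format("kafka")                  # Kafka source
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()                           # returns the streaming DataFrame
    )


def start_console_query(stream_df):
    """Start a query that prints each micro-batch to the console.

    Calling .start() is the point at which Spark begins consuming.
    """
    return (
        stream_df
        # Kafka delivers key and value as binary; cast them for display.
        .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
        .writeStream                      # DataStreamWriter
        .format("console")
        .outputMode("append")
        .start()                          # returns a StreamingQuery
    )
```

With a live session, `kafka_stream(spark, "host:9092", "topic")` yields a DataFrame whose `isStreaming` property is `True`, and passing it to `start_console_query` returns a `StreamingQuery` that can be stopped with `.stop()`. The broker address and topic name here are placeholders.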
Author: LeetQuiz
Which of the following functions completes the following code snippet to return a Spark DataFrame in a structured streaming query?
spark.readStream.format("kafka")
.option("kafka.bootstrap.servers", "...")
.option("subscribe", "topic")
------
A. `.load()`
B. `.print()`
C. `.return()`
D. `.merge()`