
Explanation:
In Apache Spark Structured Streaming, .load() is the call that ends the reader chain and returns a streaming DataFrame for the configured data source. Here's why each option does or doesn't fit:
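.load() - This method finishes the reader definition for the specified source (Kafka in this case) and returns a streaming DataFrame that can be used for further transformations. It does not read any records by itself. Data is pulled from Kafka only after a query is started with .writeStream ... .start().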
.print() - Neither DataStreamReader nor DataFrame has a .print() method. For debugging, the schema of a DataFrame is printed with .printSchema(), which does not load data either.
.return() - This is not a Spark method. return is a keyword in both Python and Scala, so .return() is not even valid syntax.
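.merge() - This is not part of the DataStreamReader or DataFrame API. Merge (upsert) operations exist in Delta Lake (MERGE INTO, DeltaTable.merge()) for writing into a target table, not for loading a streaming source.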
The correct sequence for creating a streaming DataFrame from Kafka is:
spark.readStream - Create a streaming reader (a DataStreamReader).
.format("kafka") - Specify the data source format.
.option() - Set configuration options such as kafka.bootstrap.servers and subscribe.
.load() - Return a streaming DataFrame.
This streaming DataFrame can then be transformed and written to various sinks with .writeStream, which is where the processing actually starts.
Which of the following functions completes the following code snippet to return a Spark DataFrame in a structured streaming query?
spark.readStream.format("kafka")
.option("kafka.bootstrap.servers", "...")
.option("subscribe", "topic")
------
A
.load()
B
.print()
C
.return()
D
.merge()