
Which of the following methods completes the code snippet below so that it returns a streaming DataFrame in a Structured Streaming query?
spark.readStream.format("kafka")
.option("kafka.bootstrap.servers", "...")
.option("subscribe", "topic")
------
Explanation:
In Apache Spark Structured Streaming, the .load() method is used to create a streaming DataFrame from a data source. Here's why:
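.load() - This method finishes configuring the reader for the specified source (Kafka in this case) and returns a streaming DataFrame that can be used for further transformations and operations. The call is lazy: no records are read until a query is started on the result.
.print() - This is not a method of the streaming reader. Printing a schema for debugging is done with printSchema() on a DataFrame that already exists, so it cannot be used to load data.
.return() - This is not a valid method in the Spark DataFrame API; return is a language keyword, not a way to load a data source.
.merge() - This is not a method of the streaming reader either. Merging combines data that has already been loaded (for example in pandas or in a table-level MERGE operation); it does not load a streaming data source.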
The correct sequence for creating a streaming DataFrame from Kafka is:
spark.readStream - Create a streaming reader
.format("kafka") - Specify the data source format
.option() - Set various configuration options
.load() - Load the data and return a streaming DataFrame
This streaming DataFrame can then be used with operations like .writeStream to output the processed data to various sinks.
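The sequence above can be sketched in PySpark roughly as follows. This is a minimal illustration, not a definitive implementation: the function names build_kafka_stream and start_console_query and the parameters bootstrap_servers and topic are hypothetical, and the console sink is just one possible choice for output. It assumes an existing SparkSession is passed in and that the Kafka connector package (spark-sql-kafka-0-10) is available to Spark.

```python
def build_kafka_stream(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame that reads from a Kafka topic.

    `spark` is an existing SparkSession. The function name and its
    parameters are hypothetical and used only for illustration.
    """
    return (
        spark.readStream                                        # create a streaming reader
        .format("kafka")                                        # specify the data source format
        .option("kafka.bootstrap.servers", bootstrap_servers)   # broker address(es)
        .option("subscribe", topic)                             # topic to read from
        .load()                                                 # return the streaming DataFrame
    )


def start_console_query(stream_df):
    """Start a query that writes each micro-batch to the console.

    The console sink is an illustrative assumption; any supported sink
    could be configured here instead.
    """
    return (
        stream_df.writeStream
        .format("console")
        .outputMode("append")
        .start()                                                # processing begins only at this call
    )
```

Calling build_kafka_stream(spark, "...", "topic") only defines the stream; records start flowing once start_console_query (or any other writeStream ... start() chain) is invoked on the result.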