
Answer-first summary for fast verification
Answer: It treats a live data stream as an unbounded table where new data is continuously appended as additional rows.
### Explanation

The core abstraction of **Spark Structured Streaming** is that a streaming DataFrame represents an **unbounded table**.

* **Unbounded Table Model:** New data arriving in the stream is treated as new rows being appended to an input table. Developers can write queries against this table as if it were a static table, and Spark handles the incremental execution as data arrives.
* **Processing vs. Messaging:** Structured Streaming is a processing engine, not a messaging system like Apache Kafka. While it integrates deeply with Kafka, its role is to transform and analyze data, not to act as the storage/message broker.
* **CPU-based Execution:** Structured Streaming is built on the Spark SQL engine, which uses CPU-based DataFrame transformations rather than a GPU-based model.
* **Stateful vs. Stateless:** While Structured Streaming can maintain state (e.g., for windowed aggregations), the general programming model is defined by the unbounded table concept rather than a requirement to maintain state for all cached stages.

**References:**

- Apache Spark Structured Streaming Programming Guide: 'This DataFrame represents an unbounded table containing the streaming text data.'
- Databricks Documentation: 'In Structured Streaming, a data stream is treated as a table that is being continuously appended.'
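The unbounded table model from the explanation can be illustrated with a small toy simulation. This is plain Python, not Spark code: the names `UnboundedTable`, `IncrementalWordCount`, `batch_word_count`, and `on_trigger` are hypothetical and chosen only for illustration. The sketch assumes a simple word-count query and shows that processing only the newly appended rows on each trigger gives the same result as rerunning the query over the whole table.

```python
from collections import Counter


class UnboundedTable:
    """Toy stand-in for a streaming input table: rows are only ever appended."""

    def __init__(self):
        self.rows = []

    def append(self, batch):
        self.rows.extend(batch)


def batch_word_count(rows):
    """The query written as if the table were static: count every word."""
    return Counter(word for line in rows for word in line.split())


class IncrementalWordCount:
    """Runs the same query incrementally, reading only newly appended rows."""

    def __init__(self):
        self.result = Counter()  # running result table
        self.offset = 0          # number of input rows already processed

    def on_trigger(self, table):
        new_rows = table.rows[self.offset:]
        self.result.update(batch_word_count(new_rows))
        self.offset = len(table.rows)
        return self.result


table = UnboundedTable()
query = IncrementalWordCount()

# First batch of rows arrives and the query is triggered.
table.append(["cat dog", "dog"])
query.on_trigger(table)

# A second batch arrives; only the new row is processed.
table.append(["cat owl"])
query.on_trigger(table)

# The incremental result matches a full batch query over the whole table.
assert query.result == batch_word_count(table.rows)
print(dict(query.result))
```

In Spark itself the engine handles this bookkeeping; the developer writes only the static-style query and Spark decides how to update the result as rows arrive.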
Author: LeetQuiz Editorial Team
Which statement best describes the fundamental programming model used by Spark Structured Streaming?
A
It functions as a distributed messaging system designed to replace brokers like Apache Kafka.
B
It relies on a GPU-accelerated parallel processing framework to achieve high data throughput.
C
It treats a live data stream as an unbounded table where new data is continuously appended as additional rows.
D
It depends on a distributed network of nodes to maintain incremental state values for all cached query stages by default.