
**Answer (E):** Replace `spark.read` with `spark.readStream`
## Explanation

To read from a Delta table as a streaming source in Databricks, you need to use `spark.readStream` instead of `spark.read`. The `spark.readStream` API is specifically designed for streaming data sources and provides the necessary functionality for incremental processing.

### Key Points

1. **`spark.readStream` vs `spark.read`**:
   - `spark.read` is for batch processing (reading data once)
   - `spark.readStream` is for streaming processing (reading data incrementally)
2. **Other options analysis**:
   - **Option A**: Incorrect. There is no `predict` function in the code, and prediction is unrelated to streaming.
   - **Option B**: Incorrect. `maxFilesPerTrigger` is an optional configuration for controlling micro-batch size, not a requirement for basic streaming.
   - **Option C**: Incorrect. You can stream from a Delta table by its name (`table("transactions")`) without needing the path.
   - **Option D**: Incorrect. `format("delta")` is correct for Delta tables; there is no `format("stream")`.
3. **Correct streaming code**:

```python
transactions_df = (spark.readStream
    .schema(schema)
    .format("delta")
    .table("transactions")
)
```

This change enables the code to read from the Delta table as a streaming source, processing new data as it arrives.
Author: Keng Suppaseth
A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:
```python
transactions_df = (spark.read
    .schema(schema)
    .format("delta")
    .table("transactions")
)
```
Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?
A. Replace `predict` with a stream-friendly prediction function

B. Replace `schema(schema)` with `option("maxFilesPerTrigger", 1)`

C. Replace `"transactions"` with the path to the location of the Delta table

D. Replace `format("delta")` with `format("stream")`

E. Replace `spark.read` with `spark.readStream`