
Explanation:
Explanation:
To read from a Delta table as a streaming source in Databricks, you need to use spark.readStream instead of spark.read. This is the fundamental difference between batch and streaming operations in Spark Structured Streaming.
Key points:
spark.read is used for batch processing - it reads the entire dataset at once.spark.readStream is used for streaming processing - it reads data incrementally as it arrives.predict function in the code, so this doesn't apply.maxFilesPerTrigger can be used to control micro-batch size in streaming, it's not required to make the code work as a stream source.Correct streaming code would look like:
transactions_df = (spark.readStream
.schema(schema)
.format("delta")
.table("transactions")
)
transactions_df = (spark.readStream
.schema(schema)
.format("delta")
.table("transactions")
)
This change enables the pipeline to process data incrementally as new data arrives in the Delta table, which is essential for streaming applications.
Ultimate access to all questions.
No comments yet.
A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:
transactions_df = (spark.read
.schema(schema)
.format("delta")
.table("transactions")
)
transactions_df = (spark.read
.schema(schema)
.format("delta")
.table("transactions")
)
Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?
A
Replace predict with a stream-friendly prediction function
B
Replace schema(schema) with option ("maxFilesPerTrigger", 1)
C
Replace "transactions" with the path to the location of the Delta table
D
Replace format("delta") with format("stream")
E
Replace spark.read with spark.readStream