
Answer-first summary for fast verification
**Answer:** Replace `spark.read` with `spark.readStream`
**Explanation:** When working with streaming data in Databricks, you need to use `spark.readStream` instead of `spark.read` to create a streaming DataFrame. The key points are:

1. **`spark.readStream`** creates a streaming DataFrame that continuously reads data as it arrives, while `spark.read` creates a static DataFrame that reads all available data at once.
2. The original code uses batch processing (`spark.read`), which is suitable for one-time data ingestion but not for streaming sources.
3. The other options are incorrect:
   - **A**: There is no `predict` function in the code, so this option does not apply.
   - **B**: While `maxFilesPerTrigger` is a valid streaming option, replacing `schema(schema)` with it would remove the schema definition, which is often necessary.
   - **C**: A streaming read can target a table name directly; using the path location is not required.
   - **D**: `format("stream")` is not a valid format; keep `format("delta")` for Delta tables.
4. After replacing `spark.read` with `spark.readStream`, the code would look like:

```python
transactions_df = (spark.readStream
                   .schema(schema)
                   .format("delta")
                   .table("transactions"))
```

This creates a streaming DataFrame that can process data incrementally as it arrives in the Delta table.
Author: Keng Suppaseth
A data engineer is using the following code block as part of a batch ingestion pipeline to read from a Delta table:
```python
transactions_df = (spark.read
                   .schema(schema)
                   .format("delta")
                   .table("transactions"))
```
Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?
A. Replace `predict` with a stream-friendly prediction function

B. Replace `schema(schema)` with `option("maxFilesPerTrigger", 1)`

C. Replace `"transactions"` with the path to the location of the Delta table

D. Replace `format("delta")` with `format("stream")`

E. Replace `spark.read` with `spark.readStream`