
Explanation:
When you read a Delta table using the spark.table() function, it is treated as a static source. This means every time the query is executed, all records in the current version of the 'stream_sink' table are read, filtered, and deduplicated. The query then writes the data in 'overwrite' mode to the 'stream_data_stage' table, completely replacing the table's contents with each execution. It's important to note that spark.table() and spark.read.table() are functionally the same, as the latter internally calls the former. For more details, refer to the Spark documentation.
Ultimate access to all questions.
Given the following query:
.filter(“recent = true“)
.dropDuplicates([“item_id“, “item_timestamp“])
.write
.mode (“overwrite“)
.table(“stream_data_stage“)
.filter(“recent = true“)
.dropDuplicates([“item_id“, “item_timestamp“])
.write
.mode (“overwrite“)
.table(“stream_data_stage“)
Which statement accurately describes the outcome of executing this query?
A
An incremental job will overwrite the stream_sink table by those deduplicated records from stream_data_stage that have been added since the last time the job was run.
B
A batch job will overwrite the stream_data_stage table by deduplicated records calculated from all 'recent' items in the stream_sink table.
C
An incremental job will overwrite the stream_data_stage table by those deduplicated records from stream_sink that have been added since the last time the job was run.
D
A batch job will overwrite the stream_sink table by deduplicated records calculated from all 'recent' items in the stream_data_stage table.
E
A batch job will overwrite the stream_data_stage table by those deduplicated records from stream_sink that have been added since the last time the job was run.
No comments yet.