
Answer-first summary for fast verification
Answer: Each time the job is executed, the entire available history of inserted or updated records will be appended to the target table, resulting in many duplicate entries.
The code reads the bronze table's Change Data Feed starting from version 0 on every run, capturing all historical inserts and update post-images. Because it appends the filtered changes to the target table without tracking the last processed version, every execution appends the **entire history** of matching records again, producing duplicate entries in the target table. Option B correctly identifies this behavior; the other options incorrectly assume incremental processing, overwriting, or diff calculation.
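The duplication can be illustrated with a small pure-Python sketch (no Spark required; `cdf`, `target`, and `run_job` are hypothetical stand-ins for the Change Data Feed read and the daily append job, not Spark APIs):

```python
# Minimal simulation of re-reading a Change Data Feed from version 0
# on every run and appending the result to a target table.

# Full CDF history of the bronze table (illustrative rows).
cdf = [
    {"id": 1, "_change_type": "insert"},
    {"id": 1, "_change_type": "update_preimage"},
    {"id": 1, "_change_type": "update_postimage"},
    {"id": 2, "_change_type": "insert"},
]

target = []  # stands in for bronze_history_type1

def run_job():
    # startingVersion = 0 on every run: the full history is read each time.
    changes = [r for r in cdf
               if r["_change_type"] in ("insert", "update_postimage")]
    target.extend(changes)  # mode("append"): no dedup, no version tracking

run_job()
print(len(target))  # 3 rows after the first run
run_job()
print(len(target))  # 6 rows: the entire history was appended again
```

Each additional run appends the same three rows yet again, so the target table grows without bound even when the bronze table is unchanged.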
Author: LeetQuiz Editorial Team
A junior data engineer wants to use Delta Lake's Change Data Feed feature to build a Type 1 table that captures all historical valid values for every row in a bronze table (created with `delta.enableChangeDataFeed = true`). They intend to run the following code daily:
```python
from pyspark.sql.functions import col

(spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 0)  # always reads the CDF from the first version
    .table("bronze")
    .filter(col("_change_type").isin(["update_postimage", "insert"]))
    .write
    .mode("append")  # appends without tracking the last processed version
    .table("bronze_history_type1")
)
```
What describes the outcome and behavior of executing this query repeatedly?
**A.** Each time the job is executed, newly updated records will be merged into the target table, overwriting previous values with the same primary keys.

**B.** Each time the job is executed, the entire available history of inserted or updated records will be appended to the target table, resulting in many duplicate entries.

**C.** Each time the job is executed, the target table will be overwritten using the entire history of inserted or updated records, giving the desired result.

**D.** Each time the job is executed, the differences between the original and current versions are calculated; this may result in duplicate entries for some records.

**E.** Each time the job is executed, only those records that have been inserted or updated since the last execution will be appended to the target table, giving the desired result.