
A data engineer is attempting to construct a Type 1 historical table by capturing all changes from a bronze Delta table where delta.enableChangeDataFeed is set to true. They implement the following PySpark code as a daily scheduled task:
from pyspark.sql.functions import col

(spark.read.format("delta")
.option("readChangeFeed", "true")
.option("startingVersion", 0)
.table("bronze")
.filter(col("_change_type").isin("update_postimage", "insert"))
.write.mode("append")
.saveAsTable("bronze_history")
)
How will repeatedly running this query impact the target table (bronze_history) over time?
A. Only records inserted or updated since the last execution will be appended, successfully achieving the intended incremental result.
B. The target table will be entirely overwritten with the full history on each run, resulting in a clean but non-cumulative state.
C. Each execution will merge updates into the target table, overwriting prior values with matching keys to maintain the Type 1 structure.
D. Every run will append the entire history of inserts/updates from version 0, leading to massive data duplication as the same records are added repeatedly.
E. Each execution will calculate differences between the current version and the previous version, creating a delta-based historical log.
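
Because startingVersion is pinned to 0 and the write uses append mode, every scheduled run re-reads the entire change feed and re-appends records that were already written on previous runs. A minimal sketch of an incremental alternative, assuming Structured Streaming against the Delta change feed and a placeholder checkpoint path (/tmp/checkpoints/bronze_history), lets the checkpoint track which changes have already been processed:

from pyspark.sql.functions import col

(spark.readStream.format("delta")
.option("readChangeFeed", "true")  # stream the change feed instead of re-reading it in full
.table("bronze")
.filter(col("_change_type").isin("update_postimage", "insert"))
.writeStream
.option("checkpointLocation", "/tmp/checkpoints/bronze_history")  # placeholder path; records progress between runs
.trigger(availableNow=True)  # process only changes committed since the last run, then stop
.toTable("bronze_history")
)

With availableNow, the job can still be scheduled daily as a batch-style task, but only the commits that arrived since the last checkpoint are appended to bronze_history.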