A junior data engineer wants to use Delta Lake's Change Data Feed feature to build a Type 1 table that captures every value that has ever been valid for each row in a bronze table (created with the table property delta.enableChangeDataFeed = true). They intend to run the following code as a daily job:
from pyspark.sql.functions import col

(spark.read.format("delta")
    .option("readChangeFeed", "true")   # read the table's change data feed
    .option("startingVersion", 0)       # replay all changes from the first version
    .table("bronze")
    .filter(col("_change_type").isin(["update_postimage", "insert"]))
    .write
    .mode("append")
    .saveAsTable("bronze_history_type1")
)
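For context, a minimal sketch of how the bronze source table might have been created with Change Data Feed enabled; the two-column schema here is an assumption for illustration, as the question does not specify one:

# Hypothetical setup for the bronze table; the schema is illustrative only.
spark.sql("""
    CREATE TABLE IF NOT EXISTS bronze (id INT, value STRING)
    USING DELTA
    TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")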
Which statement describes the behavior and outcome of executing this code repeatedly?