
Answer-first summary for fast verification
Answer: E (Both A and B)
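When a row is updated in a Delta table with Change Data Feed enabled, the feed records two rows for that change: one with `_change_type` set to `update_preimage` (the row's values before the update) and one set to `update_postimage` (the values after the update). Each update therefore produces exactly one preimage row and one postimage row, so filtering on either value and counting gives the number of updates made in the source table. This makes both A and B correct. Option C is wrong because `update` is not a value that `_change_type` takes (the values are `insert`, `update_preimage`, `update_postimage`, and `delete`), so its filter matches no rows and the count is 0. Option D fails because it includes C. Hence, the correct answer is 'Both A and B'.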
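The reasoning above can be checked with a small sketch. It does not use Spark: it models the change feed as a plain Python list, and the table contents, the `id` and `value` columns, and the commit versions are all made up for illustration. Only the `_change_type` values and the `_commit_version` column name mirror what the Change Data Feed produces.

```python
from collections import Counter

# Hypothetical change feed rows (illustrative data, not real Spark output).
# v1 inserts ids 1 and 2, v2 updates id 1, v3 updates ids 1 and 2, v4 deletes id 2.
change_feed = [
    {"id": 1, "value": "a",  "_change_type": "insert",           "_commit_version": 1},
    {"id": 2, "value": "b",  "_change_type": "insert",           "_commit_version": 1},
    {"id": 1, "value": "a",  "_change_type": "update_preimage",  "_commit_version": 2},
    {"id": 1, "value": "a2", "_change_type": "update_postimage", "_commit_version": 2},
    {"id": 1, "value": "a2", "_change_type": "update_preimage",  "_commit_version": 3},
    {"id": 1, "value": "a3", "_change_type": "update_postimage", "_commit_version": 3},
    {"id": 2, "value": "b",  "_change_type": "update_preimage",  "_commit_version": 3},
    {"id": 2, "value": "b2", "_change_type": "update_postimage", "_commit_version": 3},
    {"id": 2, "value": "b2", "_change_type": "delete",           "_commit_version": 4},
]

# Tally rows per change type; Counter returns 0 for a key it never saw.
counts = Counter(row["_change_type"] for row in change_feed)

print(counts["update_postimage"])  # option A's filter: 3
print(counts["update_preimage"])   # option B's filter: 3
print(counts["update"])            # option C's filter: 0
```

Both filters count update events, so a row that is updated in two different commits (id 1 here) is counted twice.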
Author: LeetQuiz Editorial Team
How can a data engineer count all records updated in the source table using Delta Lake's Change Data Feed?
A
spark.read.format("delta") \
    .option("readChangeFeed", "true") \
    .option("startingVersion", 0) \
    .table("source") \
    .where(col('_change_type') == 'update_postimage') \
    .count()
B
spark.read.format("delta") \
    .option("readChangeFeed", "true") \
    .option("startingVersion", 0) \
    .table("source") \
    .where(col('_change_type') == 'update_preimage') \
    .count()
C
spark.read.format("delta") \
    .option("readChangeFeed", "true") \
    .option("startingVersion", 0) \
    .table("source") \
    .where(col('_change_type') == 'update') \
    .count()
D
All of the above
E
Both A and B