
Ultimate access to all questions.
A Delta Lake table named customer_churn_params in the Lakehouse is utilized for churn prediction by the machine learning team. This table contains customer data aggregated from multiple upstream sources. The data engineering team currently refreshes this table nightly by completely overwriting it with the latest valid values from upstream sources.
After each successful update, the team needs to identify the differences between the newly written version and the previous version of the table.
Given this implementation, which approach can be used to accomplish this?
A
Execute a query to calculate the difference between the new version and the previous version using Delta Lake’s built-in versioning and lime travel functionality.
B
Parse the Delta Lake transaction log to identify all newly written data files.
C
Parse the Spark event logs to identify those rows that were updated, inserted, or deleted.
D
Execute DESCRIBE HISTORY customer_churn_params to obtain the full operation metrics for the update, including a log of all records that have been added or modified.
E
Use Delta Lake’s change data feed to identify those records that have been updated, inserted, or deleted.