Ultimate access to all questions.
A Delta Lake table named customer_churn_params
with Change Data Feed (CDF) enabled is used for churn prediction in a Lakehouse environment. This table contains customer data aggregated from multiple upstream sources. Currently, the data engineering team refreshes this table nightly by fully overwriting it with the latest valid values from upstream sources.
The machine learning team's churn prediction model is stable in production and only needs to process records that have changed within the last 24 hours.
What approach would most efficiently identify these recently changed records?