
A Delta Lake table named customer_churn_params with Change Data Feed (CDF) enabled is used for churn prediction in a Lakehouse environment. This table contains customer data aggregated from multiple upstream sources. Currently, the data engineering team refreshes this table nightly by fully overwriting it with the latest valid values from upstream sources.
The machine learning team's churn prediction model is stable in production and only needs to process records that have changed within the last 24 hours.
What approach would most efficiently identify these recently changed records?
A
Apply the churn model to all rows in the customer_churn_params table, but implement logic to perform an upsert into the predictions table that ignores rows where predictions have not changed.
B
Convert the batch job to a Structured Streaming job using the complete output mode; configure the job to read from the customer_churn_params table and incrementally predict against the churn model.
C
Replace the current overwrite logic with a merge statement to modify only those records that have changed; write logic to make predictions on the changed records identified by the change data feed (a sketch of this pattern follows the options).
D
Modify the overwrite logic to include a field populated by calling pyspark.sql.functions.current_timestamp() as data are being written; use this field to identify records written on a particular date.
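
For discussion, option C is the pattern that most efficiently isolates the recently changed records. A minimal PySpark sketch of it is below, assuming a customer_id key column and an upstream_latest staging table (both illustrative; only customer_churn_params comes from the question):

```python
from datetime import datetime, timedelta, timezone

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# 1. Upsert the latest upstream values instead of overwriting the whole table,
#    so the Change Data Feed records only rows that actually changed.
#    `upstream_latest` and `customer_id` are illustrative names.
spark.read.table("upstream_latest").createOrReplaceTempView("updates")
spark.sql("""
    MERGE INTO customer_churn_params AS t
    USING updates AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# 2. Read only the rows that changed in the last 24 hours from the Change Data
#    Feed (requires delta.enableChangeDataFeed = true, which the question
#    states is already set on this table).
start_ts = (datetime.now(timezone.utc) - timedelta(hours=24)) \
    .strftime("%Y-%m-%d %H:%M:%S")
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingTimestamp", start_ts)
    .table("customer_churn_params")
    # Keep inserts and post-update images; skip pre-images and deletes.
    .filter(F.col("_change_type").isin("insert", "update_postimage"))
)

# `changes` now holds only the recently changed rows; hand it to the churn
# model for scoring (model invocation omitted here).
```

Because the merge writes only rows whose values actually changed, the feed stays small and the model scores a 24-hour delta rather than the full table. If the job tracks the last commit it processed, startingVersion could be used in place of startingTimestamp.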