
Answer-first summary for fast verification
Answer: Replace the current overwrite logic with a merge statement to modify only those records that have changed; write logic to make predictions on the changed records identified by the change data feed.
The correct approach is E. Because the nightly job overwrites the entire table, every row is rewritten each night and individual record changes cannot be tracked. Replacing the overwrite with a merge statement means only records that have actually changed are modified, and Delta Lake's Change Data Feed (CDF) then records those row-level changes, letting the ML team efficiently identify records modified in the past 24 hours. The other options fall short: A and B still run the model over all records; D stamps every row with a new timestamp on each overwrite, so the field cannot distinguish changed records; and C compares against previous predictions, which only finds customers absent from the last run and misses updates to existing customers. Option E directly addresses the problem by combining merge with CDF to track and surface changed records.
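To make the merge-plus-CDF pattern concrete, here is a minimal Delta SQL sketch. The staging source `upstream_updates`, the key column `customer_id`, and the change-detection column `params_hash` are all hypothetical names chosen for illustration; they do not appear in the question.

```sql
-- One-time setting: enable the change data feed on the target table.
ALTER TABLE customer_churn_params
SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- Nightly refresh: merge the latest upstream values instead of overwriting,
-- so only rows whose values actually differ are rewritten (and recorded in the CDF).
MERGE INTO customer_churn_params AS target
USING upstream_updates AS source              -- hypothetical staging view of upstream values
ON target.customer_id = source.customer_id    -- hypothetical unique-customer key
WHEN MATCHED AND target.params_hash <> source.params_hash  -- hypothetical change check
  THEN UPDATE SET *
WHEN NOT MATCHED
  THEN INSERT *;

-- ML side: read only rows changed in the last 24 hours from the change data feed.
SELECT *
FROM table_changes('customer_churn_params', current_timestamp() - INTERVAL 24 HOURS)
WHERE _change_type IN ('insert', 'update_postimage');
```

The `WHEN MATCHED AND ...` predicate matters: without it, the merge would rewrite every matched row and the CDF would log spurious updates, defeating the purpose of incremental prediction.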
Author: LeetQuiz Editorial Team
A table named customer_churn_params in the Lakehouse is utilized for churn prediction by the machine learning team. This table contains customer information aggregated from multiple upstream sources. Currently, the data engineering team refreshes this table nightly by completely overwriting it with the latest valid values from upstream sources.
The ML team's churn prediction model is stable in production, and they only need to generate predictions for records that have been modified within the last 24 hours.
What approach would streamline the process of identifying these recently changed records?
A
Apply the churn model to all rows in the customer_churn_params table, but implement logic to perform an upsert into the predictions table that ignores rows where predictions have not changed.
B
Convert the batch job to a Structured Streaming job using the complete output mode; configure a Structured Streaming job to read from the customer_churn_params table and incrementally predict against the churn model.
C
Calculate the difference between the previous model predictions and the current customer_churn_params on a key identifying unique customers before making new predictions; only make predictions on those customers not in the previous predictions.
D
Modify the overwrite logic to include a field populated by calling spark.sql.functions.current_timestamp() as data are being written; use this field to identify records written on a particular date.
E
Replace the current overwrite logic with a merge statement to modify only those records that have changed; write logic to make predictions on the changed records identified by the change data feed.