
The machine learning team needs to optimize the workflow for identifying changed records in the customer_churn_params table to trigger updates for their churn prediction model. Which method would most effectively streamline the identification of these records for incremental processing?
A
Calculate the difference between the previous model predictions and the current customer_churn_params using a unique customer key before making new predictions, processing only customers not found in the previous set.
B
Modify the overwrite logic to include a field populated by spark.sql.functions.current_timestamp() during the write process, then use this field to filter for records written on a specific date.
C
Replace the current overwrite logic with a MERGE statement and enable the Delta Lake Change Data Feed (CDF) to identify and process only those records that have been inserted or updated (a minimal sketch of this approach follows the options).
D
Convert the batch job to a Structured Streaming job using complete output mode to read from the customer_churn_params table and incrementally predict against the churn model.
E
Apply the churn model to all rows in the customer_churn_params table, but implement logic to perform an upsert into the predictions table that ignores rows where predictions have not changed.
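To make option C concrete, here is a minimal, illustrative PySpark sketch of the downstream side of that design: enabling CDF on the source table, reading only the changed rows, scoring them, and merging the results into a predictions table. The table name customer_churn_params comes from the question; churn_predictions, customer_id, last_processed_version, and score_churn_model are hypothetical placeholders for the team's actual target table, key column, checkpointing logic, and model-scoring step.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# One-time setup: turn on the Change Data Feed for the source table.
spark.sql("""
    ALTER TABLE customer_churn_params
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

def score_churn_model(df):
    # Hypothetical stand-in for applying the trained churn model
    # (e.g. an MLflow pyfunc or Spark ML pipeline in a real job).
    return df.withColumn("churn_prediction", F.lit(0.0))

# Hypothetical checkpoint: the last table version this job processed,
# which a real job would persist between runs.
last_processed_version = 5

# Read only rows inserted or updated since that version, and drop the
# CDF metadata columns before writing downstream.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", last_processed_version + 1)
    .table("customer_churn_params")
    .filter(F.col("_change_type").isin("insert", "update_postimage"))
    .drop("_change_type", "_commit_version", "_commit_timestamp")
)

# Score just the changed records, then MERGE the results into the
# predictions table on the hypothetical customer_id key.
predictions = score_churn_model(changes)

(
    DeltaTable.forName(spark, "churn_predictions").alias("t")
    .merge(predictions.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```

For this to work as intended, the upstream job that writes customer_churn_params would itself use a MERGE rather than a full overwrite, so that CDF records row-level inserts and updates instead of treating every run as a rewrite of the whole table. Each prediction run then touches only the changed rows rather than rescanning and re-scoring the full table.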