
Answer-first summary for fast verification
Answer: Enable CDF on the Delta Lake table and modify the Spark job to filter and process only the changed data rows.
Enabling CDF is a table-level property change (`delta.enableChangeDataFeed = true`) and does not require rewriting existing data. The Spark job must then read from the change feed rather than the table itself and branch on the `_change_type` column that CDF adds, so that inserts, updates, and deletes are each handled appropriately. This way the job processes only the changed rows instead of rescanning the full table, which is the point of moving to CDC-style processing.
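Concretely, the rewrite comes down to two steps: set the table property (e.g. `ALTER TABLE ... SET TBLPROPERTIES (delta.enableChangeDataFeed = true)`), then read with `.option("readChangeFeed", "true")` and route rows by the `_change_type` values Delta emits (`insert`, `update_preimage`, `update_postimage`, `delete`). Below is a minimal, Spark-free sketch of that routing logic; the `apply_changes` helper and the `id`/`value` columns are illustrative, not part of the question.

```python
# _change_type values emitted by Delta CDF. In the real job these are
# filters on the change-feed DataFrame; here they drive plain-Python routing.
INSERT = "insert"
UPDATE_PRE = "update_preimage"    # row state before an update (audit use)
UPDATE_POST = "update_postimage"  # row state after an update
DELETE = "delete"

def apply_changes(state, changes):
    """Fold CDF change rows into a keyed target state (dict: id -> row).

    Mirrors what a MERGE INTO the target table would do: insert new keys,
    overwrite updated keys with the post-image, and drop deleted keys.
    Pre-image rows are skipped; they matter only to audit-style consumers.
    """
    for row in changes:
        key = row["id"]
        change = row["_change_type"]
        if change in (INSERT, UPDATE_POST):
            # Keep only data columns; strip the CDF metadata columns.
            state[key] = {k: v for k, v in row.items() if not k.startswith("_")}
        elif change == DELETE:
            state.pop(key, None)
    return state

# Usage: one micro-batch containing one of each change type.
state = {1: {"id": 1, "value": "old"}}
batch = [
    {"id": 2, "value": "new", "_change_type": INSERT},
    {"id": 1, "value": "old", "_change_type": UPDATE_PRE},
    {"id": 1, "value": "updated", "_change_type": UPDATE_POST},
    {"id": 3, "value": "gone", "_change_type": DELETE},
]
state = apply_changes(state, batch)
# state now holds id 1 (post-image) and id 2 (insert); id 3 was never present
```

In the actual streaming job, each branch becomes a filter on `_change_type` inside a `foreachBatch` handler, and the dictionary update above is replaced by a Delta `MERGE` keyed on the same identifier.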
Author: LeetQuiz Editorial Team
Consider a scenario where you have a Delta Lake table previously populated by incremental Structured Streaming feeds. Your task is to enable Change Data Feed (CDF) on this table and redesign the data processing steps to handle Change Data Capture (CDC) output. Describe in detail how you would modify the existing Spark job to leverage CDF for processing CDC data, including any necessary code changes and the rationale behind them.
A. Add a simple configuration to enable CDF and adjust the read stream to use CDC data.
B. Rewrite the entire Spark job to use a different data source that natively supports CDC.
C. Enable CDF on the Delta Lake table and modify the Spark job to filter and process only the changed data rows.
D. No changes are needed; the existing job can process CDC data without modifications.