
Answer-first summary for fast verification
Answer: Enable CDC on the Delta Lake table and use the `readStream` method with the `continuous` trigger to read the CDC output.
Of the four options, the `continuous` trigger targets the lowest latency: instead of scheduling micro-batches, it processes records as they arrive, which matches the requirement that all changes be captured and processed as they occur. The question also treats it as the scalable, cost-effective choice under varying volumes of data changes. Spark documents continuous processing as experimental and limited to certain sources and sinks, so verify that your runtime supports it for Delta change-feed reads. Option A is incorrect because omitting the trigger falls back to the default micro-batch mode, which may not meet the real-time requirement. Option C is incorrect because the `once` trigger processes the data available at start-up a single time and then stops, so it cannot serve a continuously running CDC pipeline. Option D is incorrect because the `every` trigger processes data only at fixed intervals; no changes are lost, but each change can wait up to a full interval, which may violate the latency requirement.
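To make the four options concrete, here is a minimal PySpark sketch, not a definitive implementation. The helper names `trigger_kwargs` and `start_cdc_stream`, the console sink, and the interval values are illustrative assumptions. The sketch also assumes that the quiz's `every` trigger corresponds to PySpark's `processingTime` argument, and that the cluster has Delta Lake installed with the change data feed enabled on the table. Whether the `continuous` trigger is accepted for a Delta source depends on the runtime.

```python
# Table property that turns on Delta's change data feed (run once per table).
ENABLE_CDF_SQL = (
    "ALTER TABLE {table} SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)


def trigger_kwargs(option: str, interval: str = "1 second") -> dict:
    """Map quiz options A-D to keyword arguments for DataStreamWriter.trigger().

    Hypothetical helper for illustration only.
    """
    if option == "A":
        return {}  # no trigger: default micro-batch mode
    if option == "B":
        # For continuous mode, the value is the checkpoint interval.
        return {"continuous": interval}
    if option == "C":
        return {"once": True}  # process available data once, then stop
    if option == "D":
        # Assumption: the quiz's "every" trigger is a fixed processing-time interval.
        return {"processingTime": interval}
    raise ValueError(f"unknown option: {option!r}")


def start_cdc_stream(spark, table_name: str, checkpoint_path: str, option: str = "B"):
    """Start a streaming read of the table's change feed with the chosen trigger."""
    changes = (
        spark.readStream.format("delta")
        .option("readChangeFeed", "true")  # read row-level changes, not snapshots
        .table(table_name)
    )
    writer = (
        changes.writeStream.format("console")  # console sink, for illustration
        .option("checkpointLocation", checkpoint_path)
    )
    kwargs = trigger_kwargs(option)
    if kwargs:
        writer = writer.trigger(**kwargs)
    return writer.start()
```

With an active `SparkSession` named `spark`, calling `start_cdc_stream(spark, "events", "/tmp/checkpoints/events")` would request the configuration in option B; the table name and checkpoint path are placeholders.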
Author: LeetQuiz Editorial Team
You are a data engineer working on a project that requires real-time data processing from a Delta Lake table with Change Data Capture (CDC) enabled. The project has strict requirements for data latency and must ensure that all changes are captured and processed as they occur. Additionally, the solution must be cost-effective and scalable to handle varying loads of data changes. Given these constraints, which of the following approaches is the BEST to process CDC output from the Delta Lake table? Choose the correct option from the four provided.
A
Enable CDC on the Delta Lake table and use the readStream method without specifying a trigger to read the CDC output.
B
Enable CDC on the Delta Lake table and use the readStream method with the continuous trigger to read the CDC output.
C
Enable CDC on the Delta Lake table and use the readStream method with the once trigger to read the CDC output.
D
Enable CDC on the Delta Lake table and use the readStream method with the every trigger set to a fixed interval to read the CDC output.