Databricks Certified Data Engineer - Professional

You are designing a data pipeline in Azure Databricks to incrementally process data from a bronze to a silver layer using Delta Lake. The pipeline must ensure data quality, handle deduplication, and be cost-effective. Which of the following approaches BEST meets these requirements? Choose one option.




Explanation:

Option D is correct: it incrementally reads new records from the bronze table and writes them to the silver table in append mode, deduplicating along the way so that only clean, unique rows reach the silver layer. Because each run processes and stores only the new data, the approach is also cost-effective. Options A, B, and C either do not fully address deduplication or are less efficient in cost and performance.
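The mechanics the explanation describes — incrementally merging new bronze records into silver while keeping only the latest version of each row — can be sketched in plain Python. This is a simulation of the semantics only, not Spark code; in a real pipeline you would use Delta Lake's MERGE INTO or structured streaming with dropDuplicates. The column names `order_id` and `updated_at` are illustrative assumptions, not from the exam question.

```python
# Sketch of incremental bronze -> silver deduplication semantics.
# Records are dicts; "order_id" is the business key and "updated_at"
# the version column -- both names are made up for illustration.

def dedupe_batch(batch):
    """Keep only the latest record per key within an incoming batch."""
    latest = {}
    for rec in batch:
        key = rec["order_id"]
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    return latest

def merge_into_silver(silver, batch):
    """Upsert a deduplicated batch into the silver table (a dict keyed by id).
    Mirrors what a Delta Lake MERGE would do: update when newer, else insert."""
    for key, rec in dedupe_batch(batch).items():
        if key not in silver or rec["updated_at"] > silver[key]["updated_at"]:
            silver[key] = rec
    return silver

silver = {}
bronze_batch_1 = [
    {"order_id": 1, "updated_at": 10, "status": "new"},
    {"order_id": 1, "updated_at": 11, "status": "paid"},   # duplicate key, newer wins
    {"order_id": 2, "updated_at": 10, "status": "new"},
]
merge_into_silver(silver, bronze_batch_1)

bronze_batch_2 = [
    {"order_id": 2, "updated_at": 12, "status": "shipped"},
    {"order_id": 2, "updated_at": 9,  "status": "stale"},  # late/old record, ignored
]
merge_into_silver(silver, bronze_batch_2)

print(silver[1]["status"], silver[2]["status"])  # paid shipped
```

Each run touches only the incoming batch rather than re-reading the full bronze table, which is what makes the incremental approach cost-effective.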