
You are designing a data pipeline in Azure Databricks to incrementally process data from a bronze to a silver layer using Delta Lake. The pipeline must ensure data quality, handle deduplication, and be cost-effective. Which of the following approaches BEST meets these requirements? Choose one option.
A. Use the Delta Lake MERGE INTO statement to update the silver layer with new or changed records from the bronze layer, ensuring data quality by enforcing schema constraints.
B. Implement custom deduplication logic using a combination of SELECT DISTINCT and OVERWRITE statements, which may increase processing time and cost.
C. Use Delta Lake write operations with the overwriteSchema option to enforce the schema and prevent data quality issues, although this may not handle deduplication effectively.
D. Use Delta Lake read and write operations in append mode to incrementally process data while maintaining data quality and deduplication, optimizing for cost and performance.
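For context, the MERGE INTO approach described in option A can be sketched in PySpark as below. The table paths (/mnt/lake/bronze/events, /mnt/lake/silver/events), the key column id, and the ingest_date batch filter are hypothetical placeholders for illustration, not part of the question.

```python
# Minimal sketch of an incremental bronze-to-silver upsert with Delta Lake MERGE.
# Paths, column names, and the batch filter are assumed for illustration only.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze-to-silver-merge").getOrCreate()

# Read the latest bronze batch (hypothetical path and ingest-date filter).
bronze_df = (
    spark.read.format("delta").load("/mnt/lake/bronze/events")
    .filter(F.col("ingest_date") == F.current_date())
)

# Deduplicate within the batch so the merge sees at most one row per key.
deduped_df = bronze_df.dropDuplicates(["id"])

# Upsert into the silver Delta table; Delta enforces the table schema on write.
silver_table = DeltaTable.forPath(spark, "/mnt/lake/silver/events")

(
    silver_table.alias("s")
    .merge(deduped_df.alias("b"), "s.id = b.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```

Deduplicating the incoming batch before the merge matters because Delta Lake MERGE raises an error when multiple source rows match the same target row.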