
You are tasked with designing a data pipeline that processes raw data from a bronze table to a silver table. The raw data contains duplicates and requires incremental processing. Describe the steps you would take to ensure data quality and deduplication. Include how you would enforce data quality at each stage and the specific Delta Lake features you would use.
A. Use Spark SQL to filter duplicates and apply incremental processing without any specific Delta Lake features.
B. Implement a Delta Lake table with auto-compaction and use MERGE INTO for deduplication and incremental processing.
C. Manually check for duplicates and apply incremental processing using a custom Python script.
D. Use a combination of Spark and Delta Lake to filter duplicates, apply incremental processing, and enforce data quality using CHECK constraints.