
Explanation:
Correct Answer: A — Use the Delta Lake MERGE INTO statement.
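MERGE INTO is the recommended Delta Lake approach for incremental upserts from bronze to silver:
- It processes only the new or changed records supplied by the source, which keeps compute costs down.
- It handles deduplication by matching source rows to target rows on a unique key (id in the statement below), updating the matches and inserting the rest.
- Writes to the silver table remain subject to Delta Lake schema enforcement, which helps maintain data quality.
Unlike append mode (Option D), it does not insert duplicate rows for keys that already exist in the silver table. Unlike Options B and C, it does not overwrite the entire dataset on every run.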
MERGE INTO silver_table AS s
USING bronze_table_updates AS b
ON s.id = b.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
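The statement above deduplicates against the silver table, but it assumes each id appears at most once in the incoming batch; Delta Lake rejects a MERGE in which several source rows match the same target row. A minimal sketch of one way to guard against that is to keep only the latest row per id before merging. The updated_at column is a hypothetical ordering column that is not part of the original example, and the `* EXCEPT (...)` projection assumes Databricks SQL syntax.

```sql
-- Sketch only: updated_at is an assumed column on bronze_table_updates.
MERGE INTO silver_table AS s
USING (
  -- Keep one row per id: the most recent according to updated_at.
  SELECT * EXCEPT (rn)
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
    FROM bronze_table_updates
  ) AS ranked
  WHERE rn = 1
) AS b
ON s.id = b.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

If the bronze batch is already guaranteed to be unique on id, the simpler form is sufficient and the extra window function only adds cost.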
You are designing a data pipeline in Azure Databricks to incrementally process data from a bronze to a silver layer using Delta Lake. The pipeline must ensure data quality, handle deduplication, and be cost-effective. Which of the following approaches BEST meets these requirements? Choose one option.
A
Use the Delta Lake MERGE INTO statement to update the silver layer with new or changed records from the bronze layer, ensuring data quality by enforcing schema constraints.
B
Implement a custom deduplication logic using a combination of SELECT DISTINCT and OVERWRITE statements, which may increase processing time and costs.
C
Leverage the Delta Lake WRITE statement with the overwriteSchema option to ensure schema enforcement and prevent data quality issues, but this may not handle deduplication effectively.
D
Utilize the Delta Lake READ and WRITE statements with the append mode to incrementally process data while maintaining data quality and deduplication, optimizing for cost and performance.