
Answer-first summary for fast verification
Answer: spark.sql("MERGE INTO customer_data USING updates_df ON condition WHEN MATCHED THEN UPDATE SET column1 = 'new_value'")
The keyed answer for updating specific rows in the Delta table 'customer_data' based on a condition is **B**. Here's why:
- **Option A**: This is not valid as written, and it is not a Spark SQL query. Spark DataFrames have no `update` method. The Delta Lake Python API does provide `DeltaTable.update`, but it takes the condition first and the assignments as a dictionary, not a single string.
- **Option B**: This uses the `MERGE INTO` statement, which Delta Lake supports. The `ON` condition identifies the target rows that match the source, and `WHEN MATCHED THEN UPDATE` changes only those rows, so unmatched rows are left untouched. Note that `updates_df` must be visible to SQL as a table or temporary view; a DataFrame variable cannot be referenced by name inside the query string.
- **Option C**: This is a standard SQL `UPDATE ... WHERE` statement. Delta Lake does support it, and the `WHERE` clause limits the change to matching rows rather than the entire table, so it also works for a fixed condition. `MERGE` is the more general choice when the rows to change come from a separate set of updates such as `updates_df`.
- **Option D**: This is not valid. `'update'` is not a save mode for `write.format(...).mode(...)` (the supported modes are `append`, `overwrite`, `ignore`, and `error`/`errorifexists`), and there is no `set` option. The writer API creates, appends to, or overwrites data; it does not perform targeted row updates.

Therefore, **option B** is the answer here: `MERGE` applies targeted updates to a Delta table from a set of changes.
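As a rough illustration of option B with valid quoting, the sketch below registers the updates as a temporary view and then runs the `MERGE`. The join column `customer_id`, the view name `updates`, and the helper name `merge_updates` are hypothetical and not part of the question; `spark` is assumed to be the session a Databricks notebook provides.

```python
# Sketch only: `customer_id`, the view name `updates`, and `merge_updates`
# are illustrative assumptions, not taken from the question.
MERGE_SQL = """
MERGE INTO customer_data AS t
USING updates AS u
ON t.customer_id = u.customer_id
WHEN MATCHED THEN UPDATE SET column1 = 'new_value'
"""


def merge_updates(spark, updates_df):
    # SQL can reference a DataFrame by name only after it is registered as a view.
    updates_df.createOrReplaceTempView("updates")
    # The triple-quoted Python string keeps the single-quoted SQL literal intact.
    return spark.sql(MERGE_SQL)
```

In a notebook this would be called as `merge_updates(spark, updates_df)`. Keeping the SQL in a triple-quoted string avoids the nested single-quote problem that makes the one-line form invalid Python.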
Author: LeetQuiz Editorial Team
A data scientist is working in a Databricks notebook and needs to execute a Spark SQL query on a Delta table named 'customer_data'. The goal is to update specific rows based on a condition. Which code snippet should they use?
A
customer_data.update("column1 = 'new_value'", 'condition')
B
spark.sql("MERGE INTO customer_data USING updates_df ON condition WHEN MATCHED THEN UPDATE SET column1 = 'new_value'")
C
spark.sql("UPDATE customer_data SET column1 = 'new_value' WHERE condition")
D
customer_data.write.format('delta').mode('update').option('set', "column1 = 'new_value'").save()