
Explanation:
The best code snippet for the data scientist to update specific rows in the Delta table 'customer_data' based on a condition is B. Here's why:
Option B uses the MERGE statement, which Delta tables support. Its ON condition identifies the rows to update, so rows that do not match are left unmodified. Delta optimizes writes based on the changes, which minimizes write operations and improves performance.

Option D uses the write.format approach, which is suitable for creating or appending data but not for targeted updates. 'update' is not a valid save mode for the DataFrame writer, and even with a valid mode this approach can be less efficient and doesn't offer the granularity of MERGE.

Therefore, option B provides the most appropriate and efficient way to achieve the desired outcome using MERGE with Delta tables.
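As a rough illustration of the MERGE pattern described above, here is a minimal sketch that assembles the statement as a string and leaves the actual Spark call commented out. The helper name build_merge_sql, the temp view name updates, and the join key customer_id are assumptions made for this example; none of them come from the question itself.

```python
def build_merge_sql(target, source, on, assignments):
    """Build a MERGE INTO statement that only updates matched rows.

    `assignments` maps target columns to SQL expressions.
    This helper is hypothetical; it is not part of Spark or Delta Lake.
    """
    set_clause = ", ".join(f"{col} = {expr}" for col, expr in assignments.items())
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause}"
    )


merge_sql = build_merge_sql(
    target="customer_data",
    source="updates",                    # assumed temp view name
    on="t.customer_id = s.customer_id",  # assumed join key
    assignments={"t.column1": "s.column1"},
)
print(merge_sql)

# In a Databricks notebook, where `spark` and `updates_df` already exist,
# the DataFrame must be visible to SQL before the statement can run:
# updates_df.createOrReplaceTempView("updates")
# spark.sql(merge_sql)
```

Registering the DataFrame as a temp view is what lets the SQL text refer to it by name; a Python variable such as updates_df is not visible to spark.sql on its own.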
A data scientist is working on a Databricks notebook and needs to execute a Spark SQL query on a Delta table named 'customer_data.' The goal is to update specific rows based on a condition. Which code snippet should they use?
A
customer_data.update("column1 = 'new_value'", 'condition')
B
spark.sql("MERGE INTO customer_data USING updates_df ON condition WHEN MATCHED THEN UPDATE SET column1 = 'new_value'")
C
spark.sql("UPDATE customer_data SET column1 = 'new_value' WHERE condition")
D
customer_data.write.format('delta').mode('update').option('set', "column1 = 'new_value'").save()