Databricks Certified Data Engineer - Professional

You are a Data Engineer at a retail company that uses Azure Databricks to process and analyze large volumes of sales data stored in Spark tables. The company has recently updated its product catalog, and you must update multiple records in a Spark table to reflect the changes. The updates must be cost-effective, have minimal impact on performance, and maintain data integrity. Given these constraints, which of the following strategies is the BEST approach to update the records? Choose one option.

Use the 'update' operation in Spark SQL to update the records directly, as it is the simplest method and requires minimal code.

Use the 'join' operation to merge the new data with the existing data and overwrite the table, which may improve performance for large datasets but requires careful handling of data types and null values.
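To make the "careful handling of data types and null values" caveat concrete, here is a minimal sketch in plain Python (deliberately not the Spark API) of what a join-and-overwrite has to do. The rows, the column names, and the helpers `coalesce` and `join_and_overwrite` are hypothetical and chosen only for illustration. Existing rows with no matching update must pass through unchanged, a null in the update must not erase an existing value, and incoming values must be cast to the column's type.

```python
# Hypothetical existing table rows and catalog updates, keyed by product_id.
existing = [
    {"product_id": 1, "name": "Widget", "price": 9.99},
    {"product_id": 2, "name": "Gadget", "price": 19.99},
    {"product_id": 3, "name": "Gizmo", "price": 4.50},
]
updates = [
    {"product_id": 2, "name": "Gadget Pro", "price": None},  # price not supplied
    {"product_id": 3, "name": None, "price": "5.25"},        # price arrives as a string
]


def coalesce(new, old):
    """Return the new value unless it is missing (None), else keep the old one."""
    return old if new is None else new


def join_and_overwrite(existing_rows, update_rows):
    """Left-join updates onto existing rows and build the full replacement table."""
    updates_by_id = {u["product_id"]: u for u in update_rows}
    result = []
    for row in existing_rows:
        # Rows without a matching update get an empty dict, so every lookup is None.
        upd = updates_by_id.get(row["product_id"], {})
        result.append({
            "product_id": row["product_id"],
            "name": coalesce(upd.get("name"), row["name"]),
            # Cast explicitly so the rewritten column keeps a single type.
            "price": float(coalesce(upd.get("price"), row["price"])),
        })
    return result


new_table = join_and_overwrite(existing, updates)
```

Note that every row is rewritten here, including rows that did not change. That full rewrite is the cost this option trades against the simplicity of an in-place update.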
