
In the context of incremental data processing on Azure Databricks, a data engineer is tasked with updating a customer table. The table must be updated daily with new customer information without altering its existing schema, and the solution must preserve data integrity and minimize operational costs. Which of the following commands should the data engineer use to meet these requirements, and why? Choose the best option:
A
Use 'CREATE OR REPLACE TABLE' to create a new table with the updated customer information, as it allows for schema changes and ensures the latest data is always available.
B
Use 'INSERT OVERWRITE' to overwrite the existing customer data with the new information, as it preserves the table schema and efficiently updates the data without additional storage costs.
C
Use a combination of 'CREATE OR REPLACE TABLE' and 'INSERT OVERWRITE' to first create a backup of the existing table and then update it with new data, ensuring data safety and integrity.
D
Neither 'CREATE OR REPLACE TABLE' nor 'INSERT OVERWRITE' is suitable for this scenario; instead, use 'MERGE INTO' to update the customer database, as it provides more control over the data update process.
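
To compare how the three commands named in the options treat the schema and the existing rows, the following is a minimal in-memory sketch in Python. It is a toy model, not Databricks code: the `Table` class, the helper functions, and the sample customers are all hypothetical, and the schema check in `insert_overwrite` assumes the simplified behaviour that rows not matching the existing columns are rejected. Storage cost, table history, and concurrency are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for a table: column names plus a list of row dicts."""
    columns: tuple
    rows: list = field(default_factory=list)


def _check_schema(table, rows):
    # Simplifying assumption: incoming rows must have exactly the table's columns.
    for row in rows:
        if set(row) != set(table.columns):
            raise ValueError(f"row columns {sorted(row)} do not match table schema")


def create_or_replace(columns, rows):
    # Models 'CREATE OR REPLACE TABLE': the table is redefined, so the
    # schema is whatever the new definition says and old rows are gone.
    return Table(tuple(columns), [dict(r) for r in rows])


def insert_overwrite(table, rows):
    # Models 'INSERT OVERWRITE': the schema is kept, all existing rows
    # are replaced by the incoming rows.
    _check_schema(table, rows)
    return Table(table.columns, [dict(r) for r in rows])


def merge_into(table, rows, key):
    # Models 'MERGE INTO' used as an upsert: matching keys are updated,
    # new keys are inserted, untouched rows remain.
    _check_schema(table, rows)
    merged = {r[key]: dict(r) for r in table.rows}
    for r in rows:
        merged[r[key]] = dict(r)
    return Table(table.columns, list(merged.values()))


# Hypothetical sample data: two existing customers, one changed and one new.
customers = Table(("customer_id", "name"), [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Bob"},
])
daily = [
    {"customer_id": 2, "name": "Robert"},
    {"customer_id": 3, "name": "Cy"},
]

replaced = create_or_replace(
    ("customer_id", "name", "email"),
    [{**r, "email": None} for r in daily],
)
overwritten = insert_overwrite(customers, daily)
merged = merge_into(customers, daily, key="customer_id")

print(replaced.columns)      # schema follows the new definition
print(overwritten.rows)      # only the daily rows, original schema
print(merged.rows)           # existing rows kept, daily rows upserted
```

In this model, `replaced` is the only result whose columns can differ from the original table, `overwritten` keeps the schema but holds only the daily rows, and `merged` keeps customer 1 while updating customer 2 and adding customer 3. Which behaviour fits the scenario depends on whether the daily feed is a full snapshot or only the changed records.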