
In a data engineering project, you are working with a Delta Lake table named 'employee_data' that contains the columns 'employee_id', 'first_name', 'last_name', and 'salary'. Due to a data ingestion error, the table contains duplicate entries for the same 'employee_id'. Your task is to deduplicate the data efficiently while keeping the solution scalable and maintaining data integrity. Which of the following Spark SQL queries would you use to deduplicate the 'employee_data' table based on the 'employee_id' column? Choose the best option:
A
ALTER TABLE employee_data DROP DUPLICATE KEY (employee_id)
B
DELETE FROM employee_data WHERE ROWID() NOT IN (SELECT MIN(ROWID()) FROM employee_data GROUP BY employee_id)
C
ALTER TABLE employee_data DROP PARTITIONING
D
SELECT employee_id, first_name, last_name, salary FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY employee_id) AS rn FROM employee_data ) WHERE rn = 1
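
Note: option D's ROW_NUMBER() pattern only selects the deduplicated rows; on its own it does not modify the table. Below is a minimal sketch of how the result could be persisted, assuming a Delta table (Delta's snapshot isolation generally allows a query to overwrite a table it also reads from; verify this on your runtime before relying on it):

-- Keep one row per employee_id and overwrite the table with the result.
-- ROW_NUMBER() numbers the rows within each employee_id partition, so
-- keeping rn = 1 retains a single row per key. The ORDER BY here is an
-- arbitrary tie-breaker; order by an ingestion timestamp instead if you
-- need to keep the most recent record for each employee.
INSERT OVERWRITE TABLE employee_data
SELECT employee_id, first_name, last_name, salary
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY employee_id) AS rn
  FROM employee_data
)
WHERE rn = 1;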