
You are working on a project that requires processing a large dataset stored in Azure Databricks. The dataset contains a primary key 'id' and a timestamp column 'event_time'. Your task is to create a new table that ensures data uniqueness based on the 'id' column and converts the 'event_time' to a timestamp format for accurate time-based analysis. Considering the requirements for data uniqueness, correct timestamp conversion, and optimal performance, which of the following Spark SQL queries would you choose? (Choose one option)
A
SELECT DISTINCT * FROM dataset WHERE id IS NOT NULL
B
SELECT id, CAST(event_time AS TIMESTAMP) FROM dataset GROUP BY id
C
SELECT DISTINCT id, CAST(event_time AS TIMESTAMP) FROM dataset
D
SELECT id, FROM_UNIXTIME(event_time) AS event_time FROM dataset GROUP BY id
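
For reference, a minimal sketch of the pattern this question is testing, written in Spark SQL. Only the source table dataset and its columns id and event_time come from the question; the target table name dataset_clean and the CREATE TABLE wrapper are assumptions added for illustration.

-- Sketch only: persists a new table with event_time cast to TIMESTAMP.
-- Assumes event_time is stored in a type that CAST can convert (e.g. a string).
-- Note: DISTINCT deduplicates over the full selected column combination,
-- not over id alone.
CREATE TABLE dataset_clean AS
SELECT DISTINCT
  id,
  CAST(event_time AS TIMESTAMP) AS event_time  -- enables accurate time-based analysis
FROM dataset;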