
Answer-first summary for fast verification
Answer: SELECT DISTINCT id, CAST(event_time AS TIMESTAMP) FROM dataset
Option C is the best of the four choices. DISTINCT removes duplicate rows by comparing the full (id, event_time) pair rather than 'id' alone, and CAST(event_time AS TIMESTAMP) converts the column to a proper timestamp type. If the same 'id' appears with different 'event_time' values, DISTINCT keeps each of those rows. Option A removes duplicate rows but never casts 'event_time'. Option B selects 'event_time' without aggregating it or listing it in GROUP BY, which Spark SQL rejects with an analysis error. Option D has the same GROUP BY problem, and FROM_UNIXTIME assumes 'event_time' holds Unix epoch seconds and returns a formatted string rather than a TIMESTAMP.
Author: LeetQuiz Editorial Team
You are working on a project that requires processing a large dataset stored in Azure Databricks. The dataset contains a primary key 'id' and a timestamp column 'event_time'. Your task is to create a new table that ensures data uniqueness based on the 'id' column and converts the 'event_time' to a timestamp format for accurate time-based analysis. Considering the requirements for data uniqueness, correct timestamp conversion, and optimal performance, which of the following Spark SQL queries would you choose? (Choose one option)
A
SELECT DISTINCT * FROM dataset WHERE id IS NOT NULL
B
SELECT id, CAST(event_time AS TIMESTAMP) FROM dataset GROUP BY id
C
SELECT DISTINCT id, CAST(event_time AS TIMESTAMP) FROM dataset
D
SELECT id, FROM_UNIXTIME(event_time) AS event_time FROM dataset GROUP BY id
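The row-level behavior of DISTINCT in Option C can be checked with a small sketch. It is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for Spark SQL, the sample rows are hypothetical, and the CAST is left out because SQLite has no native TIMESTAMP type. DISTINCT itself compares whole rows in both engines.

```python
import sqlite3

# Hypothetical sample data (not from the question):
# id 1 appears twice as an exact duplicate row,
# id 2 appears twice with different event_time values.
rows = [
    (1, "2024-01-01 10:00:00"),
    (1, "2024-01-01 10:00:00"),
    (2, "2024-01-02 09:00:00"),
    (2, "2024-01-02 09:30:00"),
]

# SQLite in memory is an assumption standing in for a Spark table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER, event_time TEXT)")
conn.executemany("INSERT INTO dataset VALUES (?, ?)", rows)

# Same shape as Option C, minus the CAST (see the note above).
distinct_rows = conn.execute(
    "SELECT DISTINCT id, event_time FROM dataset ORDER BY id, event_time"
).fetchall()

# Collect the ids that survive, to compare against the row count.
distinct_ids = {row[0] for row in distinct_rows}

print(distinct_rows)
print(len(distinct_rows), "rows for", len(distinct_ids), "ids")
conn.close()
```

The exact duplicate for id 1 collapses into one row, while both rows for id 2 remain because their 'event_time' values differ. Option C therefore yields one row per 'id' only when duplicates are exact copies, which is what the question's primary-key framing implies.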