
In a data engineering project, you are working with a large dataset stored in a Delta table on Databricks. The dataset includes a 'timestamp' column in the format 'yyyy-MM-dd HH:mm:ss'. Your task is to extract the year, month, and day from the 'timestamp' column to support time-based analysis. Considering efficiency and correctness in Spark SQL, which of the following queries would you use to create a new table with these extracted values? Choose the best option.
A
SELECT EXTRACT(YEAR FROM timestamp) as year, EXTRACT(MONTH FROM timestamp) as month, EXTRACT(DAY FROM timestamp) as day FROM dataset
B
SELECT year(timestamp) as year, month(timestamp) as month, day(timestamp) as day FROM dataset
C
SELECT FROM_UNIXTIME(timestamp, 'yyyy-MM-dd HH:mm:ss') as formatted_timestamp, EXTRACT(YEAR FROM formatted_timestamp) as year FROM dataset
D
SELECT timestamp:year as year, timestamp:month as month, timestamp:day as day FROM dataset
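For reference, a full statement in the style of option B, which relies on Spark SQL's built-in year(), month(), and day() functions, could be written as a CTAS that materializes the result into a new Delta table. This is a minimal sketch; the table name dataset_by_date is a placeholder, and the source table is assumed to be named dataset as in the question.

-- Create a new Delta table with the year, month, and day extracted
-- from the existing 'timestamp' column of the source table.
CREATE TABLE dataset_by_date
USING DELTA
AS
SELECT
  timestamp,
  year(timestamp)  AS year,
  month(timestamp) AS month,
  day(timestamp)   AS day
FROM dataset;

Because year(), month(), and day() operate directly on a timestamp column, this approach avoids the string round-trip implied by option C and the JSON path syntax misused in option D, which is intended for extracting fields from string-typed JSON values rather than from timestamps.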