
Answer-first summary for fast verification
Answer: SELECT CAST(transaction_date AS TIMESTAMP) as ts, EXTRACT(YEAR FROM ts) as year, EXTRACT(MONTH FROM ts) as month, EXTRACT(DAY FROM ts) as day FROM dataset
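Option C is the correct answer because it casts the 'transaction_date' column to a timestamp, aliases the result as ts, and then uses the EXTRACT function to retrieve the year, month, and day from that timestamp, which is exactly what the task asks for. Referencing ts within the same SELECT list relies on lateral column alias support, which recent Databricks runtimes provide. The cast and the three extractions are simple per-row expressions, so the query also scales to large datasets. Option A casts the column but then applies EXTRACT to the original 'transaction_date' value rather than to the cast timestamp, so the extraction depends on implicit conversion instead of the explicit cast the task requires. Option B never casts to a timestamp at all: it applies YEAR, MONTH, and DAY directly to the 'yyyy-MM-dd' value, so it does not meet the stated requirement even where implicit conversion lets it return values. Option D is incorrect because FROM_UNIXTIME expects a Unix epoch value in seconds and returns a formatted string, which makes it the wrong function here, and the query extracts only the year, omitting the month and day.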
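The cast-then-extract pattern can also be written without referencing an alias in the same SELECT list, which may help on runtimes where lateral column aliases are not available. The following is a minimal sketch, not part of the original question: the inline rows are hypothetical sample data standing in for the dataset table, and the CTE name casted is an assumption made for illustration.

```sql
-- Hypothetical sample rows standing in for the `dataset` table.
WITH dataset AS (
  SELECT * FROM VALUES ('2024-03-15'), ('2023-12-31') AS t(transaction_date)
),
-- Step 1: cast the 'yyyy-MM-dd' value to a timestamp once.
casted AS (
  SELECT CAST(transaction_date AS TIMESTAMP) AS ts
  FROM dataset
)
-- Step 2: extract the date parts from the cast column.
SELECT
  ts,
  EXTRACT(YEAR FROM ts)  AS year,
  EXTRACT(MONTH FROM ts) AS month,
  EXTRACT(DAY FROM ts)   AS day
FROM casted;
```

For the sample value '2024-03-15', the extracted parts are year 2024, month 3, and day 15. Splitting the cast into its own step keeps the intent of Option C while making the order of operations explicit.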
Author: LeetQuiz Editorial Team
In a Databricks environment, you are working with a large dataset that includes a 'transaction_date' column formatted as 'yyyy-MM-dd'. Your task is to analyze transactions by year, month, and day. To achieve this, you need to cast the 'transaction_date' column to a timestamp and then extract the year, month, and day from the resulting timestamp. Considering the need for accuracy and performance in processing large datasets, which of the following Spark SQL queries correctly accomplishes this task? Choose the best option from the four provided.
A
SELECT CAST(transaction_date AS TIMESTAMP), EXTRACT(YEAR FROM transaction_date), EXTRACT(MONTH FROM transaction_date), EXTRACT(DAY FROM transaction_date) FROM dataset
B
SELECT transaction_date, YEAR(transaction_date), MONTH(transaction_date), DAY(transaction_date) FROM dataset
C
SELECT CAST(transaction_date AS TIMESTAMP) as ts, EXTRACT(YEAR FROM ts) as year, EXTRACT(MONTH FROM ts) as month, EXTRACT(DAY FROM ts) as day FROM dataset
D
SELECT FROM_UNIXTIME(CAST(transaction_date AS TIMESTAMP), 'yyyy-MM-dd') as formatted_date, EXTRACT(YEAR FROM formatted_date) as year FROM dataset