
In a data engineering project, you are working with a DataFrame named 'df_sales' that contains sales data. The DataFrame includes columns 'date' (in 'yyyy-MM-dd' format), 'product_id', 'quantity_sold', and 'revenue'. Your task is to transform the 'date' column into a struct type with separate fields for 'year', 'month', and 'day' to facilitate easier date-based analysis. Considering the need for efficiency and correctness in Spark SQL, which of the following queries best accomplishes this task? Choose the correct option.
A
SELECT date,
product_id,
quantity_sold,
revenue,
explode(split(date, '-')) AS (year, month, day)
FROM df_sales
B
SELECT date,
product_id,
quantity_sold,
revenue,
explode(split(date, '-')) AS date_parts
FROM df_sales
C
SELECT date,
product_id,
quantity_sold,
revenue,
split(date, '-') AS (year, month, day)
FROM df_sales
D
SELECT date,
product_id,
quantity_sold,
revenue,
to_date(date, 'yyyy-MM-dd') AS parsed_date
FROM df_sales
E
SELECT date,
product_id,
quantity_sold,
revenue,
struct(
  split(date, '-')[0] AS year,
  split(date, '-')[1] AS month,
  split(date, '-')[2] AS day
) AS date_struct
FROM df_sales
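To see why option E is the one that actually builds a struct, it helps to trace the indexing: `split(date, '-')` on a 'yyyy-MM-dd' string yields a three-element array, and indices 0, 1, and 2 map to year, month, and day, which `struct(...)` then wraps into named fields. The sketch below mimics that logic in plain Python (a dict standing in for Spark's struct), purely as an illustration of the expected field values, not as Spark code:

```python
def date_to_struct(date: str) -> dict:
    # Mimics option E: split(date, '-') gives ['yyyy', 'MM', 'dd'];
    # struct(...) names index 0 'year', index 1 'month', index 2 'day'.
    parts = date.split("-")
    return {"year": parts[0], "month": parts[1], "day": parts[2]}

# Example: a 'yyyy-MM-dd' value from the hypothetical df_sales 'date' column.
row = date_to_struct("2023-07-15")
```

By contrast, `explode` (options A and B) generates one output row per array element rather than named fields, and `split(...) AS (year, month, day)` (option C) is invalid because a multi-column alias cannot be applied to a single array-valued expression.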