
Answer-first summary for fast verification
Answer: SELECT date, SUM(value) AS electronics, SUM(value) AS groceries, SUM(value) AS clothing FROM df PIVOT (SUM(value) FOR category IN ('electronics', 'groceries', 'clothing')), SELECT date, SUM(value) AS electronics, SUM(value) AS groceries, SUM(value) AS clothing FROM df GROUP BY date, category
Option B is the correct answer as it efficiently uses the PIVOT clause to transform the data from a long format to a wide format, creating separate columns for each specified category and summing the values for each category, which is optimized for performance with large datasets. Option E also achieves the desired outcome by grouping the data by date and category and summing the values, but it does not use the PIVOT clause, making it less efficient for large datasets. Option A is incorrect as it only calculates the total value without transforming the data into the required wide format. Option C is incorrect as it does not perform any aggregation operation on the values. Option D is incorrect as it uses the MAX function instead of SUM, which would not provide the correct aggregated result for each category.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
No comments yet.
In a data engineering project, you are tasked with transforming a large DataFrame 'df' with the schema (date: string, category: string, value: double) from a long format to a wide format. The wide format should have each category ('electronics', 'groceries', 'clothing') as its own column, with the sum of 'value' for each category. The solution must be optimized for performance to handle large datasets efficiently and ensure data accuracy. Additionally, the solution should minimize resource usage to keep costs low. Which of the following Spark SQL queries best accomplishes this task? Choose the best option.
A
SELECT date, SUM(value) AS total_value FROM df GROUP BY date
B
SELECT date, SUM(value) AS electronics, SUM(value) AS groceries, SUM(value) AS clothing FROM df PIVOT (SUM(value) FOR category IN ('electronics', 'groceries', 'clothing'))
C
SELECT date, electronics, groceries, clothing FROM df PIVOT (value FOR category IN ('electronics', 'groceries', 'clothing'))
D
SELECT date, MAX(value) AS electronics, MAX(value) AS groceries, MAX(value) AS clothing FROM df PIVOT (value FOR category IN ('electronics', 'groceries', 'clothing'))
E
SELECT date, SUM(value) AS electronics, SUM(value) AS groceries, SUM(value) AS clothing FROM df GROUP BY date, category