
As a Data Engineer at a retail company using Databricks for data processing, you are tasked with writing a Spark SQL query that calculates the total cost for each row of a DataFrame 'df' with columns 'id', 'product', 'quantity', and 'price'. The company mandates a 10% discount for 'Electronics' and a 20% discount for 'Groceries'. The solution must be cost-effective and scalable for large datasets, in line with the company's policy of minimizing computational overhead. Given these requirements, which of the following queries would you choose to implement? (Choose two correct options.)
A
SELECT product, quantity, price, (quantity * price) AS total_cost FROM df
B
SELECT product, quantity, price, (quantity * price) * CASE WHEN product = 'Electronics' THEN 0.9 WHEN product = 'Groceries' THEN 0.8 ELSE 1 END AS total_cost FROM df
C
SELECT product, quantity, price, (quantity * price) * CASE WHEN product IN ('Electronics', 'Groceries') THEN 0.8 ELSE 1 END AS total_cost FROM df
D
SELECT product, quantity, price, SUM(quantity * price) * CASE WHEN product = 'Electronics' THEN 0.9 WHEN product = 'Groceries' THEN 0.8 ELSE 1 END AS total_cost FROM df GROUP BY product
E
SELECT product, quantity, price, (quantity * price) * CASE WHEN product = 'Electronics' THEN 0.9 WHEN product = 'Groceries' THEN 0.8 ELSE 1 END AS total_cost FROM df UNION ALL SELECT product, quantity, price, (quantity * price) AS total_cost FROM df WHERE product NOT IN ('Electronics', 'Groceries')
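To see why the CASE expression in option B yields the mandated discounts, here is a minimal plain-Python sketch of the same per-row logic (an illustrative assumption, not actual Spark code; the `DISCOUNTS` mapping and `total_cost` function are hypothetical names):

```python
# Hypothetical mirror of option B's CASE expression:
# a 10% discount multiplies by 0.9, a 20% discount multiplies by 0.8,
# and any other product keeps the full price (factor 1.0).
DISCOUNTS = {"Electronics": 0.9, "Groceries": 0.8}

def total_cost(product, quantity, price):
    # Equivalent to: (quantity * price) * CASE WHEN ... END
    return quantity * price * DISCOUNTS.get(product, 1.0)

rows = [
    ("Electronics", 2, 100.0),  # 10% off: 200.0 -> 180.0
    ("Groceries", 3, 10.0),     # 20% off: 30.0 -> 24.0
    ("Clothing", 1, 50.0),      # no discount: 50.0
]
for product, qty, price in rows:
    print(product, total_cost(product, qty, price))
```

A single CASE evaluated row by row requires only one pass over the data, which is why it scales well; by contrast, option E's UNION ALL scans the table twice and duplicates the non-discounted rows, and option D's GROUP BY changes the granularity of the result.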