As a Data Engineer at a retail company that uses Databricks for data processing, you are tasked with optimizing a Spark SQL query that calculates the total cost for each product in a DataFrame 'df' with columns 'id', 'product', 'quantity', and 'price'. The company mandates a 10% discount for 'Electronics' and a 20% discount for 'Groceries'. The solution must be cost-effective and scalable for large datasets while adhering to the company's policy of minimizing computational overhead. Given these requirements, which of the following queries would you implement? (Choose two correct options.)
Explanation:
Option B is correct because it applies the specified discounts with a single CASE/WHEN expression, computing the discounted total for every row in one pass over the data without any aggregation, which keeps it scalable and cost-effective. Option E is also correct: it applies the discounts only to the relevant products and combines those rows with the non-discounted rows using UNION ALL, which can be more efficient for large datasets or when the discounts apply to a small subset of products. Options A and C do not apply the correct discount rates, and Option D introduces an unnecessary SUM with GROUP BY, which forces an extra aggregation step and can degrade performance on large datasets.
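Since the answer options themselves are not reproduced above, the two Spark SQL sketches below only illustrate the patterns the explanation refers to. The column names come from the question; the temporary view name 'df', the output column 'total_cost', and the exact projections are assumptions made for illustration, not the literal text of Options B and E.

    -- Pattern attributed to Option B: a per-row CASE/WHEN, one pass over the data, no aggregation
    SELECT id,
           product,
           quantity,
           price,
           quantity * price *
             CASE
               WHEN product = 'Electronics' THEN 0.9   -- 10% discount
               WHEN product = 'Groceries'   THEN 0.8   -- 20% discount
               ELSE 1.0                                -- no discount
             END AS total_cost
    FROM df;

    -- Pattern attributed to Option E: discount only the affected rows, then recombine with UNION ALL
    SELECT id, product, quantity, price, quantity * price * 0.9 AS total_cost
    FROM df WHERE product = 'Electronics'
    UNION ALL
    SELECT id, product, quantity, price, quantity * price * 0.8 AS total_cost
    FROM df WHERE product = 'Groceries'
    UNION ALL
    SELECT id, product, quantity, price, quantity * price AS total_cost
    FROM df WHERE product NOT IN ('Electronics', 'Groceries');

Both forms avoid SUM/GROUP BY entirely; the UNION ALL variant can benefit from predicate pushdown when the discounted categories are a small fraction of the rows, while the CASE/WHEN variant scans the data exactly once.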