
Answer-first summary for fast verification
Answer: SELECT customer_id, SIZE(customer_orders) AS order_count FROM dataset
Option B is correct: SIZE returns the number of elements in each row's 'customer_orders' array, which is exactly the per-customer order count. Option A is incorrect because COUNT counts rows, not the elements inside an array, so COUNT(customer_orders) grouped by customer_id counts non-null rows per customer. Option C likewise counts rows with a non-null 'customer_orders' value, not the orders inside each array. Option D counts distinct array values per customer, which again does not reflect the number of orders.
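The distinction can be seen directly in Spark SQL: SIZE is evaluated per row on an array value, while COUNT aggregates across rows (a minimal sketch; the literal values are illustrative):

```sql
-- SIZE returns the element count of an array, per row:
SELECT SIZE(ARRAY(1, 2, 3));                            -- 3

-- COUNT aggregates non-null rows, not array elements:
SELECT COUNT(c) FROM VALUES (ARRAY(1, 2, 3)) AS t(c);   -- 1
```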
Author: LeetQuiz Editorial Team
You are working with a dataset in Azure Databricks whose 'customer_orders' column contains JSON arrays of customer order data, and you need to analyze customer purchasing behavior. Specifically, you must count the number of orders for each customer and store this information in a new table for further analysis. The solution must handle large datasets efficiently and produce accurate counts. Which Spark SQL query achieves this goal? Choose the option that correctly counts the number of orders per customer from the 'customer_orders' array. (Choose one option)
A
SELECT customer_id, COUNT(customer_orders) as order_count FROM dataset GROUP BY customer_id
B
SELECT customer_id, SIZE(customer_orders) as order_count FROM dataset
C
SELECT customer_id, COUNT(*) as order_count FROM dataset WHERE customer_orders IS NOT NULL GROUP BY customer_id
D
SELECT customer_id, COUNT(DISTINCT customer_orders) as order_count FROM dataset GROUP BY customer_id
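Note that SIZE applies to ARRAY (or MAP) columns. If 'customer_orders' is instead stored as a raw JSON string, it must be parsed first with FROM_JSON; a hedged sketch, where the element schema ('order_id' as BIGINT) is an assumption for illustration:

```sql
-- If customer_orders is already typed as ARRAY<...>, SIZE applies directly:
SELECT customer_id, SIZE(customer_orders) AS order_count FROM dataset;

-- If customer_orders is a JSON string, parse it into an array first
-- (the STRUCT schema here is illustrative, not from the question):
SELECT customer_id,
       SIZE(FROM_JSON(customer_orders, 'ARRAY<STRUCT<order_id: BIGINT>>')) AS order_count
FROM dataset;
```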