
Answer-first summary for fast verification
Answer: SELECT customer_id, SIZE(customer_orders) AS order_count FROM dataset
Option B is correct: SIZE returns the number of elements in each row's 'customer_orders' array, which is exactly the per-customer order count. Option A is incorrect because COUNT counts rows, not the elements inside an array, so COUNT(customer_orders) grouped by customer_id counts non-null rows per customer. Option C likewise counts rows with a non-null 'customer_orders' value, not the orders inside each array. Option D counts distinct array values per customer, which again does not reflect the number of orders.
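The distinction can be seen directly in Spark SQL: SIZE is evaluated per row on an array value, while COUNT aggregates across rows (a minimal sketch; the literal values are illustrative):

```sql
-- SIZE returns the element count of an array, per row:
SELECT SIZE(ARRAY(1, 2, 3));                            -- 3

-- COUNT aggregates non-null rows, not array elements:
SELECT COUNT(c) FROM VALUES (ARRAY(1, 2, 3)) AS t(c);   -- 1
```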
Author: LeetQuiz Editorial Team
You are working with a dataset in Azure Databricks whose 'customer_orders' column contains JSON arrays of customer order data, and you need to analyze customer purchasing behavior. Specifically, you must count the number of orders for each customer and store this information in a new table for further analysis. The solution must handle large datasets efficiently and produce accurate counts. Which Spark SQL query achieves this goal? Choose the option that correctly counts the number of orders per customer from the 'customer_orders' array. (Choose one option)
A
SELECT customer_id, COUNT(customer_orders) as order_count FROM dataset GROUP BY customer_id
B
SELECT customer_id, SIZE(customer_orders) as order_count FROM dataset
C
SELECT customer_id, COUNT(*) as order_count FROM dataset WHERE customer_orders IS NOT NULL GROUP BY customer_id
D
SELECT customer_id, COUNT(DISTINCT customer_orders) as order_count FROM dataset GROUP BY customer_id
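Note that SIZE applies to ARRAY (or MAP) columns. If 'customer_orders' is instead stored as a raw JSON string, it must be parsed first with FROM_JSON; a hedged sketch, where the element schema ('order_id' as BIGINT) is an assumption for illustration:

```sql
-- If customer_orders is already typed as ARRAY<...>, SIZE applies directly:
SELECT customer_id, SIZE(customer_orders) AS order_count FROM dataset;

-- If customer_orders is a JSON string, parse it into an array first
-- (the STRUCT schema here is illustrative, not from the question):
SELECT customer_id,
       SIZE(FROM_JSON(customer_orders, 'ARRAY<STRUCT<order_id: BIGINT>>')) AS order_count
FROM dataset;
```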