
You are working with a large dataset in Azure Databricks that contains 'product_name' and 'category' columns. To ensure data integrity, you must validate that each 'product_name' is associated with only one unique 'category' value. Given the need for efficiency and accuracy in a production environment, which of the following Spark SQL queries would you use to identify any 'product_name' that violates this uniqueness constraint by being associated with more than one 'category'? Choose the best option.
A
SELECT product_name, COUNT(DISTINCT category) as category_count FROM dataset GROUP BY product_name HAVING category_count > 1
B
SELECT product_name, MAX(category) as max_category FROM dataset GROUP BY product_name
C
SELECT product_name, category FROM dataset GROUP BY product_name, category HAVING COUNT(*) > 1
D
SELECT product_name, category FROM dataset WHERE product_name IN (SELECT product_name FROM dataset GROUP BY product_name HAVING COUNT(DISTINCT category) > 1)
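To see how the grouping-and-distinct-count approach in option A behaves, here is a minimal sketch using Python's built-in sqlite3 module, which follows the same standard SQL semantics (in Databricks you would run the equivalent query through spark.sql). The table contents are hypothetical sample data invented for illustration.

```python
import sqlite3

# In-memory table mimicking the 'dataset' from the question (sample data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (product_name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?)",
    [
        ("widget", "tools"),  # 'widget' appears under two categories:
        ("widget", "toys"),   #   this violates the uniqueness constraint
        ("gear", "tools"),    # 'gear' has duplicate rows but only
        ("gear", "tools"),    #   one distinct category, so it is fine
    ],
)

# Option A's logic: count distinct categories per product_name and
# keep only products linked to more than one.
rows = conn.execute(
    """
    SELECT product_name, COUNT(DISTINCT category) AS category_count
    FROM dataset
    GROUP BY product_name
    HAVING COUNT(DISTINCT category) > 1
    """
).fetchall()
print(rows)  # → [('widget', 2)]
```

Note that 'gear' is not flagged: duplicate rows with the same category do not break the one-category-per-product rule, which is exactly why `COUNT(DISTINCT category)` is the right aggregate here rather than a plain row count.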