
Answer-first summary for fast verification
Answer: C. SELECT * FROM dataset WHERE get_json_object(customer_preferences, '$.preferred_color') IS NULL
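Option C is the correct choice. get_json_object(customer_preferences, '$.preferred_color') parses each JSON string and returns NULL when the top-level 'preferred_color' key is absent, so the IS NULL condition keeps the rows that lack the field. It is valid Spark SQL and runs as a single-pass row filter. Two caveats apply: the path '$.preferred_color' only inspects the top level of each object, and a NULL column value (as can malformed JSON) also yields NULL, so such rows pass the filter too. Options A and D rely on LIKE, which is a plain substring match. It cannot tell a 'preferred_color' key apart from the same text appearing inside a value, and Option D does the opposite of what is asked by selecting rows whose text contains 'preferred_color'. Option B is valid Spark SQL, since EXCEPT is a supported set operator, but it inherits the same substring-matching flaw as Option A. It also behaves differently from A: rows whose customer_preferences is NULL survive the EXCEPT, and because EXCEPT is a distinct set operation it removes duplicate rows and requires scanning the dataset twice. That makes it both less accurate and less efficient than Option C.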
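To see how the four predicates differ, here is a minimal sketch in plain Python (standard library only, no Spark cluster) that mimics each WHERE clause on a few invented rows. The helper get_json_field is a hypothetical stand-in for get_json_object that handles only a single top-level key, and SQL NULL is modeled as None; it illustrates the semantics and is not Spark's implementation.

```python
import json
from typing import Optional

# Hypothetical sample rows: (id, customer_preferences JSON string or None).
ROWS = [
    (1, '{"preferred_color": "blue", "size": "M"}'),  # key present
    (2, '{"size": "L"}'),                             # key absent
    (3, '{"notes": "asked about preferred_color"}'),  # key absent, text present
    (4, None),                                        # NULL column value
]
FIELD = "preferred_color"


def get_json_field(doc: Optional[str], key: str) -> Optional[object]:
    """Hypothetical approximation of get_json_object(doc, '$.<key>') for one top-level key.

    Returns None (SQL NULL) for NULL input, malformed JSON, non-object JSON,
    or a missing key. A key whose value is JSON null also returns None here;
    Spark's exact handling of that case is not modeled.
    """
    if doc is None:
        return None
    try:
        parsed = json.loads(doc)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    return parsed.get(key)


def like_contains(doc: Optional[str], needle: str) -> Optional[bool]:
    """Models doc LIKE '%needle%' with SQL NULL propagation (None)."""
    return None if doc is None else needle in doc


def option_a(rows):
    # WHERE doc NOT LIKE '%preferred_color%'; a NULL predicate drops the row.
    return [i for i, doc in rows if like_contains(doc, FIELD) is False]


def option_b(rows):
    # SELECT * EXCEPT (rows matching LIKE); NULL rows never match, so they stay.
    matched = {(i, doc) for i, doc in rows if like_contains(doc, FIELD) is True}
    # EXCEPT is a set operation, so duplicates are removed as well.
    return sorted(i for i, doc in set(rows) - matched)


def option_c(rows):
    # WHERE get_json_object(doc, '$.preferred_color') IS NULL
    return [i for i, doc in rows if get_json_field(doc, FIELD) is None]


def option_d(rows):
    # WHERE doc LIKE '%preferred_color%'
    return [i for i, doc in rows if like_contains(doc, FIELD) is True]


def rows_with_field(rows) -> int:
    # Validation form: count rows where the key IS present; 0 means the check passes.
    return sum(1 for _, doc in rows if get_json_field(doc, FIELD) is not None)


for name, fn in [("A", option_a), ("B", option_b), ("C", option_c), ("D", option_d)]:
    print(name, fn(ROWS))
print("rows with field:", rows_with_field(ROWS))
```

Running it prints A [2], B [2, 4], C [2, 3, 4] and D [1, 3], followed by rows with field: 1. Only C keeps row 3, whose text mentions preferred_color without having that key, and both B and C keep the NULL row 4. For the validation itself, the Spark SQL counterpart of rows_with_field is a query filtering on get_json_object(customer_preferences, '$.preferred_color') IS NOT NULL and checking that it returns no rows.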
Author: LeetQuiz Editorial Team
In a scenario where you are working with a large dataset in Azure Databricks, you encounter a 'customer_preferences' column that contains JSON objects detailing various customer preferences. Your task is to ensure data quality by validating that the 'preferred_color' field is not present in any of the JSON objects within the 'customer_preferences' column. Considering the need for accuracy, efficiency, and adherence to Spark SQL syntax, which of the following queries would you use to achieve this validation? Choose the best option from the four provided.
A
SELECT * FROM dataset WHERE customer_preferences NOT LIKE '%preferred_color%'
B
SELECT * FROM dataset EXCEPT SELECT * FROM dataset WHERE customer_preferences LIKE '%preferred_color%'
C
SELECT * FROM dataset WHERE get_json_object(customer_preferences, '$.preferred_color') IS NULL
D
SELECT * FROM dataset WHERE customer_preferences LIKE '%preferred_color%'