Databricks Certified Data Engineer - Associate


Explanation:

Option C is the correct choice. get_json_object(customer_preferences, '$.preferred_color') parses each JSON object and extracts the top-level 'preferred_color' field, returning NULL when the field is absent, so the IS NULL condition selects the rows that lack it. Options A and D rely on the LIKE operator, which matches raw text rather than JSON structure: the string 'preferred_color' can also appear inside a value or a nested object, so a substring test is unreliable for field validation. Option D additionally selects the rows where 'preferred_color' is present, the opposite of the requirement. Option B is valid Spark SQL, since EXCEPT is a supported set operator, but it depends on the same LIKE substring test, returns only distinct rows, and reads the dataset twice, so it is neither accurate nor efficient for this check.
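The difference between text matching and field extraction can be illustrated outside Spark. The sketch below is a plain-Python analogue, not Spark code: the sample rows and the helper names absent_by_substring and absent_by_key are hypothetical, and json.loads stands in for the parsing that get_json_object performs for a top-level key.

```python
import json

# Hypothetical sample rows standing in for the customer_preferences column.
rows = [
    '{"preferred_color": "blue", "size": "M"}',         # field present
    '{"size": "L"}',                                      # field absent
    '{"note": "asked about preferred_color options"}',   # name appears only inside a value
]

def absent_by_substring(raw: str) -> bool:
    """Analogue of NOT LIKE '%preferred_color%': a plain text search."""
    return "preferred_color" not in raw

def absent_by_key(raw: str) -> bool:
    """Analogue of get_json_object(..., '$.preferred_color') IS NULL for a top-level key."""
    return json.loads(raw).get("preferred_color") is None

substring_result = [absent_by_substring(r) for r in rows]
key_result = [absent_by_key(r) for r in rows]

print(substring_result)  # [False, True, False]
print(key_result)        # [False, True, True]
```

The third row is where the two approaches disagree: the substring test treats it as containing the field because the name appears in a value, while the key lookup correctly reports the field as absent. As in this sketch, a field that is present with a null value is also reported as absent, which is worth keeping in mind when interpreting an IS NULL check.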



In a scenario where you are working with a large dataset in Azure Databricks, you encounter a 'customer_preferences' column that contains JSON objects detailing various customer preferences. Your task is to ensure data quality by validating that the 'preferred_color' field is not present in any of the JSON objects within the 'customer_preferences' column. Considering the need for accuracy, efficiency, and adherence to Spark SQL syntax, which of the following queries would you use to achieve this validation? Choose the best option from the four provided.

Simulated

Last updated: March 30, 2026 at 14:04

A. SELECT * FROM dataset WHERE customer_preferences NOT LIKE '%preferred_color%'

21.7%

B. SELECT * FROM dataset EXCEPT SELECT * FROM dataset WHERE customer_preferences LIKE '%preferred_color%'

15.5%

C. SELECT * FROM dataset WHERE get_json_object(customer_preferences, '$.preferred_color') IS NULL

53.1%

D. SELECT * FROM dataset WHERE customer_preferences LIKE '%preferred_color%'

9.7%