
Databricks Certified Data Engineer - Associate
In a Databricks environment, you are working with a dataset that includes a 'user_activity' column. This column contains JSON objects with various user activity data, including a 'login_time' field formatted as 'yyyy-MM-dd HH:mm:ss'. Your task is to ensure data quality by validating that the 'login_time' field is not null for any row in the dataset. Considering the nuances of querying semi-structured JSON data in Databricks SQL and the need for efficient data processing, which of the following Spark SQL queries would you use to achieve this validation? Choose the best option from the following:
Explanation:
Option C is correct because it utilizes the Databricks SQL syntax for extracting fields from JSON string columns, which is
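
The answer options are not reproduced above, so the following is only a minimal sketch of the kind of check the explanation describes, not the text of Option C itself. It assumes a hypothetical table named `events` whose `user_activity` column is a STRING holding JSON, and it uses the Databricks SQL colon (`:`) operator to extract the `login_time` field.

```sql
-- Hypothetical table name: events (not given in the question).
-- Assumes user_activity is a STRING column containing a JSON object.
-- The colon operator extracts a field from the JSON string; the result is
-- NULL when the field is absent or holds a JSON null.
SELECT COUNT(*) AS rows_missing_login_time
FROM events
WHERE user_activity:login_time IS NULL;
-- A count of 0 means every row has a non-null login_time.
```

Outside Databricks, where the colon operator is unavailable, the built-in `get_json_object(user_activity, '$.login_time')` could likely be used for the same extraction.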