Databricks Certified Data Engineer - Associate

You are analyzing sales data stored in a DataFrame 'df' with columns 'id', 'product', and 'quantity'. You need two metrics: the number of rows where 'quantity' is NULL (indicating missing data) and the number of unique products that have non-NULL quantities (to understand product diversity). Which of the following Spark SQL queries computes both metrics accurately and efficiently? Choose the best option from the four provided.

SELECT COUNT(*) - COUNT_IF(quantity IS NULL, TRUE), COUNT(DISTINCT product) FROM df

SELECT COUNT_IF(quantity IS NULL), COUNT(DISTINCT product) FROM df
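To make the aggregates in these options concrete, here is a minimal plain-Python sketch of what each expression counts. It emulates the SQL semantics and does not run Spark. The sample rows and variable names are hypothetical, and None stands in for SQL NULL. Note that Spark SQL's COUNT_IF takes a single boolean expression, and that COUNT(DISTINCT product) on its own skips NULL products but does not look at 'quantity'.

```python
# Hypothetical sample rows (id, product, quantity); None stands in for SQL NULL.
rows = [
    (1, "apple", 5),
    (2, "apple", None),
    (3, "pear", None),
    (4, "plum", 2),
    (5, None, 7),
]

# Emulates COUNT_IF(quantity IS NULL): rows whose quantity is missing.
null_quantity_rows = sum(1 for _, _, qty in rows if qty is None)

# Emulates COUNT(*) - COUNT_IF(quantity IS NULL): rows whose quantity is present,
# which is the opposite of the first metric the question asks for.
non_null_quantity_rows = len(rows) - null_quantity_rows

# Emulates COUNT(DISTINCT product): distinct non-NULL products across all rows,
# regardless of whether quantity is NULL.
distinct_products = len({prod for _, prod, _ in rows if prod is not None})

# Emulates COUNT(DISTINCT CASE WHEN quantity IS NOT NULL THEN product END):
# distinct products restricted to rows that have a quantity.
distinct_products_with_quantity = len(
    {prod for _, prod, qty in rows if prod is not None and qty is not None}
)

print(null_quantity_rows, non_null_quantity_rows)
print(distinct_products, distinct_products_with_quantity)
```

On this sample the unfiltered and filtered distinct counts differ because 'pear' appears only in a row with a NULL quantity, so it is worth checking whether a candidate query restricts the distinct count to rows with non-NULL quantities.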
