
You are analyzing sales data stored in a DataFrame 'df' with columns 'id', 'product', and 'quantity'. You need two metrics: the number of rows where 'quantity' is NULL (missing data) and the number of unique products that have non-NULL quantities (product diversity). Which of the following Spark SQL queries correctly and efficiently computes both? Choose the best option from the four provided.
A
SELECT COUNT(*) - COUNT_IF(quantity IS NULL, TRUE), COUNT(DISTINCT product) FROM df
B
SELECT COUNT_IF(quantity IS NULL), COUNT(DISTINCT product) FROM df
C
SELECT COUNT_IF(quantity IS NULL), COUNT(DISTINCT product) FROM df WHERE quantity IS NOT NULL
D
SELECT COUNT_IF(quantity IS NULL), COUNT(DISTINCT product) FROM df GROUP BY product
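
To reason about the options, it can help to see how NULL counting and a WHERE filter interact on a small table. The sketch below is not Spark: it uses Python's built-in sqlite3 module as a stand-in engine, and because SQLite has no COUNT_IF, the conditional count is written as SUM(CASE WHEN ...), which standard SQL engines generally accept. The sample rows, the run helper, and the query names are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical sample rows standing in for 'df' (id, product, quantity).
# "banana" appears only with NULL quantities.
ROWS = [
    (1, "apple", 10),
    (2, "apple", None),
    (3, "banana", None),
    (4, "cherry", 5),
    (5, "cherry", 7),
    (6, "banana", None),
]


def run(query):
    """Load ROWS into an in-memory table named df and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE df (id INTEGER, product TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO df VALUES (?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# No filter: NULL rows are counted, and every product is counted as distinct,
# including products that only ever have NULL quantities.
UNFILTERED = """
SELECT SUM(CASE WHEN quantity IS NULL THEN 1 ELSE 0 END),
       COUNT(DISTINCT product)
FROM df
"""

# WHERE runs before aggregation, so no NULL-quantity row is left to count.
FILTERED = """
SELECT SUM(CASE WHEN quantity IS NULL THEN 1 ELSE 0 END),
       COUNT(DISTINCT product)
FROM df
WHERE quantity IS NOT NULL
"""

# Both metrics in one pass: the CASE inside COUNT(DISTINCT ...) yields NULL for
# rows with a NULL quantity, and COUNT ignores NULL values.
BOTH_METRICS = """
SELECT SUM(CASE WHEN quantity IS NULL THEN 1 ELSE 0 END),
       COUNT(DISTINCT CASE WHEN quantity IS NOT NULL THEN product END)
FROM df
"""

print(run(UNFILTERED))    # [(3, 3)]
print(run(FILTERED))      # [(0, 2)]
print(run(BOTH_METRICS))  # [(3, 2)]
```

On this sample, the unfiltered query gets the NULL count right but also counts the product that has only NULL quantities, while the filtered query gets the product count right but reports zero NULL rows. Adding GROUP BY product would instead return one row per product rather than a single pair of totals.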