Databricks Certified Data Engineer - Associate



In a scenario where you are analyzing sales data stored in a DataFrame 'df' with columns 'id', 'product', and 'quantity', you are tasked with identifying two key metrics: the number of rows where 'quantity' is NULL (indicating missing data) and the number of unique products that have non-NULL quantities (to understand product diversity). Considering the need for accuracy and efficiency in your query, which of the following Spark SQL queries correctly accomplishes this task? Choose the best option from the four provided.




Explanation:

The correct answer is B. It uses the COUNT_IF function to count the rows where 'quantity' is NULL, which covers the first part of the task. For the second part, it uses COUNT(DISTINCT product) to count the unique products with non-NULL quantities. Note that COUNT(DISTINCT product) on its own counts every distinct product, so the count matches the task only when rows with a NULL 'quantity' are excluded from that expression. Both metrics come from a single aggregate query over the DataFrame, with no separate grouping step or subquery. Options A, C, and D either miscalculate the counts or add operations the task does not require.
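The two metrics can be checked on a small example. The sketch below is not one of the quiz options; it is a minimal illustration that uses SQLite from Python's standard library as a stand-in for Spark, with a hypothetical table name `sales` and made-up sample rows. SQLite has no COUNT_IF, so the NULL count is written as SUM(CASE ...); in Spark SQL the same expression can be written as COUNT_IF(quantity IS NULL). The third column is included only to show how an unrestricted COUNT(DISTINCT product) differs from the count limited to non-NULL quantities.

```python
import sqlite3

# Hypothetical sample rows for the columns named in the question:
# (id, product, quantity). None becomes SQL NULL.
rows = [
    (1, "apple", 10),
    (2, "apple", None),
    (3, "banana", None),   # banana never has a non-NULL quantity
    (4, "cherry", 5),
    (5, "cherry", 7),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, product TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

query = """
SELECT
    -- Spark SQL equivalent: COUNT_IF(quantity IS NULL)
    SUM(CASE WHEN quantity IS NULL THEN 1 ELSE 0 END) AS null_quantity_rows,
    -- CASE yields NULL for rows with no quantity; COUNT(DISTINCT ...) skips NULLs
    COUNT(DISTINCT CASE WHEN quantity IS NOT NULL THEN product END) AS products_with_quantity,
    -- Unrestricted count, shown for comparison only
    COUNT(DISTINCT product) AS all_products
FROM sales
"""

null_quantity_rows, products_with_quantity, all_products = conn.execute(query).fetchone()
print(null_quantity_rows, products_with_quantity, all_products)
```

With these rows, 'banana' appears only with a NULL quantity, so it is counted among all products but not among the products with non-NULL quantities.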
