
Answer-first summary for fast verification
Answer: ```sq SELECT COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL), COUNT_IF(id IS NOT NULL AND name IS NOT NULL AND age IS NOT NULL) FROM df ```
Option **B** is correct because it directly calculates both required metrics: the number of rows with **any NULL values** using `COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL)`, and the number of rows with **no NULL values** using `COUNT_IF(id IS NOT NULL AND name IS NOT NULL AND age IS NOT NULL)`. This aligns exactly with the question’s requirement, whereas Option C only computes complete rows and does not explicitly count rows containing NULLs.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
In a data engineering project using Databricks, you are working with a DataFrame 'df' that contains customer information with columns 'id', 'name', and 'age'. The project requires you to analyze the data quality by identifying the number of rows with NULL values in any column and the number of rows with complete data (non-NULL values in all columns). Considering the importance of accurate data quality metrics for downstream processing, which of the following Spark SQL queries would you use to efficiently compute these metrics? Choose the best option that provides both counts accurately.
A
SELECT
COUNT(*) - COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL),
COUNT(*)
FROM df
SELECT
COUNT(*) - COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL),
COUNT(*)
FROM df
B
SELECT
COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL),
COUNT_IF(id IS NOT NULL AND name IS NOT NULL AND age IS NOT NULL)
FROM df
SELECT
COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL),
COUNT_IF(id IS NOT NULL AND name IS NOT NULL AND age IS NOT NULL)
FROM df
C
SELECT
COUNT(*) - COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL),
COUNT_IF(id IS NOT NULL AND name IS NOT NULL AND age IS NOT NULL)
FROM df
SELECT
COUNT(*) - COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL),
COUNT_IF(id IS NOT NULL AND name IS NOT NULL AND age IS NOT NULL)
FROM df
D
SELECT
COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL),
COUNT(*) - COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL)
FROM df
SELECT
COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL),
COUNT(*) - COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL)
FROM df