
In a data engineering project using Databricks, you are working with a DataFrame 'df' that contains customer information with columns 'id', 'name', and 'age'. The project requires you to analyze the data quality by identifying the number of rows with NULL values in any column and the number of rows with complete data (non-NULL values in all columns). Considering the importance of accurate data quality metrics for downstream processing, which of the following Spark SQL queries would you use to efficiently compute these metrics? Choose the best option that provides both counts accurately.
A
SELECT COUNT(*) - COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL), COUNT(*) FROM df
B
SELECT COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL), COUNT_IF(id IS NOT NULL AND name IS NOT NULL AND age IS NOT NULL) FROM df
C
SELECT COUNT(*) - COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL), COUNT_IF(id IS NOT NULL AND name IS NOT NULL AND age IS NOT NULL) FROM df
D
SELECT COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL), COUNT(*) - COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL) FROM df
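To see why option B computes both metrics directly, the semantics of `COUNT_IF` can be sketched in plain Python over a small illustrative sample (the rows below are hypothetical; `None` stands in for SQL `NULL`):

```python
# Hypothetical rows mimicking the 'df' DataFrame with columns id, name, age.
rows = [
    {"id": 1, "name": "Ana", "age": 34},
    {"id": 2, "name": None, "age": 41},
    {"id": 3, "name": "Raj", "age": None},
    {"id": 4, "name": "Mei", "age": 29},
]

# COUNT_IF(id IS NULL OR name IS NULL OR age IS NULL):
# counts rows where at least one column is NULL.
rows_with_nulls = sum(
    1
    for r in rows
    if r["id"] is None or r["name"] is None or r["age"] is None
)

# COUNT_IF(id IS NOT NULL AND name IS NOT NULL AND age IS NOT NULL):
# counts rows where every column is non-NULL.
complete_rows = sum(
    1
    for r in rows
    if r["id"] is not None
    and r["name"] is not None
    and r["age"] is not None
)

print(rows_with_nulls, complete_rows)  # → 2 2
```

The two predicates are logical complements, so the counts always sum to the total row count. This is also why options A, C, and D are equivalent in result but less direct: they derive one of the two metrics by subtracting from `COUNT(*)` rather than counting it with its own predicate.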