
Answer-first summary for fast verification
Answer: WITH filtered_data AS (SELECT * FROM df WHERE age IS NOT NULL) SELECT COUNT(*) FROM filtered_data
The correct answer is A, as it accurately creates a CTE named 'filtered_data' that filters out rows where 'age' is NULL using the 'IS NOT NULL' condition, which is the correct way to check for NULL values in SQL. It then counts the number of rows in the CTE. Option B and C incorrectly use 'EXCEPT' which is not valid syntax for filtering columns in this context. Option D incorrectly checks for the string 'NULL' instead of the NULL value, which would not correctly filter out NULL ages.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
In a scenario where you are working with a large dataset in Azure Databricks, you have a DataFrame 'df' with columns 'id', 'name', and 'age'. Your task is to optimize the data processing by filtering out rows where 'age' is NULL and then counting the number of remaining rows. Considering the importance of performance and correctness in a production environment, which of the following Spark SQL queries correctly creates a Common Table Expression (CTE) named 'filtered_data' that filters out rows where 'age' is NULL and then uses the CTE to count the number of rows? Choose the single best option.
A
WITH filtered_data AS (SELECT * FROM df WHERE age IS NOT NULL) SELECT COUNT(*) FROM filtered_data
B
WITH filtered_data AS (SELECT * EXCEPT (age) FROM df WHERE age IS NOT NULL) SELECT COUNT(*) FROM filtered_data
C
WITH filtered_data AS (SELECT * FROM df EXCEPT (age) WHERE age IS NOT NULL) SELECT COUNT(*) FROM filtered_data
D
WITH filtered_data AS (SELECT * FROM df WHERE age != 'NULL') SELECT COUNT(*) FROM filtered_data
No comments yet.