
Answer-first summary for fast verification
Answer: SELECT id, name, age, CASE WHEN age < 13 THEN 'Child' WHEN age < 20 THEN 'Teen' WHEN age < 65 THEN 'Adult' WHEN age >= 65 THEN 'Senior' END AS age_group FROM df, SELECT id, name, age, CASE WHEN age IS NULL THEN NULL WHEN age < 13 THEN 'Child' WHEN age < 20 THEN 'Teen' WHEN age < 65 THEN 'Adult' ELSE 'Senior' END AS age_group FROM df
Option A is incorrect because it uses an ELSE clause that will misclassify NULL values as 'Senior'. Option B is incorrect due to its syntax and logic errors in the CASE statement. Option C is incorrect because it uses IN clauses which do not correctly represent the age ranges. Option D is correct as it accurately categorizes ages into the specified groups without misclassifying NULL values. Option E is also correct because it explicitly handles NULL values by setting 'age_group' to NULL when 'age' is NULL, in addition to correctly categorizing the non-NULL ages. Therefore, the best options are D and E.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
No comments yet.
In a data engineering project, you are tasked with categorizing individuals into specific age groups for a comprehensive analysis. The DataFrame 'df' contains columns 'id', 'name', and 'age'. The age groups are defined as follows: 'Child' (0-12 years), 'Teen' (13-19 years), 'Adult' (20-64 years), and 'Senior' (65+ years). The solution must accurately classify each individual into the correct age group and handle NULL values appropriately to avoid misclassification. Given these requirements, which of the following Spark SQL queries using the CASE/WHEN statement correctly assigns a new column 'age_group' based on the specified age ranges and ensures NULL values are handled correctly? Choose the two best options.
A
SELECT id, name, age, CASE WHEN age BETWEEN 0 AND 12 THEN 'Child' WHEN age BETWEEN 13 AND 19 THEN 'Teen' WHEN age >= 20 AND age <= 64 THEN 'Adult' ELSE 'Senior' END AS age_group FROM df
B
SELECT id, name, age, CASE age WHEN age <= 12 THEN 'Child' WHEN age <= 19 THEN 'Teen' WHEN age <= 64 THEN 'Adult' WHEN age >= 65 THEN 'Senior' END AS age_group FROM df
C
SELECT id, name, age, CASE WHEN age IN (0, 12) THEN 'Child' WHEN age IN (13, 19) THEN 'Teen' WHEN age IN (20, 64) THEN 'Adult' WHEN age >= 65 THEN 'Senior' END AS age_group FROM df
D
SELECT id, name, age, CASE WHEN age < 13 THEN 'Child' WHEN age < 20 THEN 'Teen' WHEN age < 65 THEN 'Adult' WHEN age >= 65 THEN 'Senior' END AS age_group FROM df
E
SELECT id, name, age, CASE WHEN age IS NULL THEN NULL WHEN age < 13 THEN 'Child' WHEN age < 20 THEN 'Teen' WHEN age < 65 THEN 'Adult' ELSE 'Senior' END AS age_group FROM df