
Ultimate access to all questions.
Deep dive into the quiz with AI chat providers.
We prepare a focused prompt with your quiz and certificate details so each AI can offer a more tailored, in-depth explanation.
A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in array column employees in table stores. The custom logic should create a new column exp_employees that is an array of all of the employees with more than 5 years of experience for each row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.
Which code block successfully completes this task?
A
SELECT store_id, employees,
FILTER (employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
SELECT store_id, employees,
FILTER (employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
B
SELECT store_id, employees,
FILTER (exp_employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
SELECT store_id, employees,
FILTER (exp_employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
C
SELECT store_id, employees,
FILTER (employees, years_exp > 5) AS exp_employees
FROM stores;
SELECT store_id, employees,
FILTER (employees, years_exp > 5) AS exp_employees
FROM stores;
D
SELECT store_id, employees,
CASE WHEN employees.years_exp > 5 THEN employees ELSE NULL END AS exp_employees
FROM stores;
SELECT store_id, employees,
CASE WHEN employees.years_exp > 5 THEN employees ELSE NULL END AS exp_employees
FROM stores;
Explanation:
Option A is correct because:
FILTER higher-order function to iterate over the employees array columni -> i.years_exp > 5 correctly references each array element as i and accesses the years_exp fieldexp_employees containing only employees with more than 5 years of experienceOption B is incorrect because:
exp_employees in the FILTER condition before it's defined, which is invalid syntaxexp_employees doesn't exist yet when the FILTER function tries to use itOption C is incorrect because:
years_exp > 5 without referencing the array element variableFILTER(employees, years_exp > 5) would try to filter on a column named years_exp rather than the field within each array elementOption D is incorrect because:
CASE statement which doesn't iterate over array elementsemployees.years_exp would try to access a field on the array itself, not on individual elementsThe FILTER function in SQL (specifically in Databricks/Spark SQL) is designed for exactly this type of operation - applying a predicate to each element of an array and returning a new array containing only the elements that satisfy the condition.