
Ultimate access to all questions.
A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in array column employees in table stores. The custom logic should create a new column exp_employees that is an array of all of the employees with more than 5 years of experience for each row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.
Which code block successfully completes this task?
A
SELECT store_id, employees,
FILTER (employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
SELECT store_id, employees,
FILTER (employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
B
SELECT store_id, employees,
FILTER (exp_employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
SELECT store_id, employees,
FILTER (exp_employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
C
SELECT store_id, employees,
FILTER (employees, years_exp > 5) AS exp_employees
FROM stores;
SELECT store_id, employees,
FILTER (employees, years_exp > 5) AS exp_employees
FROM stores;
D
SELECT store_id, employees,
CASE WHEN employees.years_exp > 5 THEN employees ELSE NULL END AS exp_employees
FROM stores;
SELECT store_id, employees,
CASE WHEN employees.years_exp > 5 THEN employees ELSE NULL END AS exp_employees
FROM stores;
Explanation:
Option A is correct because:
FILTER higher-order function to filter array elementsi -> i.years_exp > 5 correctly references each array element as i and accesses its years_exp fieldemployees array column and creates a new array exp_employees containing only employees with more than 5 years of experienceOption B is incorrect because:
exp_employees in the FILTER function before it's defined, creating a circular reference errorexp_employees doesn't exist yet when trying to filter itOption C is incorrect because:
years_exp > 5 without referencing the array elementi) to access its fieldsOption D is incorrect because:
CASE statement which doesn't work correctly with array elementsemployees.years_exp is invalid syntax since employees is an array, not a scalar value with a years_exp fieldThe FILTER function in Spark SQL is specifically designed for filtering array elements based on a condition, making it the appropriate choice for this task.