
Ultimate access to all questions.
Deep dive into the quiz with AI chat providers.
We prepare a focused prompt with your quiz and certificate details so each AI can offer a more tailored, in-depth explanation.
A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in array column employees in table stores. The custom logic should create a new column exp_employees that is an array of all of the employees with more than 5 years of experience for each row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.
Which of the following code blocks successfully completes this task?
A
SELECT
store_id,
employees,
FILTER (employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
SELECT
store_id,
employees,
FILTER (employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
B
SELECT
store_id,
employees,
FILTER (exp_employees, years_exp > 5) AS exp_employees
FROM stores;
SELECT
store_id,
employees,
FILTER (exp_employees, years_exp > 5) AS exp_employees
FROM stores;
C
SELECT
store_id,
employees,
FILTER (employees, years_exp > 5) AS exp_employees
FROM stores;
SELECT
store_id,
employees,
FILTER (employees, years_exp > 5) AS exp_employees
FROM stores;
D
SELECT
store_id,
employees,
CASE WHEN employees.years_exp > 5 THEN employees
ELSE NULL
END AS exp_employees
FROM stores;
SELECT
store_id,
employees,
CASE WHEN employees.years_exp > 5 THEN employees
ELSE NULL
END AS exp_employees
FROM stores;
E
SELECT
store_id,
employees,
FILTER (exp_employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
SELECT
store_id,
employees,
FILTER (exp_employees, i -> i.years_exp > 5) AS exp_employees
FROM stores;
Explanation:
Option A is correct because it properly uses the FILTER higher-order function with the correct syntax:
FILTER(employees, i -> i.years_exp > 5) - This correctly:
employees array column as inputi -> i.years_exp > 5 where i represents each element in the arrayyears_exp field of each employee object using i.years_expWhy the other options are incorrect:
exp_employees which doesn't exist yet (it's the column being created). Also, it doesn't use proper lambda syntax to access the years_exp field.years_exp > 5 would try to reference a column named years_exp which doesn't exist at the table level.CASE statement which would evaluate the entire array, not individual elements within the array. This would not filter individual employees within the array.exp_employees which doesn't exist yet (it's the column being created), similar to Option B.Key Concepts:
FILTER higher-order function in Spark SQL takes an array and a lambda function that returns a boolean for each element.element -> expression where element represents each item in the array.i.years_exp).