Databricks Certified Data Engineer - Associate

Get started today

Ultimate access to all questions.

Deep dive into the quiz with AI chat providers.

We prepare a focused prompt with your quiz and certificate details so each AI can offer a more tailored, in-depth explanation.

A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in array column e_employees in table stores. The custom logic should create a new column exp_employees that is an array of all of the employees with more than 5 years of experience for each row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function. Which of the following code blocks successfully completes this task?

Real Exam

Community

KKeng

Last updated: January 13, 2026 at 09:15

SELECT store_id, employees, FILTER (employees, i -> i.years_exp > 5) AS exp_employees FROM stores;

SELECT store_id, employees, FILTER (exp_employees, years_exp > 5) AS exp_employees FROM stores;

SELECT store_id, employees, FILTER (employees, years_exp > 5) AS exp_employees FROM stores;

SELECT store_id, employees, CASE WHEN employees.years_exp > 5 THEN employees ELSE NULL END AS exp_employees FROM stores;

SELECT store_id, employees, FILTER (exp_employees, i -> i.years_exp > 5) AS exp_employees FROM stores;

Explanation:

Explanation

Option A is correct because it properly uses the FILTER higher-order function with the correct syntax:

FILTER(employees, i -> i.years_exp > 5) correctly applies the filter to the employees array column
The lambda expression i -> i.years_exp > 5 properly references each element in the array as i and accesses the years_exp field
The result is assigned to the new column exp_employees

Why other options are incorrect:

Option B: Incorrect syntax - FILTER(exp_employees, years_exp > 5) tries to filter a column that doesn't exist yet (exp_employees is the output column, not the input). Also missing the lambda expression.

Option C: Incorrect syntax - FILTER(employees, years_exp > 5) is missing the lambda expression needed to reference array elements.

Option D: Incorrect approach - Uses CASE statement which doesn't properly handle array filtering. CASE WHEN would need to be used with array functions like TRANSFORM or FILTER, not directly like this.

Option E: Incorrect - Similar to Option B, tries to filter exp_employees which doesn't exist yet as an input column.

Key concepts:

The FILTER higher-order function in Spark SQL filters an array using a lambda expression
The syntax is FILTER(array, lambda_expression) where the lambda expression returns a boolean
The lambda parameter (e.g., i) represents each element in the array
For struct elements in arrays, you need to access fields using dot notation (e.g., i.years_exp)

Powered ByGPT-5.2

Comments

Loading comments...