Databricks Certified Data Engineer - Associate

Get started today

Ultimate access to all questions.

Deep dive into the quiz with AI chat providers.

We prepare a focused prompt with your quiz and certificate details so each AI can offer a more tailored, in-depth explanation.

A junior data engineer has ingested a JSON file into a table raw_table with the following schema:

cart_id STRING,
items ARRAY<item_id:STRING>

cart_id STRING,
items ARRAY<item_id:STRING>

The junior data engineer would like to unnest the items column in raw_table to result in a new table with the following schema:

cart_id STRING,
item_id STRING

cart_id STRING,
item_id STRING

Which of the following commands should the junior data engineer run to complete this task?

Exam-Like

Community

KKeng

Last updated: January 13, 2026 at 09:02

SELECT cart_id, filter(items) AS item_id FROM raw_table;

SELECT cart_id, flatten(items) AS item_id FROM raw_table;

SELECT cart_id, reduce(items) AS item_id FROM raw_table;

SELECT cart_id, explode(items) AS item_id FROM raw_table;

SELECT cart_id, slice(items) AS item_id FROM raw_table;

Explanation:

Explanation

The correct answer is D because:

explode() function is specifically designed to unnest array columns in Spark SQL/Databricks. It takes an array column as input and returns a new row for each element in the array, while duplicating the other column values.
How explode() works:
- For each row in raw_table, explode(items) creates multiple rows
- Each element from the items array becomes a separate row
- The cart_id is repeated for each item in the array
- This transforms the schema from cart_id STRING, items ARRAY<item_id:STRING> to cart_id STRING, item_id STRING
Why other options are incorrect:
- A. filter(): Used to filter elements from an array based on a condition, not to unnest arrays
- B. flatten(): Combines multiple arrays into a single array, but doesn't create separate rows
- C. reduce(): Aggregates array elements using a binary function, not for unnesting
- E. slice(): Returns a subset of an array, not for unnesting

Example transformation: If raw_table has:

cart_id: "cart1", items: ["item1", "item2", "item3"]
cart_id: "cart2", items: ["item4"]

cart_id: "cart1", items: ["item1", "item2", "item3"]
cart_id: "cart2", items: ["item4"]

After SELECT cart_id, explode(items) AS item_id FROM raw_table;:

cart_id: "cart1", item_id: "item1"
cart_id: "cart1", item_id: "item2"
cart_id: "cart1", item_id: "item3"
cart_id: "cart2", item_id: "item4"

cart_id: "cart1", item_id: "item1"
cart_id: "cart1", item_id: "item2"
cart_id: "cart1", item_id: "item3"
cart_id: "cart2", item_id: "item4"

This is a common pattern in data transformation where you need to flatten nested array structures into a tabular format for further processing or analysis.

Powered ByGPT-5.2

Comments

Loading comments...