
Answer-first summary for fast verification
Answer: `SELECT cart_id, explode(items) AS item_id FROM raw_table;`
## Explanation The correct answer is **D** because: 1. **`explode()` function** is specifically designed to unnest array columns in Spark SQL/Databricks. It takes an array column as input and returns a new row for each element in the array, while duplicating the other column values. 2. **How `explode()` works**: - For each row in `raw_table`, `explode(items)` creates multiple rows - Each element from the `items` array becomes a separate row - The `cart_id` is repeated for each item in the array - This transforms the schema from `cart_id STRING, items ARRAY<item_id:STRING>` to `cart_id STRING, item_id STRING` 3. **Why other options are incorrect**: - **A. `filter()`**: Used to filter elements from an array based on a condition, not to unnest arrays - **B. `flatten()`**: Combines multiple arrays into a single array, but doesn't create separate rows - **C. `reduce()`**: Aggregates array elements using a binary function, not for unnesting - **E. `slice()`**: Returns a subset of an array, not for unnesting 4. **Example transformation**: If `raw_table` has: ``` cart_id: "cart1", items: ["item1", "item2", "item3"] cart_id: "cart2", items: ["item4"] ``` After `SELECT cart_id, explode(items) AS item_id FROM raw_table;`: ``` cart_id: "cart1", item_id: "item1" cart_id: "cart1", item_id: "item2" cart_id: "cart1", item_id: "item3" cart_id: "cart2", item_id: "item4" ``` This is a common pattern in data transformation where you need to flatten nested array structures into a tabular format for further processing or analysis.
Author: Keng Suppaseth
Ultimate access to all questions.
No comments yet.
A junior data engineer has ingested a JSON file into a table raw_table with the following schema:
cart_id STRING,
items ARRAY<item_id:STRING>
cart_id STRING,
items ARRAY<item_id:STRING>
The junior data engineer would like to unnest the items column in raw_table to result in a new table with the following schema:
cart_id STRING,
item_id STRING
cart_id STRING,
item_id STRING
Which of the following commands should the junior data engineer run to complete this task?
A
SELECT cart_id, filter(items) AS item_id FROM raw_table;
B
SELECT cart_id, flatten(items) AS item_id FROM raw_table;
C
SELECT cart_id, reduce(items) AS item_id FROM raw_table;
D
SELECT cart_id, explode(items) AS item_id FROM raw_table;
E
SELECT cart_id, slice(items) AS item_id FROM raw_table;