
Explanation:
The correct answer is D because the EXPLODE function is specifically designed to unnest array columns in Spark SQL.
cart_id = "cart1" and items = ["itemA", "itemB", "itemC"], EXPLODE(items) would generate:
("cart1", "itemA")("cart1", "itemB")("cart1", "itemC")EXPLODE is the standard Spark SQL function for converting array elements into individual rows, which is exactly what's needed for this data transformation task.
Ultimate access to all questions.
No comments yet.
Question 20
A junior data engineer has ingested a JSON file into a table raw_table with the following schema:
cart_id STRING,
items ARRAY<item_id:STRING>
cart_id STRING,
items ARRAY<item_id:STRING>
The junior data engineer would like to unnest the items column in raw_table to result in a new table with the following schema:
cart_id STRING,
item_id STRING
cart_id STRING,
item_id STRING
Which of the following commands should the junior data engineer run to complete this task?
A
SELECT cart_id, filter(items) AS item_id FROM raw_table;
B
SELECT cart_id, flatten(items) AS item_id FROM raw_table;
C
SELECT cart_id, reduce(items) AS item_id FROM raw_table;
D
SELECT cart_id, explode(items) AS item_id FROM raw_table;
E
SELECT cart_id, slice(items) AS item_id FROM raw_table;