
Question 21
A data engineer has ingested a JSON file into a table raw_table with the following schema:
transaction_id STRING,
payload ARRAY<customer_id:STRING, date:TIMESTAMP, store_id:STRING>
The data engineer wants to efficiently extract the date of each transaction into a table with the following schema:
transaction_id STRING,
date TIMESTAMP
Which of the following commands should the data engineer run to complete this task?
Explanation:
The correct answer is B because:
The payload field is an array of structs with fields including date, so the date values can be accessed with dot notation (payload.date).
Why the other options are incorrect:
Option A: explode(payload) would create one row for each element in the array, each holding the whole struct, which is not what's needed here. The requirement is to extract just the date field from the payload array.
Option C: date alone would not work because the date field is nested inside the payload array struct, not at the top level of the table schema.
Key Concept:
When working with arrays of structs in Spark SQL, you can use dot notation to access struct fields directly, which returns an array of the specified field values. This is more efficient than using explode() when you only need to extract specific fields from the struct elements.
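Example:
The following Spark SQL is a minimal sketch of the difference between dot notation and explode(). The temporary view and its sample values are hypothetical and are not part of the original question. Only the table name, column names, and field names come from the schema above.

```sql
-- Hypothetical sample row shaped like the schema in the question (values are made up).
CREATE OR REPLACE TEMP VIEW raw_table AS
SELECT
  't1' AS transaction_id,
  array(
    named_struct(
      'customer_id', 'c1',
      'date', TIMESTAMP '2021-01-01 10:00:00',
      'store_id', 's1'
    )
  ) AS payload;

-- Dot notation on an array of structs: one row per transaction,
-- with payload.date returned as an array of the date values.
SELECT transaction_id, payload.date FROM raw_table;

-- explode() for comparison: one row per array element,
-- each carrying the whole struct rather than only the date field.
SELECT transaction_id, explode(payload) AS entry FROM raw_table;
```

The first query keeps one row per transaction_id, while the second query's row count grows with the number of elements in each payload array.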