
Answer: `SELECT transaction_id, payload.date FROM raw_table;`
## Explanation

The correct answer is **B** because:

- The `payload` field is an array of structs whose fields include `date`.
- In Spark SQL, dot notation on an array of structs (`payload.date`) extracts that field from every element of the array.
- The result is one row per `transaction_id`, holding an array of the `date` values from that row's payload.

**Why the other options are incorrect:**

- **Option A**: `explode(payload)` creates one row per array element, and each row holds the entire struct rather than just its `date` field. The requirement is to extract only the date from the payload array.
- **Option C**: `date` alone does not resolve, because the `date` field is nested inside the structs of the `payload` array, not a top-level column of the table schema.
- **Option D**: square brackets on an array expect an integer index (for example `payload[0]`), and `date` is not a column that can serve as one, so `payload[date]` does not extract the struct field.
- **Option E**: the statement contains two `FROM` clauses and is not valid SQL.

**Key Concept:** When working with arrays of structs in Spark SQL, dot notation accesses a struct field across all array elements and returns an array of that field's values. When you only need specific fields from the struct elements, this avoids the extra rows and full structs that `explode()` would produce.
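To make the difference between options A and B concrete, here is a minimal plain-Python model of the two behaviours. It is not Spark: rows are assumed to be dicts, the sample data is hypothetical, and the function names `select_payload_date` and `select_explode_payload` are illustrative only. The only Spark-specific detail borrowed is that `explode` on an array names its output column `col`.

```python
from datetime import datetime
from typing import Any

# Hypothetical sample rows shaped like raw_table:
# transaction_id STRING, payload ARRAY<STRUCT<customer_id, date, store_id>>
raw_table: list[dict[str, Any]] = [
    {
        "transaction_id": "t1",
        "payload": [
            {"customer_id": "c1", "date": datetime(2023, 1, 5, 10, 0), "store_id": "s1"},
            {"customer_id": "c2", "date": datetime(2023, 1, 6, 11, 30), "store_id": "s2"},
        ],
    },
    {
        "transaction_id": "t2",
        "payload": [
            {"customer_id": "c3", "date": datetime(2023, 2, 1, 9, 15), "store_id": "s1"},
        ],
    },
]


def select_payload_date(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Models `SELECT transaction_id, payload.date`: one output row per input
    row, with the `date` field pulled out of every struct into a list."""
    return [
        {"transaction_id": r["transaction_id"], "date": [s["date"] for s in r["payload"]]}
        for r in rows
    ]


def select_explode_payload(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Models `SELECT transaction_id, explode(payload)`: one output row per
    array element, each carrying the whole struct in a column named `col`."""
    return [
        {"transaction_id": r["transaction_id"], "col": s}
        for r in rows
        for s in r["payload"]
    ]


if __name__ == "__main__":
    # Option B shape: 2 rows, each with a list of dates.
    for row in select_payload_date(raw_table):
        print(row)
    # Option A shape: 3 rows, each with a full struct.
    for row in select_explode_payload(raw_table):
        print(row)
```

In this model, option B keeps the row count equal to the number of transactions, while option A multiplies rows and still leaves `date` buried inside each struct.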
Author: LeetQuiz
Question 21
A data engineer has ingested a JSON file into a table raw_table with the following schema:
transaction_id STRING,
payload ARRAY<STRUCT<customer_id: STRING, date: TIMESTAMP, store_id: STRING>>
The data engineer wants to efficiently extract the date of each transaction into a table with the following schema:
transaction_id STRING,
date TIMESTAMP
Which of the following commands should the data engineer run to complete this task?
A
SELECT transaction_id, explode(payload) FROM raw_table;
B
SELECT transaction_id, payload.date FROM raw_table;
C
SELECT transaction_id, date FROM raw_table;
D
SELECT transaction_id, payload[date] FROM raw_table;
E
SELECT transaction_id, date from payload FROM raw_table;