
Answer-first summary for fast verification
Answer: SELECT get_json_object(customer_data, '$.address.city') AS city, get_json_object(customer_data, '$.address.zipcode') AS zipcode FROM customers
Option D is correct because get_json_object is the proper Spark SQL function for extracting values from a JSON string column in Databricks. It takes a JSONPath expression (e.g., $.address.city) to retrieve nested fields, and returns NULL when the path does not exist. Option A is invalid because dot notation works only on struct columns, not on JSON strings. Option B is invalid because JSON_TABLE is not supported in Spark SQL; it is a MySQL/Oracle function. Option C is invalid because square-bracket notation applies to map and array columns, not to raw JSON text.
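To make the extraction semantics concrete, here is a minimal Python sketch that simulates what get_json_object does for simple dot-separated paths like $.address.city. This is an illustration only (the helper name get_json_object_sim is made up for this example); the real Spark SQL function supports a richer JSONPath subset.

```python
import json

def get_json_object_sim(json_str, path):
    """Rough simulation of Spark SQL's get_json_object for simple
    dot-separated JSONPath expressions such as '$.address.city'."""
    obj = json.loads(json_str)
    # Strip the leading '$' root marker, then walk each path segment.
    for key in path.lstrip("$").strip(".").split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None  # Spark returns NULL for a missing path
        obj = obj[key]
    return obj

row = '{"name": "Ada", "address": {"city": "Seattle", "zipcode": "98101"}}'
print(get_json_object_sim(row, "$.address.city"))     # Seattle
print(get_json_object_sim(row, "$.address.zipcode"))  # 98101
```

Note that, as in Spark, a path that does not match anything yields NULL (None here) rather than an error, which is why get_json_object is safe to use on heterogeneous JSON data.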
Author: LeetQuiz Editorial Team
In a Databricks environment, you are working with a Delta table named customers that contains a column called customer_data. This column is of type STRING and contains JSON text with customer information. Inside each JSON object, there is an address field, which is itself a nested JSON object containing city and zipcode fields.
You need to write a Spark SQL query to extract the city and zipcode values from the address object and return them as separate columns. The solution must be efficient, scalable, and use correct Spark SQL syntax for working with JSON strings.
Which of the following queries correctly achieves this?
A
SELECT customer_data.address.city, customer_data.address.zipcode FROM customers
B
SELECT city, zipcode FROM customers CROSS JOIN JSON_TABLE(customer_data, '$.address')
C
SELECT customer_data['address']['city'], customer_data['address']['zipcode'] FROM customers
D
SELECT get_json_object(customer_data, '$.address.city') AS city, get_json_object(customer_data, '$.address.zipcode') AS zipcode FROM customers