In a Databricks environment, you are working with a dataset that includes a 'customer_data' column. This column contains JSON objects with customer information, including an 'address' field that is a nested JSON object with 'city' and 'zipcode' among other details. Your task is to write a Spark SQL query that extracts 'city' and 'zipcode' from the 'address' field and creates a new table from the extracted values. Constraints: the solution must be efficient and scalable, and it must correctly handle nested JSON structures. Which of the following queries achieves this goal? Choose the best of the four options provided.
Explanation:
Option D is the correct answer because it extracts 'city' and 'zipcode' directly from the nested 'address' object in the 'customer_data' column with a path-based JSON extraction function (JSON_EXTRACT, as the option is written). Note that Spark SQL's built-in function for path-based extraction is get_json_object, with paths such as '$.address.city'; JSON_EXTRACT is the name used by engines such as MySQL. Path-based extraction is a row-wise expression that needs no join, which makes it suitable for large datasets. Option A attempts dot notation, which Spark SQL supports only on struct columns, not on a JSON string that has not been parsed. Option B misuses the JSON_TABLE function with incorrect syntax and fails to extract the specified fields. Option C applies square bracket notation, which Spark SQL uses for array and map element access, not for extracting fields from a JSON string.
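Because the answer options are not reproduced here, the following is only a minimal sketch of the path-based approach the explanation describes, not the text of Option D. It assumes 'customer_data' is stored as a JSON string, and the table names `customers` and `customer_address` are hypothetical. It uses get_json_object, Spark SQL's built-in path extraction function.

```sql
-- Hypothetical names: `customers` is the source table, `customer_address` the new table.
-- Assumes customer_data is a STRING column holding one JSON object per row.
CREATE OR REPLACE TABLE customer_address AS
SELECT
  -- '$.address.city' walks from the root object into the nested 'address' object.
  get_json_object(customer_data, '$.address.city')    AS city,
  get_json_object(customer_data, '$.address.zipcode') AS zipcode
FROM customers;
```

get_json_object returns each value as a string and yields NULL when the path is missing or the JSON is malformed, so rows with incomplete addresses do not fail the query. If many fields are needed, parsing the column once with from_json and an explicit schema, then using dot notation on the resulting struct, may be a better fit.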