
Answer-first summary for fast verification
Answer:
```sql
SELECT
  user_info:first_name,
  user_info:last_name,
  user_info:email
FROM user_events;
```
**Option A (Correct)** This is the **best and most efficient** choice. In Databricks SQL, when a column stores JSON-formatted strings, you can extract fields directly with the **colon syntax**: `column_name:field_name` (nested fields use a dot path, e.g. `column_name:parent.child`). The extraction is handled natively by the engine, avoids calling a JSON function once per field, and performs well on large datasets. This shorthand is the pattern recommended in the Databricks documentation for simple nested-field extraction from JSON strings, and it works seamlessly with Delta tables and materialized views. Note that plain dot notation (`user_info.first_name`) only works once the column is a **STRUCT**; on a STRING column it fails, which is why the colon operator is needed here.

**Option B (Incorrect - less efficient)** Using `get_json_object()` multiple times forces Spark to parse the JSON string **separately for each call**. On a billion-row table, this repeated parsing leads to significant performance overhead. It is acceptable for one-off extractions but **not recommended** for production reporting queries where performance and cost matter.

**Option C (Incorrect - wrong syntax for STRING JSON)** Bracket notation `user_info['first_name']` works when the column is already a **MAP** or **STRUCT** type, or when using the newer **VARIANT** type. For a plain **STRING** column containing JSON, this syntax will fail or return `NULL`. Bracket-notation keys are also **case-sensitive**, and this is not the correct approach here.

**Option D (Incorrect - overkill and not optimal)** `from_json()` is powerful when you need to parse the entire JSON into a typed **STRUCT** with a schema. However, defining the schema inline and then expanding with `.*` is unnecessary when you only need three fields. It adds parsing overhead and is better suited for complex transformations, or for enforcing types early in the pipeline (e.g., during ingestion).

**Option E (Incorrect - invalid for field access)** The double-colon `::` operator is a type cast in PostgreSQL, and Databricks SQL also supports it for casting (e.g., after a colon extraction), but it is **not valid** on its own for accessing JSON fields in Spark SQL / Databricks SQL.
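As a sketch of the colon syntax applied to the `user_events` table from the question (the aliases are illustrative), nested paths and casts can be combined in one query:

```sql
-- The colon operator extracts fields from a JSON STRING column and returns STRING;
-- a dot path reaches nested fields, and :: casts the extracted value.
SELECT
  user_info:first_name                         AS first_name,
  user_info:preferences.theme                  AS theme,          -- nested field via dot path
  user_info:preferences.notifications::boolean AS notifications   -- cast extracted string to BOOLEAN
FROM user_events;
```

This keeps each field extraction to a single native operation, in contrast to one `get_json_object()` call per field.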
Author: LeetQuiz Editorial Team
You are a data engineer working in Azure Databricks on a large-scale Delta table named user_events (partitioned by event_date). The table contains a STRING column user_info that stores semi-structured JSON objects with the following structure (example row):
{
  "first_name": "Alice",
  "last_name": "Smith",
  "email": "alice.smith@example.com",
  "preferences": {
    "theme": "dark",
    "notifications": true
  }
}
The table has billions of rows, and you need to create a materialized view for a reporting dashboard that frequently queries the user's first_name, last_name, and email. The solution must prioritize query performance, scalability, and correct Spark SQL syntax when the JSON is stored as a plain STRING column (not yet parsed into a STRUCT).
Which of the following queries is the most efficient and recommended approach to extract these fields using dot notation (or equivalent)?
A
SELECT
  user_info:first_name,
  user_info:last_name,
  user_info:email
FROM user_events;
B
SELECT
get_json_object(user_info, '$.first_name') AS first_name,
get_json_object(user_info, '$.last_name') AS last_name,
get_json_object(user_info, '$.email') AS email
FROM user_events;
C
SELECT
user_info['first_name'],
user_info['last_name'],
user_info['email']
FROM user_events;
D
SELECT
from_json(user_info, 'first_name STRING, last_name STRING, email STRING').*
FROM user_events;
E
SELECT
  user_info::first_name,
  user_info::last_name,
  user_info::email
FROM user_events;
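Tying the correct option back to the scenario, the reporting materialized view could be sketched as follows (the view name `user_contact_mv` is illustrative; Databricks materialized views additionally require a Unity Catalog schema and a SQL warehouse or DLT pipeline):

```sql
-- Sketch: materialized view over the billion-row Delta table,
-- extracting the three fields once with the colon operator.
CREATE MATERIALIZED VIEW user_contact_mv AS
SELECT
  user_info:first_name AS first_name,
  user_info:last_name  AS last_name,
  user_info:email      AS email,
  event_date                         -- keep the partition column for pruning
FROM user_events;
```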