
Explanation:
The correct answer is C.
In the context of the Databricks Certified Data Engineer - Associate exam, understanding how different join types handle missing keys is a fundamental skill. Here is a breakdown of why Option C is the correct result of the LEFT JOIN operation.
A LEFT JOIN returns all records from the left table (sales), and the matched records from the right table (favorite_stores). If there is no match, the result is NULL on the side of the right table.
FROM is sales. All rows from this table must appear in the final output.
a1 (present)a3 (present)a4 (present)customer_id values in favorite_stores.
favorite_stores. It matches with store_id s1.favorite_stores. Because this is a LEFT JOIN, the row is kept, and the store_id is filled with NULL.favorite_stores. It matches with store_id s2.SELECT statement specifically asks for sales.customer_id, sales.spend, and favorite_stores.store_id.| customer_id | spend | store_id |
|---|---|---|
| a1 | 28.94 | s1 |
| a3 | 874.1223 | NULL |
| a4 | 8.99 | s2 |
customer_id a2. This ID only exists in the right table (favorite_stores). In a LEFT JOIN, records that exist only in the right table are excluded. (These would only appear in a RIGHT JOIN or FULL OUTER JOIN).INNER JOIN. It excluded a3 because there was no matching record in the right table. A LEFT JOIN must include all records from the left table regardless of a match.Ultimate access to all questions.
A data engineer is working with two tables. Each of these tables is displayed below in its entirety.
sales
customer_id spend units
a1 28.94 7
a3 874.1223 -
a4 8.99 1
customer_id spend units
a1 28.94 7
a3 874.1223 -
a4 8.99 1
favorite_stores
customer_id store_id
a1 s1
a2 s1
a4 s2
customer_id store_id
a1 s1
a2 s1
a4 s2
The data engineer runs the following query to join these tables together:
SELECT
sales.customer_id, sales.spend,
favorite_stores.store_id
FROM sales
LEFT JOIN favorite_stores
ON sales.customer_id = favorite_stores.customer_id;
SELECT
sales.customer_id, sales.spend,
favorite_stores.store_id
FROM sales
LEFT JOIN favorite_stores
ON sales.customer_id = favorite_stores.customer_id;
A
customer_id spend store_id
a1 28.94 s1
a2 NULL s1
a4 8.99 s2
customer_id spend store_id
a1 28.94 s1
a2 NULL s1
a4 8.99 s2
B
customer_id spend store_id
a1 28.94 s1
a4 8.99 s2
customer_id spend store_id
a1 28.94 s1
a4 8.99 s2
C
customer_id spend store_id
a1 28.94 s1
a3 874.12 NULL
a4 8.99 s2
customer_id spend store_id
a1 28.94 s1
a3 874.12 NULL
a4 8.99 s2
D
customer_id spend store_id
a1 28.94 s1
a2 NULL s1
a3 874.12 NULL
a4 8.99 s2
customer_id spend store_id
a1 28.94 s1
a2 NULL s1
a3 874.12 NULL
a4 8.99 s2