Given a table carts with schema (id LONG, items ARRAY<STRUCT<id: STRING, count: INT>>, email STRING) containing the following data:

1001 | [{"id": "DESK65", "count": 1}] | "u1@domain.com"
1002 | [{"id": "KYBD45", "count": 1}, {"id": "M27", "count": 2}] | "u2@domain.com"
1003 | [{"id": "M27", "count": 1}] | "u3@domain.com"

The following MERGE statement with schema evolution enabled is executed:

MERGE INTO carts c
USING updates u
ON c.id = u.id
WHEN MATCHED
THEN UPDATE SET *
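
For context, here is a minimal sketch of how schema evolution might be turned on for this statement. It assumes a Databricks / Delta Lake session where the `spark.databricks.delta.schema.autoMerge.enabled` setting is available; the question itself does not say how the option was enabled.

```sql
-- Assumption: schema evolution is enabled for the session through this Delta Lake setting.
SET spark.databricks.delta.schema.autoMerge.enabled = true;

-- Same statement as in the question; with the setting on,
-- UPDATE SET * is allowed to evolve the target table's schema.
MERGE INTO carts c
USING updates u
ON c.id = u.id
WHEN MATCHED
THEN UPDATE SET *;
```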

How would this update be processed when applying the following record from the updates view that contains:

A new nested field (coupon) in the items array
A missing existing column (email)

id: 1001
items: [{"id": "DESK65", "count": 2, "coupon": "BOG050"}]

The update throws an error because changes to existing columns in the target schema are not supported.

18.3%

The new nested field is added to the target schema, and dynamically read as NULL for existing unmatched records.

46.5%

The update is moved to a separate "rescued" column because it is missing a column expected in the target schema.

11.3%

The new nested field is added to the target schema, and files underlying existing records are updated to include NULL values for the new field.

23.9%

Databricks Certified Data Engineer - Professional
