
# Load the three source tables.
accountDF = spark.table("accounts")
orderDF = spark.table("orders")
itemDF = spark.table("items")

# Join orders to items to attach each order's item name.
orderWithItemDF = (orderDF.join(
        itemDF,
        orderDF.itemID == itemDF.itemID)
    .select(
        orderDF.accountID,
        orderDF.itemID,
        itemDF.itemName)
)

# Join the itemized orders to accounts to attach each account's city.
finalDF = (accountDF.join(
        orderWithItemDF,
        accountDF.accountID == orderWithItemDF.accountID)
    .select(
        orderWithItemDF["*"],
        accountDF.city)
)

# Write the result to the target table in overwrite mode.
(finalDF.write
    .mode("overwrite")
    .saveAsTable("enriched_itemized_orders_by_account"))
Question
Assuming this code produces logically correct results and the source tables have been deduplicated and validated, what will happen when this code is executed?
A
A batch job will update the enriched_itemized_orders_by_account table, replacing only those rows that have different values than the current version of the table, using accountID as the primary key.
B
The enriched_itemized_orders_by_account table will be overwritten using the current valid version of data in each of the three tables referenced in the join logic.
C
An incremental job will leverage information in the state store to identify unjoined rows in the source tables and write these rows to the enriched_itemized_orders_by_account table.
D
An incremental job will detect if new rows have been written to any of the source tables; if new rows are detected, all results will be recalculated and used to overwrite the enriched_itemized_orders_by_account table.
E
No computation will occur until enriched_itemized_orders_by_account is queried; upon query materialization, results will be calculated using the current valid version of data in each of the three tables referenced in the join logic.
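
For intuition about the write semantics the options describe, the following is a toy model in plain Python rather than Spark. It sketches what a one-off batch read of three tables followed by an overwrite-mode write amounts to. The sample rows, the `catalog` dictionary, and the helper names `build_enriched` and `overwrite` are all hypothetical, and the dictionary lookups assume unique `itemID` and `accountID` values, in line with the question's statement that the sources are deduplicated.

```python
# Hypothetical sample rows standing in for the three source tables.
accounts = [{"accountID": 1, "city": "Oslo"}, {"accountID": 2, "city": "Lima"}]
orders = [{"accountID": 1, "itemID": 10}, {"accountID": 2, "itemID": 11}]
items = [{"itemID": 10, "itemName": "pen"}, {"itemID": 11, "itemName": "ink"}]


def build_enriched(accounts, orders, items):
    """Inner-join orders to items, then to accounts, keeping the selected columns."""
    # Assumes unique keys, as the question says the sources are deduplicated.
    item_by_id = {i["itemID"]: i for i in items}
    account_by_id = {a["accountID"]: a for a in accounts}
    rows = []
    for order in orders:
        item = item_by_id.get(order["itemID"])
        account = account_by_id.get(order["accountID"])
        if item is None or account is None:
            continue  # an inner join drops rows without a match
        rows.append({
            "accountID": order["accountID"],
            "itemID": order["itemID"],
            "itemName": item["itemName"],
            "city": account["city"],
        })
    return rows


def overwrite(catalog, name, rows):
    """Model an overwrite-mode write: prior contents are replaced wholesale."""
    catalog[name] = list(rows)


# The target table starts with a stale row left over from an earlier run.
catalog = {
    "enriched_itemized_orders_by_account": [
        {"accountID": 99, "itemID": 0, "itemName": "stale", "city": "unknown"}
    ]
}

# One batch run: recompute everything from the current sources, then replace.
overwrite(
    catalog,
    "enriched_itemized_orders_by_account",
    build_enriched(accounts, orders, items),
)
```

In this model nothing is tracked between runs and no row-level comparison against the existing table takes place: each run recomputes the full join from whatever the sources currently hold, and the target's earlier contents, including the stale row, are gone afterwards.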