The data engineering team maintains the following code:
accountDF = spark.table("accounts")
orderDF = spark.table("orders")
itemDF = spark.table("items")

orderWithItemDF = (orderDF.join(
        itemDF,
        orderDF.itemID == itemDF.itemID)
    .select(
        orderDF.accountID,
        orderDF.itemID,
        itemDF.itemName))

finalDF = (accountDF.join(
        orderWithItemDF,
        accountDF.accountID == orderWithItemDF.accountID)
    .select(
        orderWithItemDF["*"],
        accountDF.city))

(finalDF.write
    .mode("overwrite")
    .table("enriched_itemized_orders_by_account"))
Assuming that this code produces logically correct results and the data in the source tables has been de-duplicated and validated, which statement describes what will occur when this code is executed?
A
A batch job will update the enriched_itemized_orders_by_account table, replacing only those rows that have different values than the current version of the table, using accountID as the primary key.
B
The enriched_itemized_orders_by_account table will be overwritten using the current valid version of data in each of the three tables referenced in the join logic.
C
No computation will occur until enriched_itemized_orders_by_account is queried; upon query materialization, results will be calculated using the current valid version of data in each of the three tables referenced in the join logic.
D
An incremental job will detect if new rows have been written to any of the source tables; if new rows are detected, all results will be recalculated and used to overwrite the enriched_itemized_orders_by_account table.
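The join logic in the question can be traced with a minimal pure-Python sketch, assuming tiny in-memory stand-ins for the three source tables (the table contents below are hypothetical, chosen only to illustrate the two-step join):

```python
# Hypothetical in-memory stand-ins for the accounts, orders, and items tables.
accounts = [{"accountID": 1, "city": "Oslo"}]
orders = [{"accountID": 1, "itemID": 10}]
items = [{"itemID": 10, "itemName": "widget"}]

# Step 1: orders JOIN items ON itemID, keeping accountID, itemID, itemName
# (mirrors orderWithItemDF).
order_with_item = [
    {"accountID": o["accountID"], "itemID": o["itemID"], "itemName": i["itemName"]}
    for o in orders
    for i in items
    if o["itemID"] == i["itemID"]
]

# Step 2: accounts JOIN order_with_item ON accountID, adding city
# (mirrors finalDF's select of orderWithItemDF["*"] plus accountDF.city).
final = [
    {**owi, "city": a["city"]}
    for a in accounts
    for owi in order_with_item
    if a["accountID"] == owi["accountID"]
]

print(final)
# → [{'accountID': 1, 'itemID': 10, 'itemName': 'widget', 'city': 'Oslo'}]
```

Note that in Spark the joins and selects are lazy transformations; it is the `write` call at the end that acts as an action and triggers a batch computation over the current contents of the three tables.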