The data engineering team maintains the following code:
import pyspark.sql.functions as F

# Aggregate per-customer sales metrics from the silver table and
# overwrite the gold summary table with the result.
(spark.table("silver_customer_sales")
    .groupBy("customer_id")
    .agg(
        F.min("sale_date").alias("first_transaction_date"),
        F.max("sale_date").alias("last_transaction_date"),
        F.mean("sale_total").alias("average_sales"),
        F.countDistinct("order_id").alias("total_orders"),
        F.sum("sale_total").alias("lifetime_value")
    )
    .write
    .mode("overwrite")
    .saveAsTable("gold_customer_lifetime_sales_summary")
)
Assuming this code produces logically correct results and the source table data has been de-duplicated and validated, which statement describes what will occur when this code is executed?
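
For reference, a minimal sketch for inspecting the table this job writes (assuming the same Spark session and the import of pyspark.sql.functions as F shown above) could be:

# Read back the gold table and sanity-check its contents.
summary_df = spark.table("gold_customer_lifetime_sales_summary")
summary_df.printSchema()  # confirm the expected aggregate columns exist
summary_df.orderBy(F.desc("lifetime_value")).show(5)  # top customers by lifetime value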