
Answer-first summary for fast verification
Answer: **E** - `(spark.table("sales").groupBy("store").agg(sum("sales")).writeStream.option("checkpointLocation", checkpointPath).outputMode("complete").table("newSales"))`
## Explanation

In the Databricks Medallion Architecture:

- **Bronze tables** contain raw data
- **Silver tables** contain cleaned, filtered, and enriched data
- **Gold tables** contain aggregated, business-level data for reporting and analytics

A hop from Silver to Gold involves **aggregation operations** that turn detailed records into summarized business metrics.

Let's analyze each option:

- **Option A**: Reads raw files from `rawSalesLocation` and writes them unchanged to `newSales` - this is ingestion of raw data into a Bronze table, not a hop out of Silver.
- **Option B**: Uses `spark.read.load()` instead of `spark.readStream.load()` - this is a batch read of the raw location, not a streaming read, and it involves no Silver table.
- **Option C**: Reads from the `sales` table (Silver), adds a derived column (`avgPrice`), and writes to `newSales` - this is a row-level Silver to Silver transformation with no aggregation.
- **Option D**: Reads from the `sales` table (Silver), applies a filter, and writes to `newSales` - this is a Silver to Silver transformation with filtering and no aggregation.
- **Option E**: Reads from the `sales` table (Silver), aggregates with `groupBy("store")` and `agg(sum("sales"))`, and writes to `newSales` - this is a **Silver to Gold** transformation, because aggregation is characteristic of Gold table creation.

**Key indicators of a Silver to Gold hop**:

1. Reading from an existing table (Silver layer)
2. Performing aggregation operations (`groupBy`, `sum`, `avg`, etc.)
3. Creating summarized business metrics
4. Using `outputMode("complete")`, which is common for aggregated streaming results

The correct answer is **E** because it is the only option that turns detailed Silver data into aggregated Gold data through grouping and aggregation.
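To make the aggregation and the `complete` output mode concrete, here is a minimal plain-Python sketch (not Spark code) of what a per-store running sum produces as micro-batches arrive. The records, store names, and the `run_complete_mode` helper are hypothetical and exist only for illustration; the sketch assumes that in `complete` mode the whole result table is rewritten after every micro-batch.

```python
from collections import defaultdict

# Hypothetical Silver-level records arriving in two micro-batches.
micro_batches = [
    [{"store": "north", "sales": 10.0}, {"store": "south", "sales": 5.0}],
    [{"store": "north", "sales": 2.5}],
]


def run_complete_mode(batches):
    """Return the full aggregated result table as it stands after each micro-batch."""
    totals = defaultdict(float)  # running aggregation state, keyed by store
    snapshots = []
    for batch in batches:
        for row in batch:
            totals[row["store"]] += row["sales"]
        # "complete" semantics: emit every group, not only the ones that changed
        snapshots.append(dict(totals))
    return snapshots


gold_snapshots = run_complete_mode(micro_batches)
for i, snapshot in enumerate(gold_snapshots):
    print(f"after batch {i}: {snapshot}")
```

Each snapshot contains one row per store rather than one row per sale, which is the detailed-to-summarized shift that separates a Gold table from a Silver one.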
Author: Keng Suppaseth
Which of the following Structured Streaming queries is performing a hop from a Silver table to a Gold table?
A. `(spark.readStream.load(rawSalesLocation).writeStream.option("checkpointLocation", checkpointPath).outputMode("append").table("newSales"))`
B. `(spark.read.load(rawSalesLocation).writeStream.option("checkpointLocation", checkpointPath).outputMode("append").table("newSales"))`
C. `(spark.table("sales").withColumn("avgPrice", col("sales") / col("units")).writeStream.option("checkpointLocation", checkpointPath).outputMode("append").table("newSales"))`
D. `(spark.table("sales").filter(col("units") > 0).writeStream.option("checkpointLocation", checkpointPath).outputMode("append").table("newSales"))`
E. `(spark.table("sales").groupBy("store").agg(sum("sales")).writeStream.option("checkpointLocation", checkpointPath).outputMode("complete").table("newSales"))`
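For readers who want to see the aggregating query laid out over several lines, the following is a hedged PySpark sketch of the same pattern. It is not the exam's exact code: it assumes a streaming read via `spark.readStream.table(...)` and a write via `toTable(...)`, both available in open-source Spark 3.1 and later, and it assumes the target table format accepts `complete` mode (as Delta does). The function name `start_silver_to_gold` and the `total_sales` alias are hypothetical.

```python
def start_silver_to_gold(spark, checkpoint_path):
    """Start a streaming query that aggregates the Silver `sales` table into `newSales`.

    `spark` is an existing SparkSession; `checkpoint_path` is a writable location.
    """
    # Imported lazily so that defining this function does not require pyspark.
    from pyspark.sql import functions as F

    return (
        spark.readStream.table("sales")                   # streaming read of the Silver table
        .groupBy("store")                                 # one output row per store
        .agg(F.sum("sales").alias("total_sales"))         # business-level metric
        .writeStream
        .option("checkpointLocation", checkpoint_path)    # tracks progress across restarts
        .outputMode("complete")                           # rewrite the full aggregate each batch
        .toTable("newSales")                              # returns a StreamingQuery
    )
```

On a cluster this would be called as `query = start_silver_to_gold(spark, checkpointPath)`, with `checkpointPath` taken from the question's own snippets.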