
Which of the following Structured Streaming queries is performing a hop from a Silver table to a Gold table?
A
(spark.readStream.load(rawSalesLocation) .writeStream .option("checkpointLocation", checkpointPath) .outputMode("append") .table("newSales"))
B
(spark.read.load(rawSalesLocation) .writeStream .option("checkpointLocation", checkpointPath) .outputMode("append") .table("newSales"))
C
(spark.table("sales") .withColumn("avgPrice", col("sales") / col("units")) .writeStream .option("checkpointLocation", checkpointPath) .outputMode("append") .table("newSales"))
D
(spark.table("sales") .filter(col("units") > 0) .writeStream .option("checkpointLocation", checkpointPath) .outputMode("append") .table("newSales") )
E
(spark.table("sales") .groupBy("store") .agg(sum("sales")) .writeStream .option("checkpointLocation", checkpointPath) .outputMode("complete") .table("newSales") )
Explanation:
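In the Databricks medallion architecture, Bronze tables hold raw ingested data, Silver tables hold cleaned and validated records, and Gold tables hold aggregated, business-level data.
A hop from Silver to Gold therefore typically involves aggregation operations that turn detailed records into summarized business metrics.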
Let's analyze each option:
Option A: Reads from rawSalesLocation (likely Bronze) and writes to newSales, so this is a Bronze to Silver hop.
Option B: Uses spark.read.load() instead of spark.readStream.load(), so the source is read as a batch rather than as a stream.
Option C: Reads from the sales table (Silver), adds a derived column (avgPrice), and writes to newSales. This is a row-level Silver to Silver transformation.
Option D: Reads from the sales table (Silver), applies a filter, and writes to newSales. This is a Silver to Silver transformation with filtering.
Option E: Reads from the sales table (Silver), aggregates with groupBy("store") and agg(sum("sales")), and writes to newSales. This is a Silver to Gold hop because aggregation is characteristic of Gold table creation.
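For reference, a Silver to Gold streaming query along the lines of Option E might be sketched as follows. This is an illustration under assumptions, not the exam's code: it reads the source with spark.readStream.table (spark.table, as written in options C to E, returns a batch DataFrame), it assumes Spark 3.1 or later where readStream.table and writeStream.toTable are available, and the function name start_silver_to_gold and the column name totalSales are hypothetical.

```python
def start_silver_to_gold(spark, checkpoint_path):
    """Start a streaming aggregation from the Silver table "sales" into "newSales".

    `spark` is an existing SparkSession; `checkpoint_path` is a writable location.
    Both the function name and the output column name are hypothetical.
    """
    gold_totals = (
        spark.readStream.table("sales")   # streaming read of the Silver table
        .groupBy("store")                 # one output row per store
        .sum("sales")                     # produces a column named "sum(sales)"
        # Rename to a plain identifier so the column name has no parentheses.
        .withColumnRenamed("sum(sales)", "totalSales")
    )
    return (
        gold_totals.writeStream
        .option("checkpointLocation", checkpoint_path)
        .outputMode("complete")           # rewrite the full result each trigger
        .toTable("newSales")              # starts the query and returns it
    )
```

In a notebook this could be called as start_silver_to_gold(spark, checkpointPath); the returned object is the running streaming query.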
Key indicators of a Silver to Gold hop:
groupBy("store") with agg(sum("sales")), which collapses detailed records into one summary row per store.
outputMode("complete"), which is common for aggregated streaming results.
The correct answer is E because it transforms detailed Silver data into aggregated Gold data through grouping and aggregation.
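To see why complete mode pairs naturally with aggregation, the following plain-Python sketch simulates micro-batches without Spark. It is only an illustration: the function complete_mode_outputs and the sample store names and figures are made up for this example. After each micro-batch the whole table of running totals is emitted again, because the totals for stores already seen can change.

```python
from collections import defaultdict


def complete_mode_outputs(micro_batches):
    """Return the full table of per-store totals as it stands after each batch."""
    totals = defaultdict(float)
    outputs = []
    for batch in micro_batches:
        for store, sales in batch:
            totals[store] += sales       # update the running aggregate
        outputs.append(dict(totals))     # snapshot of the entire result table
    return outputs


# Hypothetical sample data: two micro-batches of (store, sales) rows.
batches = [
    [("north", 10.0), ("south", 5.0)],
    [("north", 2.5)],
]
results = complete_mode_outputs(batches)
# The second snapshot still contains "south", although the second batch
# had no rows for it: the complete result is rewritten every time.
```

By contrast, append mode only emits rows that will not change later, which is why it suits the row-wise transformations in options C and D.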