
Answer-first summary for fast verification

Answer:

```scala
(spark.table("sales")
  .withColumn("avgPrice", col("sales") / col("units"))
  .writeStream
  .option("checkpointLocation", checkpointPath)
  .outputMode("append")
  .table("cleanedSales"))
```
## Explanation

In the Databricks medallion architecture:

- **Bronze tables** contain raw, unprocessed data
- **Silver tables** contain cleaned, validated, and enriched data
- **Gold tables** contain aggregated data for business intelligence

**Analysis of each option:**

- **Options A & B**: These perform aggregations (`groupBy` and `agg`), which are typically associated with creating Gold tables from Silver tables, not with Bronze-to-Silver transformations.
- **Option C**: This query cleans and enriches the data by deriving `avgPrice` with `withColumn("avgPrice", col("sales") / col("units"))` and writes to a table named `cleanedSales`. This is the typical Bronze-to-Silver transformation, where raw data is cleaned and enriched.
- **Option D**: This writes to a table named `uncleanedSales`, indicating the data is still at the Bronze stage (raw data).
- **Option E**: This query is incomplete and does not represent a valid transformation.

The key indicators that Option C is a Bronze-to-Silver hop are:

- Data transformation/cleaning (the `withColumn` operation)
- An output table named `cleanedSales` (suggesting cleaned data)
- `append` output mode (appropriate for incremental data processing)

Therefore, **Option C** correctly represents moving from Bronze (raw data) to Silver (cleaned, processed data).
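To make the Bronze-to-Silver cleaning step concrete without a Spark cluster, here is a minimal plain-Python sketch of what Option C's `withColumn("avgPrice", col("sales") / col("units"))` computes per record. The sample rows and values are illustrative assumptions, not data from the question:

```python
# Illustrative, non-Spark sketch of Option C's Bronze-to-Silver hop:
# each raw (Bronze) record gains a derived avgPrice column, and the
# schema is otherwise preserved -- cleaning/enrichment, not aggregation.
bronze_rows = [
    {"store": "A", "sales": 100.0, "units": 4},  # hypothetical sample data
    {"store": "B", "sales": 90.0, "units": 3},
]

def to_silver(row):
    """Equivalent of .withColumn("avgPrice", col("sales") / col("units"))."""
    enriched = dict(row)  # keep all original columns
    enriched["avgPrice"] = row["sales"] / row["units"]
    return enriched

silver_rows = [to_silver(r) for r in bronze_rows]
# silver_rows[0] -> {"store": "A", "sales": 100.0, "units": 4, "avgPrice": 25.0}
```

By contrast, a Silver-to-Gold hop (Options A and B) would collapse rows via `groupBy`/`agg`, losing the record-level grain that Silver retains.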
Author: LeetQuiz.
Question 31
Which of the following Structured Streaming queries is performing a hop from a Bronze table to a Silver table?
A
```scala
(spark.table("sales")
  .groupBy("store")
  .agg(sum("sales"))
  .writeStream
  .option("checkpointLocation", checkpointPath)
  .outputMode("complete")
  .table("aggregatedSales"))
```
B
```scala
(spark.table("sales")
  .agg(sum("sales"), sum("units"))
  .writeStream
  .option("checkpointLocation", checkpointPath)
  .outputMode("complete")
  .table("aggregatedSales"))
```
C
```scala
(spark.table("sales")
  .withColumn("avgPrice", col("sales") / col("units"))
  .writeStream
  .option("checkpointLocation", checkpointPath)
  .outputMode("append")
  .table("cleanedSales"))
```
D
```scala
(spark.readStream.load(rawSalesLocation)
  .writeStream
  .option("checkpointLocation", checkpointPath)
  .outputMode("append")
  .table("uncleanedSales"))
```
E
```scala
(spark.read.load(rawSalesLocation)
  .writeStream
```