
Explanation:
In Databricks Lakehouse architecture, data typically flows through three layers:
Analyzing each option:
Option A & B: These perform aggregation operations (groupBy and agg) which are more typical of Silver-to-Gold transformations (aggregating cleaned data for analytics). They don't show data cleaning or transformation from raw format.
Option C: This query performs a data transformation by computing avgPrice (dividing sales by units), which represents data cleaning and enrichment. The output table name "cleanedSales" clearly indicates it's creating a Silver-level table from potentially raw data. The append output mode suggests incremental processing of cleaned data.
Option D: This reads from a raw location and writes to "uncleanedSales", suggesting it's still in Bronze layer (raw data).
Option E: Incomplete syntax, not a valid query.
Key indicators of Bronze-to-Silver transformation:
Option C demonstrates all these characteristics with the withColumn transformation and the "cleanedSales" output table name.
Ultimate access to all questions.
No comments yet.
Which of the following Structured Streaming queries is performing a hop from a Bronze table to a Silver table?
A
(spark.table("sales")
.groupBy("store")
.agg(sum("sales"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.table("aggregatedSales"))
(spark.table("sales")
.groupBy("store")
.agg(sum("sales"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.table("aggregatedSales"))
B
(spark.table("sales")
.agg(sum("sales"), sum("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.table("aggregatedSales"))
(spark.table("sales")
.agg(sum("sales"), sum("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.table("aggregatedSales"))
C
(spark.table("sales")
.withColumn("avgPrice", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("cleanedSales"))
(spark.table("sales")
.withColumn("avgPrice", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("cleanedSales"))
D
(spark.readStream.load(rawSalesLocation)
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("uncleanedSales"))
(spark.readStream.load(rawSalesLocation)
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("uncleanedSales"))
E
(spark.read.load(rawSalesLocation)
.writeStream
(spark.read.load(rawSalesLocation)
.writeStream