
Answer-first summary for fast verification
Answer: (spark.table("sales").groupBy("store").agg(sum("sales")).writeStream.option("checkpointLocation", checkpointPath).outputMode("complete").table("aggregatedSales"))
The correct answer is B: the query that reads the Silver 'sales' table, groups it by 'store', sums 'sales', and writes the result to the 'aggregatedSales' table. In the Medallion Architecture, the Gold layer holds aggregated, business-level data that supports reporting, analytics, and ML applications, so an aggregation written to a new table is what marks a Silver-to-Gold transition. The other options do not fit. A and D load raw data from 'rawSalesLocation' into 'uncleanedSales', which is ingestion into the Bronze layer. C only derives a per-row 'avgPrice' column and writes to 'cleanedSales', which is a row-level transformation rather than an aggregation. E performs no aggregation and writes 'sales' back to itself. For a deeper understanding, refer to the Medallion Architecture documentation, which outlines the distinct roles of the Bronze, Silver, and Gold layers in data processing and storage.
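To make the role of outputMode("complete") in the correct answer more concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API. It only mimics, under simplifying assumptions, how a streaming group-by-and-sum keeps running totals and rewrites the whole result table after every micro-batch. The function name run_complete_mode and the sample rows are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def run_complete_mode(micro_batches):
    """Mimic groupBy("store").agg(sum("sales")) written in 'complete' mode.

    micro_batches: list of batches, each a list of (store, sales) rows.
    Returns the full result table as it would look after each batch.
    """
    totals = defaultdict(int)  # aggregation state carried across batches
    snapshots = []
    for batch in micro_batches:
        for store, sales in batch:
            totals[store] += sales
        # Complete mode: the entire aggregated table is emitted every time,
        # not just the rows that changed in this batch.
        snapshots.append(dict(totals))
    return snapshots


# Hypothetical Silver-layer rows arriving in two micro-batches.
batches = [
    [("north", 100), ("south", 50)],
    [("north", 25), ("east", 10)],
]

snapshots = run_complete_mode(batches)
for i, table in enumerate(snapshots):
    print(f"after batch {i}: {table}")
```

After the second batch the table contains the updated total for "north" alongside the unchanged total for "south", which is why an aggregated Gold table is typically written with complete mode, while the row-by-row writes in the other options use append.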
Author: LeetQuiz Editorial Team
Which Structured Streaming query correctly transitions data from a Silver to a Gold table in Databricks?
A
(spark.read.load(rawSalesLocation).writeStream.option("checkpointLocation", checkpointPath).outputMode("append").table("uncleanedSales"))
B
(spark.table("sales").groupBy("store").agg(sum("sales")).writeStream.option("checkpointLocation", checkpointPath).outputMode("complete").table("aggregatedSales"))
C
(spark.table("sales").withColumn("avgPrice", col("sales") / col("units")).writeStream.option("checkpointLocation", checkpointPath).outputMode("append").table("cleanedSales"))
D
(spark.readStream.load(rawSalesLocation).writeStream.option("checkpointLocation", checkpointPath).outputMode("append").table("uncleanedSales"))
E
(spark.table("sales").writeStream.option("checkpointLocation", checkpointPath).outputMode("complete").table("sales"))