
Which query is performing a streaming hop from raw data to a Bronze table?
A
(spark.table("sales")
.groupBy("store")
.agg(sum("sales"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.table("newSales"))
B
(spark.read.load(rawSalesLocation)
.write
.mode("append")
.table("newSales"))
C
(spark.table("sales")
.withColumn("avgPrice", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("newSales"))
D
(spark.readStream.load(rawSalesLocation)
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("newSales"))
Explanation:
Option D is correct because it performs a streaming hop from raw data to a Bronze table. Here's why:
spark.readStream.load(rawSalesLocation) reads from a streaming source (the raw data location).
.writeStream writes the result as a streaming job.
.option("checkpointLocation", checkpointPath) provides fault tolerance.
.outputMode("append") is appropriate for incremental streaming data.
.table("newSales") writes to the target table, which represents the Bronze layer in the medallion architecture.
Why other options are incorrect:
A reads from an existing table (spark.table("sales")), not raw data, and uses complete mode, which rewrites the entire table each time.
B is a batch job (spark.read.load()), not streaming, and writes with .write rather than .writeStream.
C also reads from an existing table (spark.table("sales")), not raw data, and derives a new column (avgPrice), so it is not a raw-to-Bronze hop.
In Databricks medallion architecture, a streaming hop from raw data to Bronze typically involves reading streaming data from raw sources and writing it to Bronze tables with minimal transformations, which is exactly what Option D demonstrates.
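The pattern in Option D can be sketched as a small helper. This is a minimal illustration, not the exam's reference code. The function name bronze_hop, its parameters, and the default "json" source format are assumptions introduced here. It uses DataStreamWriter.toTable from open-source PySpark (Spark 3.1+) where the options above use .table, and it assumes an active SparkSession is passed in as spark.

```python
def bronze_hop(spark, raw_path, checkpoint_path, bronze_table,
               source_format="json", schema=None):
    """Start a streaming query that appends raw files to a Bronze table unchanged.

    All names here are hypothetical; `spark` is expected to be a SparkSession.
    """
    # Streaming read: readStream (not read) makes the source incremental.
    reader = spark.readStream.format(source_format)
    if schema is not None:
        # File-based streaming sources usually need an explicit schema.
        reader = reader.schema(schema)
    raw_stream = reader.load(raw_path)

    # Streaming write: no transformations, append-only, with a checkpoint
    # so the query can resume after a failure.
    return (raw_stream.writeStream
            .option("checkpointLocation", checkpoint_path)
            .outputMode("append")
            .toTable(bronze_table))
```

A call such as bronze_hop(spark, rawSalesLocation, checkpointPath, "newSales") would mirror Option D. The absence of any groupBy, agg, or withColumn step between the read and the write is what keeps this a raw-to-Bronze hop rather than a later-stage one.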