
Ultimate access to all questions.
Which of the following queries is performing a streaming hop from raw data to a Bronze table?
A
(spark.table("sales")
.groupBy("store")
.agg(sum("sales"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.table("newSales")
)
(spark.table("sales")
.groupBy("store")
.agg(sum("sales"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.table("newSales")
)
B
(spark.table("sales")
.filter(col("units") > 0)
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("newSales")
)
(spark.table("sales")
.filter(col("units") > 0)
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("newSales")
)
C
(spark.table("sales")
.withColumn("avgPrice", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("newSales")
)
(spark.table("sales")
.withColumn("avgPrice", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("newSales")
)
D
(spark.read.load(rawSalesLocation)
.write
.mode("append")
.table("newSales")
)
(spark.read.load(rawSalesLocation)
.write
.mode("append")
.table("newSales")
)
E
(spark.readStream.load(rawSalesLocation)
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("newSales")
)
(spark.readStream.load(rawSalesLocation)
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("newSales")
)
Explanation:
Option E is the correct answer because it demonstrates a streaming hop from raw data to a Bronze table. Here's why:
spark.readStream.load(rawSalesLocation) which reads streaming data from raw files.table("newSales") to write to a Delta table (Bronze layer)readStream and writeStream for continuous data ingestion.option("checkpointLocation", checkpointPath) for fault tolerance.outputMode("append") appropriate for streaming ingestionspark.table("sales")) rather than raw data, so they're not performing the initial ingestion from raw to Bronze.spark.read.load) instead of streaming, so it's not a streaming hop.In Databricks Medallion Architecture:
Option E correctly implements this pattern by streaming data from raw files directly into a Bronze table.