
Which of the following queries is performing a streaming hop from raw data to a Bronze table?
A

```python
(spark.table("sales")
 .groupBy("store")
 .agg(sum("sales"))
 .writeStream
 .option("checkpointLocation", checkpointPath)
 .outputMode("complete")
 .table("newSales")
)
```
B

```python
(spark.table("sales")
 .filter(col("units") > 0)
 .writeStream
 .option("checkpointLocation", checkpointPath)
 .outputMode("append")
 .table("newSales")
)
```
C

```python
(spark.table("sales")
 .withColumn("avgPrice", col("sales") / col("units"))
 .writeStream
 .option("checkpointLocation", checkpointPath)
 .outputMode("append")
 .table("newSales")
)
```
D

```python
(spark.read.load(rawSalesLocation)
 .write
 .mode("append")
 .table("newSales")
)
```
E

```python
(spark.readStream.load(rawSalesLocation)
 .writeStream
 .option("checkpointLocation", checkpointPath)
 .outputMode("append")
 .table("newSales")
)
```
Explanation:
Option E is the correct answer because it demonstrates a streaming hop from raw data to a Bronze table. Here's why:
- spark.readStream.load(rawSalesLocation) reads streaming data directly from a raw source location.
- .table("newSales") writes the stream to a Bronze table.
- The readStream/writeStream pair indicates streaming processing end to end.
- .option("checkpointLocation", checkpointPath) enables fault tolerance and exactly-once processing.
- .outputMode("append") is the typical output mode for streaming writes to tables.
Options A, B, and C all read from spark.table("sales"), meaning they read from an existing table rather than raw data, and they apply transformations before writing. They therefore represent a transformation hop between tables, not a raw-to-Bronze hop.
Option D: This uses batch processing (spark.read.load and .write) rather than streaming, so it's not a streaming hop.
In the Databricks Lakehouse architecture:
A streaming hop from raw data to Bronze typically involves reading streaming data from sources (like Kafka, Event Hubs, files) and writing it directly to Bronze tables with minimal transformation, which is exactly what Option E demonstrates.
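For reference, a raw-to-Bronze streaming hop on Databricks is commonly written with Auto Loader (the `cloudFiles` source) rather than a plain `readStream.load`. The sketch below is a hypothetical example, not part of the question: `rawSalesLocation`, `checkpointPath`, the JSON file format, and the table name `bronze_sales` are all assumed placeholders, and it requires a Spark session with Delta Lake support (e.g. a Databricks cluster):

```python
# Sketch of a streaming hop from raw files to a Bronze table (assumptions:
# rawSalesLocation and checkpointPath are defined, source files are JSON,
# and this runs on a cluster with Auto Loader / Delta Lake available).
(spark.readStream
 .format("cloudFiles")                           # Auto Loader: incremental file discovery
 .option("cloudFiles.format", "json")            # raw file format (assumption)
 .load(rawSalesLocation)                         # read newly arrived files as a stream
 .writeStream
 .option("checkpointLocation", checkpointPath)   # required for fault tolerance
 .outputMode("append")                           # Bronze tables append raw records as-is
 .trigger(availableNow=True)                     # process all available data, then stop
 .table("bronze_sales")                          # write to a managed Bronze Delta table
)
```

Note the pattern matches Option E: no transformations between the streaming read and the streaming write, with a checkpoint location and append mode on the writer.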