
Which of the following queries is performing a streaming hop from raw data to a Bronze table?
A
(spark.table("sales")
.groupBy("store")
.agg(sum("sales"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("complete")
.table("newSales"))
B
(spark.table("sales")
.filter(col("units") > 0)
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("newSales"))
C
(spark.table("sales")
.withColumn("avgPrice", col("sales") / col("units"))
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("newSales"))
D
(spark.read.load(rawSalesLocation)
.write
.mode("append")
.table("newSales"))
E
(spark.readStream.load(rawSalesLocation)
.writeStream
.option("checkpointLocation", checkpointPath)
.outputMode("append")
.table("newSales"))
Explanation:
Option E is the correct answer because it represents a streaming hop from raw data to a Bronze table. Here's why:
- spark.readStream.load(rawSalesLocation) reads streaming data directly from the raw files.
- .writeStream writes the result as a streaming job.
- .option("checkpointLocation", checkpointPath) provides fault tolerance for the stream.
- .outputMode("append") is the appropriate mode for streaming writes to Bronze tables.
Options A, B, and C: these all read from spark.table("sales"), meaning they read from an existing table (likely already a Bronze or Silver table), not from raw data. They perform transformations on existing tables rather than the initial ingestion from raw data. Options A additionally aggregates, and C derives a new column; both are Silver/Gold-style transformations, not a raw-to-Bronze hop.
Option D: This reads as a batch job (spark.read.load) rather than a stream, and it omits checkpointing, which is essential for fault-tolerant streaming jobs.
In the Databricks medallion architecture, the Bronze layer is the landing zone for raw ingested data. Option E is therefore the only option that correctly demonstrates the pattern of streaming ingestion from raw data files into a Bronze table.