
Answer-first summary for fast verification
Answer:

```
(spark.readStream
    .format("json")
    .load("/mnt/raw-data")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze")
    .table("bronze_table"))
```
Option C is correct: it reads streaming raw data as JSON from the raw-data location and writes it to the Bronze table, which is the standard pattern for ingesting raw data into the Bronze layer. The other options fail this task: Option A reads CSV rather than the raw JSON format, Option B promotes Silver data to Gold, and Option D promotes Bronze data to Silver, so none of them performs the initial raw ingestion.
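A caveat for anyone running option C verbatim: Structured Streaming file sources require an explicit schema (streaming reads do not infer one by default), and in PySpark 3.1+ the method for writing a stream to a managed table is `toTable`. A minimal runnable sketch, assuming an active Spark session named `spark` (as in a Databricks notebook) and illustrative field names that the question does not specify:

```python
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Illustrative schema -- the question does not state the raw-data fields.
raw_schema = StructType([
    StructField("event_id", StringType()),
    StructField("payload", StringType()),
    StructField("event_time", TimestampType()),
])

# Assumes `spark` is an existing SparkSession (predefined in Databricks).
(spark.readStream
    .format("json")
    .schema(raw_schema)  # streaming file sources need an explicit schema
    .load("/mnt/raw-data")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze")
    .toTable("bronze_table"))  # PySpark 3.1+ API for streaming into a managed table
```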
Author: LeetQuiz Editorial Team
In the context of a data pipeline, when dealing with the ingestion and processing of streaming data, it is often necessary to organize the data into multiple layers or tables to ensure better data quality and accessibility. In the ETL (Extract, Transform, Load) process, raw data is initially ingested and loaded into a staging area before being cleaned, transformed, and moved to more refined storage locations. The Bronze table is typically the first layer in this multi-step process, where raw, unprocessed data is stored with minimal transformation.
Given this background, which query is responsible for performing the task of moving streaming raw data into a Bronze table?
A
```
(spark.readStream
    .format("csv")
    .option("header", "true")
    .load("/mnt/raw-data")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze")
    .table("bronze_table"))
```
B
```
(spark.readStream
    .format("delta")
    .table("silver_table")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/silver")
    .table("gold_table"))
```
C
```
(spark.readStream
    .format("json")
    .load("/mnt/raw-data")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze")
    .table("bronze_table"))
```
D
```
(spark.readStream
    .format("delta")
    .table("bronze_table")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze")
    .table("silver_table"))
```
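For reference, options D and B describe the later hops of the same medallion pipeline (Bronze to Silver, then Silver to Gold). One caveat worth flagging: option D reuses the Bronze checkpoint path, whereas in practice each streaming query must own a distinct checkpoint location. A hedged sketch of those two hops, assuming an active session named `spark`; the cleanup filter, aggregation key, and checkpoint names are illustrative assumptions, not part of the question:

```python
from pyspark.sql import functions as F

# Silver: read the Bronze Delta table as a stream and apply light cleanup.
# Assumes `spark` is an existing SparkSession and that Bronze rows carry an
# `event_id` column (an assumption for illustration).
(spark.readStream
    .format("delta")
    .table("bronze_table")
    .filter(F.col("event_id").isNotNull())  # assumed cleanup step
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/silver")  # distinct per query
    .toTable("silver_table"))

# Gold: aggregate Silver into business-level metrics (option B's hop).
(spark.readStream
    .format("delta")
    .table("silver_table")
    .groupBy("event_id")       # assumed aggregation key
    .count()
    .writeStream
    .format("delta")
    .outputMode("complete")    # streaming aggregations need complete/update mode
    .option("checkpointLocation", "/mnt/checkpoints/gold")
    .toTable("gold_table"))
```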