
Answer-first summary for fast verification
Answer:

```
(spark.readStream
    .format("json")
    .load("/mnt/raw-data")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze")
    .table("bronze_table"))
```
Option C is correct: it reads streaming raw data as JSON from the raw-data location and writes it to the Bronze table, which is the standard pattern for ingesting raw data into the Bronze layer. The other options fail this task: Option A reads CSV rather than the raw JSON format, Option B promotes Silver data to Gold, and Option D promotes Bronze data to Silver, so none of them performs the initial raw ingestion.
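A caveat for anyone running option C verbatim: Structured Streaming file sources require an explicit schema (streaming reads do not infer one by default), and in PySpark 3.1+ the method for writing a stream to a managed table is `toTable`. A minimal runnable sketch, assuming an active Spark session named `spark` (as in a Databricks notebook) and illustrative field names that the question does not specify:

```python
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Illustrative schema -- the question does not state the raw-data fields.
raw_schema = StructType([
    StructField("event_id", StringType()),
    StructField("payload", StringType()),
    StructField("event_time", TimestampType()),
])

# Assumes `spark` is an existing SparkSession (predefined in Databricks).
(spark.readStream
    .format("json")
    .schema(raw_schema)  # streaming file sources need an explicit schema
    .load("/mnt/raw-data")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze")
    .toTable("bronze_table"))  # PySpark 3.1+ API for streaming into a managed table
```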
Author: LeetQuiz Editorial Team
In the context of a data pipeline, when dealing with the ingestion and processing of streaming data, it is often necessary to organize the data into multiple layers or tables to ensure better data quality and accessibility. In the ETL (Extract, Transform, Load) process, raw data is initially ingested and loaded into a staging area before being cleaned, transformed, and moved to more refined storage locations. The Bronze table is typically the first layer in this multi-step process, where raw, unprocessed data is stored with minimal transformation.
Given this background, which query is responsible for performing the task of moving streaming raw data into a Bronze table?
A
```
(spark.readStream
    .format("csv")
    .option("header", "true")
    .load("/mnt/raw-data")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze")
    .table("bronze_table"))
```
B
```
(spark.readStream
    .format("delta")
    .table("silver_table")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/silver")
    .table("gold_table"))
```
C
```
(spark.readStream
    .format("json")
    .load("/mnt/raw-data")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze")
    .table("bronze_table"))
```
D
```
(spark.readStream
    .format("delta")
    .table("bronze_table")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze")
    .table("silver_table"))
```
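For reference, options D and B describe the later hops of the same medallion pipeline (Bronze to Silver, then Silver to Gold). One caveat worth flagging: option D reuses the Bronze checkpoint path, whereas in practice each streaming query must own a distinct checkpoint location. A hedged sketch of those two hops, assuming an active session named `spark`; the cleanup filter, aggregation key, and checkpoint names are illustrative assumptions, not part of the question:

```python
from pyspark.sql import functions as F

# Silver: read the Bronze Delta table as a stream and apply light cleanup.
# Assumes `spark` is an existing SparkSession and that Bronze rows carry an
# `event_id` column (an assumption for illustration).
(spark.readStream
    .format("delta")
    .table("bronze_table")
    .filter(F.col("event_id").isNotNull())  # assumed cleanup step
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/silver")  # distinct per query
    .toTable("silver_table"))

# Gold: aggregate Silver into business-level metrics (option B's hop).
(spark.readStream
    .format("delta")
    .table("silver_table")
    .groupBy("event_id")       # assumed aggregation key
    .count()
    .writeStream
    .format("delta")
    .outputMode("complete")    # streaming aggregations need complete/update mode
    .option("checkpointLocation", "/mnt/checkpoints/gold")
    .toTable("gold_table"))
```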