
A Spark ETL pipeline processes data nightly. One stage requires identifying new records within a Delta Lake table named bronze that have not yet been processed downstream. This table is partitioned by year, month, and day.
Which of the following implementations of the new_records function returns a Spark DataFrame containing only the unprocessed data from the bronze table?
A. return spark.read.option("readChangeFeed", "true").table("bronze")
B. return spark.readStream.table("bronze")
C. return spark.readStream.load("bronze")
D. return spark.read.table("bronze").filter(col("ingest_time") == current_timestamp())
E. return spark.read.table("bronze").filter(col("source_file") == f"/mnt/daily_batch/{year}/{month}/{day}")