
A data engineering team has built a pipeline using Auto Loader to ingest data from cloud storage, processing it every hour. Initially, the source data was partitioned by year, month, and day. Recently, a new partition column, hour, was added to the directory structure of new data files, causing the pipeline to fail for a few hours. The team has now addressed this issue in the code. What should the team do next to ingest and reprocess the data that was not loaded due to this issue?
A. Manually reload and process the skipped data, then trigger the pipeline again.
B. Start ingesting the data from scratch and reprocess everything again.
C. Auto Loader utilizes checkpointing and write-ahead logs, allowing a terminated stream to restart and continue from where it left off, ensuring end-to-end, exactly-once semantics under any failure condition.
D. Create a separate pipeline for the new data files arriving in cloud storage using Auto Loader.
E. Move the unprocessed data files to another location, manually run queries to process them, and then merge the results into the original target location.
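
For reference, option C describes Auto Loader's documented recovery behavior: the checkpoint location keeps a write-ahead log of the files the stream has discovered and processed, so a restarted stream resumes from where it left off, and the files that arrived while the pipeline was failing are ingested on the next run. The sketch below illustrates such a pipeline; the paths, source format, and target table name are hypothetical placeholders, not values from the question.

```python
# A minimal sketch of the pipeline described in the question, assuming
# hypothetical storage paths and a Delta target table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")  # assumed source file format
    # Let Auto Loader track the inferred schema and add new columns
    # (such as a new "hour" partition) instead of failing the stream.
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events/schema")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .load("/mnt/raw/events/")  # directories partitioned by year/month/day[/hour]
)

(
    stream.writeStream
    # Reusing the ORIGINAL checkpoint location is what lets the restarted
    # stream continue from where it left off with exactly-once delivery.
    .option("checkpointLocation", "/mnt/checkpoints/events/")
    .trigger(availableNow=True)  # process the backlog, then stop (hourly job)
    .toTable("bronze.events")
)
```

Because Auto Loader records already-processed files in the checkpoint, rerunning the fixed pipeline against the same checkpoint location picks up the backlog automatically; the `availableNow` trigger fits the hourly, batch-style schedule the question describes.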