
You are building a streaming ingestion pipeline in Databricks using Auto Loader to process JSON files from Amazon S3 into a Delta table.
Your manager wants to ensure that:
Files are processed exactly once
The pipeline can resume from the last processed file after a cluster restart or failure
No files are reprocessed unnecessarily
Which Auto Loader configuration is responsible for meeting these requirements?
A. Setting cloudFiles.schemaLocation to store schema metadata
B. Setting checkpointLocation to store progress and state information
C. Enabling cloudFiles.schemaEvolutionMode = "addNewColumns"
D. Using file notification mode instead of directory listing mode
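
For orientation, here is a minimal sketch of where each of the four settings would appear in an Auto Loader pipeline. The bucket paths, the table name, and the helper names reader_options and start_ingestion are hypothetical placeholders, not part of the question. The start_ingestion function assumes a Databricks runtime with an active SparkSession and is only defined here, not called.

```python
# Hypothetical locations and table name; replace with your own.
SOURCE_PATH = "s3://example-bucket/landing/json/"
SCHEMA_PATH = "s3://example-bucket/_schemas/events/"
CHECKPOINT_PATH = "s3://example-bucket/_checkpoints/events/"
TARGET_TABLE = "bronze.events"


def reader_options(use_notifications: bool = False) -> dict:
    """Build the options passed to spark.readStream.format("cloudFiles")."""
    return {
        "cloudFiles.format": "json",
        # Option A: where the inferred schema and its versions are stored.
        "cloudFiles.schemaLocation": SCHEMA_PATH,
        # Option C: how the stream reacts when new columns appear.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        # Option D: file notification mode ("true") vs. directory listing ("false").
        "cloudFiles.useNotifications": str(use_notifications).lower(),
    }


def start_ingestion(spark):
    """Start the stream. Assumes a Databricks runtime that provides Auto Loader."""
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**reader_options())
        .load(SOURCE_PATH)
    )
    return (
        stream.writeStream
        # Option B: where the stream records its progress and state.
        .option("checkpointLocation", CHECKPOINT_PATH)
        .toTable(TARGET_TABLE)
    )
```

Note that options A, C, and D are set on the reader, while option B is set on the writer.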