Databricks Certified Data Engineer - Professional


You are building a streaming ingestion pipeline in Databricks using Auto Loader to process JSON files from Amazon S3 into a Delta table. Your manager wants to ensure that:

Files are processed exactly once
The pipeline can resume from the last processed file after a cluster restart or failure
No files are reprocessed unnecessarily

Which Auto Loader configuration is responsible for meeting these requirements?




Explanation:

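The checkpoint location stores the streaming job’s progress. For Auto Loader, this includes the record of which files have already been ingested, kept in a RocksDB key-value store under the checkpoint directory. Because that state survives cluster restarts and failures, the stream resumes from the last processed file and skips files it has already ingested. Combined with transactional writes to the Delta table, this gives exactly-once processing. Schema location (A) stores inferred schema metadata, not progress. Schema evolution mode (C) controls how new columns are handled, not progress tracking. File notification mode (D) is a method for detecting new files, not a mechanism for tracking which files have been processed.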
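
To make the distinction concrete, here is a minimal sketch of where each setting would appear in a PySpark Auto Loader pipeline. The bucket paths, the table name, and the helper names `reader_options` and `start_ingest` are hypothetical, chosen only for illustration. The sketch assumes a Databricks runtime where `spark` is an active session and the `cloudFiles` source is available.

```python
from typing import Any, Dict

# Hypothetical locations and table name, for illustration only.
SOURCE_PATH = "s3://example-bucket/landing/events/"
SCHEMA_PATH = "s3://example-bucket/_schemas/events/"
CHECKPOINT_PATH = "s3://example-bucket/_checkpoints/events/"
TARGET_TABLE = "bronze.events"


def reader_options(schema_location: str) -> Dict[str, str]:
    """Source-side Auto Loader options. None of these track progress."""
    return {
        "cloudFiles.format": "json",
        # Schema location: where the inferred schema is stored.
        "cloudFiles.schemaLocation": schema_location,
        # Schema evolution mode: how newly appearing columns are handled.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        # File notification mode: how new files are detected
        # ("false" means directory listing is used instead).
        "cloudFiles.useNotifications": "false",
    }


def start_ingest(
    spark: Any,
    source: str = SOURCE_PATH,
    schema_location: str = SCHEMA_PATH,
    checkpoint: str = CHECKPOINT_PATH,
    target: str = TARGET_TABLE,
):
    """Start the stream; `spark` is assumed to be a Databricks SparkSession."""
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**reader_options(schema_location))
        .load(source)
    )
    return (
        stream.writeStream
        # Checkpoint location: the setting that records which files have
        # been processed, so a restarted job resumes instead of reprocessing.
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)
        .toTable(target)
    )
```

Restarting the job with the same `checkpointLocation` is what lets it pick up where it stopped. Pointing the writer at a new, empty checkpoint directory would normally cause the existing files to be discovered and ingested again.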