
Answer-first summary for fast verification
Answer: Setting checkpointLocation to store progress and state information
The checkpoint location stores the streaming job’s progress and state. Auto Loader persists metadata about the files it has discovered in a RocksDB key-value store inside that location, so each file is ingested exactly once and the job can resume from where it left off after a restart or failure. cloudFiles.schemaLocation (A) stores the inferred schema and its changes, not progress. cloudFiles.schemaEvolutionMode (C) controls how new columns are handled, not progress tracking. File notification mode (D) is a way of detecting new files, not a progress tracking mechanism.
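To make the distinction between options A and B concrete, here is a minimal sketch of where each setting goes in an Auto Loader pipeline. The function name, its parameters, and every path or table name passed to it are hypothetical placeholders, and the availableNow trigger is just one possible choice. It assumes a Databricks runtime where the cloudFiles source is available and a `spark` session is provided.

```python
def start_auto_loader(spark, source_path, schema_path, checkpoint_path, target_table):
    """Start an Auto Loader stream. All arguments are illustrative placeholders."""
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # Option A: where the inferred schema is tracked (not file progress).
        .option("cloudFiles.schemaLocation", schema_path)
        .load(source_path)
    )
    return (
        stream.writeStream
        # Option B: progress and state, including metadata on discovered files.
        .option("checkpointLocation", checkpoint_path)
        # Process everything currently available, then stop (assumed choice).
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

Restarting the job with the same checkpoint path is what lets it pick up where it left off. Pointing it at a new, empty checkpoint path would make it treat the existing files as unseen.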
Author: LeetQuiz Editorial Team
You are building a streaming ingestion pipeline in Databricks using Auto Loader to process JSON files from Amazon S3 into a Delta table. Your manager wants to ensure that:
Files are processed exactly once
The pipeline can resume from the last processed file after a cluster restart or failure
No files are reprocessed unnecessarily
Which Auto Loader configuration is responsible for meeting these requirements?
A. Setting cloudFiles.schemaLocation to store schema metadata
B. Setting checkpointLocation to store progress and state information
C. Enabling cloudFiles.schemaEvolutionMode = "addNewColumns"
D. Using file notification mode instead of directory listing mode