
Answer: B. It identifies new files by scanning the source directory and incrementally loads them into the target Delta table with idempotent guarantees.
### Explanation

By default, **Auto Loader** operates in **directory listing mode**. In this mode, Auto Loader discovers new files by periodically listing the input directory and tracking which files it has already processed (this state is kept in the stream's checkpoint location). The result is **incremental and idempotent** ingestion: each file is loaded into the target Delta Lake table exactly once.

### Why the other options are incorrect

- **A. Cloud-native queue and notification services**: This describes **file notification mode**, which must be enabled explicitly (for example, by setting `cloudFiles.useNotifications` to `true`). It is not the default behavior.
- **C. External webhooks**: Auto Loader does not rely on webhooks to start processing. It runs on Spark Structured Streaming and finds new files either by listing the directory or by consuming cloud notifications.
- **D. Full scan to recreate the table**: Auto Loader is designed for incremental workloads. Files it has already ingested are skipped, and new data is appended to the target table instead of rebuilding the table from all source files on every trigger.
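To make the default mechanism concrete, here is a minimal pure-Python sketch of the idea behind directory listing mode. It is a toy model, not Auto Loader's implementation: the class `DirectoryListingIngestor`, the helper `write_file`, the in-memory `processed` set (standing in for the checkpoint) and the `target_rows` list (standing in for the Delta table) are all hypothetical. It shows the two properties the answer depends on: each trigger lists the directory but loads only unseen files, and the target is appended to rather than recreated.

```python
import os
import tempfile


class DirectoryListingIngestor:
    """Toy model of directory listing mode (hypothetical, not a Databricks API)."""

    def __init__(self, source_dir: str) -> None:
        self.source_dir = source_dir
        # Stand-in for the file-tracking state Auto Loader keeps in its checkpoint.
        self.processed: set[str] = set()
        # Stand-in for the append-only target Delta table.
        self.target_rows: list[str] = []

    def trigger(self) -> list[str]:
        """Run one micro-batch: list the directory and load only unseen files."""
        new_files = sorted(
            name for name in os.listdir(self.source_dir) if name not in self.processed
        )
        for name in new_files:
            with open(os.path.join(self.source_dir, name), encoding="utf-8") as f:
                # Append rows; existing rows are never rebuilt or reloaded.
                self.target_rows.extend(line.strip() for line in f if line.strip())
            # Record the file so later listings skip it (idempotence).
            self.processed.add(name)
        return new_files


def write_file(directory: str, name: str, text: str) -> None:
    """Hypothetical helper that drops a small file into the source directory."""
    with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as src:
    ingestor = DirectoryListingIngestor(src)
    write_file(src, "a.csv", "1\n2\n")
    write_file(src, "b.csv", "3\n")
    first = ingestor.trigger()   # ['a.csv', 'b.csv']
    second = ingestor.trigger()  # [] because no new files arrived
    write_file(src, "c.csv", "4\n")
    third = ingestor.trigger()   # ['c.csv']

print(ingestor.target_rows)  # ['1', '2', '3', '4']
```

The second trigger finds nothing new, so nothing is reloaded, which is why option D does not describe the default behavior. The sketch deliberately ignores failure recovery: in real Auto Loader, exactly-once delivery also relies on the checkpoint and the transactional Delta write being committed together, which a plain in-memory set cannot provide.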
Author: LeetQuiz Editorial Team
Which statement accurately describes the default behavior and mechanism used by Databricks Auto Loader for data ingestion?
A. It leverages cloud-native queue storage and notification services to identify and incrementally load new files into Delta Lake.
B. It identifies new files by scanning the source directory and incrementally loads them into the target Delta table with idempotent guarantees.
C. It employs external webhooks to trigger a Databricks Job whenever a file is uploaded, merging the data using inferred schema rules.
D. It performs a full scan of the input directory on every trigger to recreate the target Delta table from all available source files.