
Which statement describes the default execution mode for Databricks Auto Loader?
A
Cloud vendor-specific queue storage and notification services are configured to track newly arriving files; the target table is materialized by directly querying all valid files in the source directory.
B
New files are identified by listing the input directory; the target table is materialized by directly querying all valid files in the source directory.
C
Webhooks trigger a Databricks job to run anytime new data arrives in a source directory; new data are automatically merged into target tables using rules inferred from the data.
D
New files are identified by listing the input directory; new files are incrementally and idempotently loaded into the target Delta Lake table.
E
Cloud vendor-specific queue storage and notification services are configured to track newly arriving files; new files are incrementally and idempotently loaded into the target Delta Lake table.
Explanation:
Correct Answer: D
Auto Loader's default execution mode uses directory listing to identify new files, not cloud notification services. Here's why:
Default Mode (Directory Listing): When you don't explicitly configure cloud notification services, Auto Loader defaults to listing the input directory to discover new files. This is the file-discovery behavior described in options B and D.
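Incremental and Idempotent Loading: Auto Loader processes files incrementally (only files that have arrived since the last run) and idempotently (a file that has already been ingested is not ingested again). It can do this because it records the files it has discovered in the stream's checkpoint location.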
Cloud Notification Mode (Option E): This mode uses cloud-specific notification and queue services (Amazon SNS and SQS, Azure Event Grid and Queue Storage, Google Cloud Pub/Sub) to track new files, but it is not the default. You have to enable it explicitly by setting the cloudFiles.useNotifications option to true.
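The combination of directory listing and remembered state described above can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Auto Loader's implementation: the function name `discover_new_files` and the in-memory `seen` set are hypothetical stand-ins for the state that Auto Loader keeps in the checkpoint location.

```python
import os
import tempfile


def discover_new_files(input_dir, seen):
    """List input_dir and return the files not yet recorded in `seen`.

    `seen` is a hypothetical stand-in for the checkpoint: it is updated
    in place, so a later call skips everything already returned.
    """
    current = sorted(
        name for name in os.listdir(input_dir)
        if os.path.isfile(os.path.join(input_dir, name))
    )
    new_files = [name for name in current if name not in seen]
    seen.update(new_files)
    return new_files


def demo():
    """Run three listing passes over a temporary directory."""
    seen = set()
    with tempfile.TemporaryDirectory() as input_dir:
        for name in ("a.json", "b.json"):
            open(os.path.join(input_dir, name), "w").close()
        first = discover_new_files(input_dir, seen)   # both files are new
        second = discover_new_files(input_dir, seen)  # re-run finds nothing new
        open(os.path.join(input_dir, "c.json"), "w").close()
        third = discover_new_files(input_dir, seen)   # only the late arrival
    return first, second, third
```

Calling `demo()` shows the two properties from the explanation: the second pass returns nothing because every file is already recorded (idempotent), and the third pass returns only the file that arrived afterwards (incremental).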
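Why not other options:
Option A: It describes notification-based discovery, which is not the default, and Auto Loader does not materialize the target table by querying all files in the source directory.
Option B: The discovery method is correct, but the target table is loaded incrementally rather than rebuilt by querying every valid file in the source directory.
Option C: Auto Loader does not rely on webhooks that trigger a job, and it does not automatically merge new data into target tables using rules inferred from the data.
Option E: The loading behavior is correct, but notification-based discovery has to be enabled explicitly, so it is not the default.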
Key Takeaway: Auto Loader defaults to directory listing for file discovery, which is simpler to set up but can become slow or costly when the input directory holds a very large number of files. The cloud notification mode generally scales better in that situation but requires additional configuration.
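As a rough illustration of the difference between the two modes, the PySpark sketch below builds the Auto Loader options and starts a stream. It assumes a Databricks runtime where the `cloudFiles` source is available; the helper names `auto_loader_options` and `start_auto_loader`, the JSON file format, and reusing the checkpoint path as the schema location are choices made for this example only. Notification mode may also need cloud permissions and setup that are not shown here.

```python
def auto_loader_options(file_format, schema_location, use_notifications=False):
    """Build reader options for the cloudFiles (Auto Loader) source.

    With use_notifications=False the notification option is left unset,
    so Auto Loader falls back to its default directory listing mode.
    """
    opts = {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }
    if use_notifications:
        # Opt in to file notification mode explicitly.
        opts["cloudFiles.useNotifications"] = "true"
    return opts


def start_auto_loader(spark, source_path, target_table, checkpoint_path,
                      use_notifications=False):
    """Start an incremental load from source_path into a Delta table.

    `spark` is an existing SparkSession on Databricks; all paths and the
    table name are supplied by the caller (hypothetical in this sketch).
    """
    opts = auto_loader_options("json", checkpoint_path, use_notifications)
    return (
        spark.readStream.format("cloudFiles")
        .options(**opts)
        .load(source_path)
        .writeStream
        # The checkpoint records progress so files are not loaded twice.
        .option("checkpointLocation", checkpoint_path)
        # Process everything currently available, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

Leaving `use_notifications` at its default corresponds to option D; passing `use_notifications=True` corresponds to the discovery method in option E.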