
A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files must be kept as-is and will accumulate in the directory. The data engineer needs to identify which files are new since the pipeline's previous run, and set up the pipeline to ingest only those new files with each run. Which of the following tools can the data engineer use to solve this problem?
A. Unity Catalog
B. Delta Lake
C. Databricks SQL
D. Data Explorer
E. Auto Loader
Explanation:
Auto Loader (answer E) is the correct tool for this scenario because it is purpose-built to incrementally ingest new files from cloud storage or a file system as they arrive. Here's why:
Key Features of Auto Loader:
- Incrementally and efficiently processes new files as they land in a directory, without reprocessing files it has already seen.
- Tracks ingested files in a checkpoint, so each file is processed exactly once even across pipeline restarts.
- Reads source files without modifying, moving, or deleting them, which is essential when the directory is shared with other processes.
- Scales to large, continuously growing directories and supports schema inference and evolution.
Why the other options are incorrect:
- A. Unity Catalog is a governance solution for managing access to data assets; it does not ingest files.
- B. Delta Lake is a storage/table format that provides ACID transactions; it does not detect new files in a source directory.
- C. Databricks SQL is a service for running SQL queries and BI workloads, not for file ingestion.
- D. Data Explorer is a UI for browsing catalogs, schemas, and tables; it has no ingestion capability.
How Auto Loader solves this problem:
With each pipeline run, Auto Loader compares the files currently in the directory against its checkpoint of previously ingested files, ingests only the new ones, and updates the checkpoint, leaving the source files untouched for the other processes that share the directory.
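On Databricks, the pattern looks roughly like the sketch below. The function name, paths, and file format are placeholders, and `spark` is the session the Databricks runtime provides; `cloudFiles` is Auto Loader's stream source format.

```python
def build_autoloader_stream(spark, source_dir, schema_location):
    """Return a streaming DataFrame that picks up only new files in source_dir.

    Auto Loader records which files it has ingested under schema_location /
    the stream's checkpoint, so reruns skip files seen in earlier runs.
    """
    return (
        spark.readStream
        .format("cloudFiles")                              # Auto Loader source
        .option("cloudFiles.format", "json")               # format of source files
        .option("cloudFiles.schemaLocation", schema_location)
        .load(source_dir)
    )
```

When writing the stream out, a `checkpointLocation` option on the writer is what persists ingestion progress between runs.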
This makes Auto Loader the ideal solution for incremental file ingestion in data pipelines where source files accumulate and need to be processed only once.
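The core idea Auto Loader automates, a checkpoint of already-processed files consulted on every run, can be illustrated with a minimal pure-Python sketch. This is not Auto Loader's implementation or API, just the concept; the function and file names are illustrative.

```python
import json
from pathlib import Path

def ingest_new_files(source_dir, checkpoint_file):
    """Process only files not seen in previous runs; never touch the sources."""
    checkpoint = Path(checkpoint_file)
    seen = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()

    # Files present in the directory but absent from the checkpoint are new.
    new_files = [p for p in sorted(Path(source_dir).iterdir())
                 if p.is_file() and p.name not in seen]

    for path in new_files:
        # Real pipeline work (parse, transform, load) would happen here.
        seen.add(path.name)

    # Persist the updated checkpoint so the next run skips these files.
    checkpoint.write_text(json.dumps(sorted(seen)))
    return [p.name for p in new_files]
```

Running this twice against the same directory returns the new files on the first call and nothing on the second, which is exactly the exactly-once behavior the question asks for.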