
**Answer: E. Auto Loader**
## Explanation

Auto Loader is specifically designed for incremental data ingestion scenarios where files accumulate in a directory and you need to process only the files added since the last run.

### Key Features of Auto Loader

1. **Incremental File Processing**: Auto Loader automatically tracks which files have been processed and loads only new files in subsequent runs.
2. **File Notification Mode**: Uses cloud-native file notification services (AWS SQS, Azure Event Grid, or GCP Pub/Sub) to detect new files efficiently without listing the directory.
3. **Directory Listing Mode**: Falls back to directory listing when file notification isn't available.
4. **State Management**: Maintains state about processed files, ensuring idempotent processing.

### Why the Other Options Are Incorrect

- **A. Databricks SQL**: Primarily for querying and analyzing data, not for incremental file ingestion.
- **B. Delta Lake**: Provides ACID transactions and versioning for data lakes, but doesn't by itself solve the incremental file-detection problem.
- **C. Unity Catalog**: A unified governance solution for data and AI assets, not an ingestion tool.
- **D. Data Explorer**: A tool for exploring and visualizing data, not for pipeline orchestration or incremental ingestion.

### Use Case Fit

The scenario describes exactly what Auto Loader is built for: a shared directory where files accumulate and must be processed incrementally while being left in place. Auto Loader's ability to track processed files and ingest only new ones makes it the right choice.
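In practice, the incremental ingestion described above is a short Structured Streaming job. A minimal sketch using the real `cloudFiles` source (runs only on a Databricks cluster; the paths and table name below are hypothetical placeholders):

```python
# Auto Loader source: "cloudFiles" is Databricks-only, so this runs on a
# Databricks cluster where `spark` is predefined.
df = (spark.readStream
      .format("cloudFiles")                           # Auto Loader
      .option("cloudFiles.format", "json")            # format of the incoming files
      .option("cloudFiles.schemaLocation", "/mnt/checkpoints/landing_schema")
      .load("/mnt/shared/landing"))                   # shared dir; files stay in place

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/landing")  # records processed files
   .trigger(availableNow=True)                        # ingest all new files, then stop
   .toTable("bronze.landing_events"))                 # append into a Delta table
```

Because the checkpoint records every file already ingested, rerunning the job picks up only files that appeared since the previous run, and the source files are never modified or deleted.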
Author: Keng Suppaseth
A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the previous run in the pipeline, and set up the pipeline to only ingest those new files with each run.
Which of the following tools can the data engineer use to solve this problem?
A. Databricks SQL
B. Delta Lake
C. Unity Catalog
D. Data Explorer
E. Auto Loader
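Conceptually, what makes answer E correct is its bookkeeping: persist the set of already-ingested files and diff against the directory on each run. A toy, self-contained Python sketch of that idea (not the Databricks API; the function and `state_file` are hypothetical):

```python
import json
from pathlib import Path

def new_files(directory: str, state_file: str) -> list[str]:
    """Return files in `directory` not seen in previous runs, then update state."""
    state_path = Path(state_file)
    # Load the set of file names recorded by earlier runs (empty on first run).
    seen = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    current = {p.name for p in Path(directory).iterdir() if p.is_file()}
    fresh = sorted(current - seen)                              # files added since last run
    state_path.write_text(json.dumps(sorted(seen | current)))   # persist updated state
    return fresh
```

The first run returns every file; later runs return only additions, and the source files are left untouched, mirroring the "keep the files as is" requirement in the question.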