
Explanation:
Auto Loader is designed for incremental file ingestion in Azure Databricks and addresses each of the stated requirements:
1. Incremental Processing: Auto Loader uses the cloudFiles source to detect and process new files as they arrive in Azure Data Lake Storage Gen2. It records the files it has already ingested in the stream's checkpoint location, so only new files are processed, each exactly once, without manual bookkeeping.
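2. Minimized Implementation & Maintenance: Auto Loader reduces implementation complexity because file discovery, progress tracking, and schema management are built in; a handful of reader options replaces custom file-tracking logic.
3. Cost Optimization for Millions of Files: Auto Loader offers two file detection modes: directory listing mode (the default), which lists the input directory to find new files, and file notification mode, which subscribes to file events through Azure Event Grid and Azure Queue Storage instead of repeatedly listing large directories. File notification mode is the more cost-effective choice when the source grows to millions of files.
4. Schema Inference & Evolution: Auto Loader infers the schema from sampled files, stores it in the location set by cloudFiles.schemaLocation, and, depending on cloudFiles.schemaEvolutionMode, either adds new columns as they appear (addNewColumns, the default when no schema is supplied) or, in rescue mode, keeps unexpected fields in the _rescued_data column.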
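A minimal PySpark sketch of the recommended configuration, assuming JSON input, a hypothetical `landing` container with an `events/` folder in storage1, and hypothetical checkpoint and table names. The options shown (`cloudFiles.format`, `cloudFiles.schemaLocation`, `cloudFiles.inferColumnTypes`, `cloudFiles.schemaEvolutionMode`, `cloudFiles.useNotifications`) are documented Auto Loader options; on Azure, file notification mode additionally needs permissions to create Event Grid subscriptions and storage queues, configured through extra authentication options that are omitted here. `start_ingestion` expects the `spark` session that a Databricks notebook or job provides.

```python
# Sketch of an Auto Loader ingestion stream for storage1 (PySpark on Databricks).
# Hypothetical names (not from the question): the "landing" container, the
# "events/" folder, JSON input, and the checkpoint path and target table below.

SOURCE_PATH = "abfss://landing@storage1.dfs.core.windows.net/events/"
CHECKPOINT_PATH = "abfss://landing@storage1.dfs.core.windows.net/_checkpoints/events/"
TARGET_TABLE = "bronze_events"


def autoloader_options(use_notifications: bool) -> dict:
    """Return cloudFiles reader options for incremental, schema-evolving ingestion."""
    return {
        # File format that the cloudFiles source reads.
        "cloudFiles.format": "json",
        # Where Auto Loader stores the inferred schema and its later versions.
        "cloudFiles.schemaLocation": CHECKPOINT_PATH,
        # Infer typed columns instead of reading every JSON field as a string.
        "cloudFiles.inferColumnTypes": "true",
        # New columns are added to the schema: the stream stops with an error
        # when they first appear and uses the updated schema after a restart.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        # "true" selects file notification mode; "false" keeps directory listing.
        "cloudFiles.useNotifications": str(use_notifications).lower(),
    }


def start_ingestion(spark, use_notifications: bool = True):
    """Start an incremental load; `spark` is the Databricks SparkSession."""
    df = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(use_notifications))
        .load(SOURCE_PATH)
    )
    return (
        df.writeStream
        # The checkpoint records which files have already been ingested.
        .option("checkpointLocation", CHECKPOINT_PATH)
        # Let the Delta target table accept columns added by schema evolution.
        .option("mergeSchema", "true")
        # Process every file that arrived since the last run, then stop.
        .trigger(availableNow=True)
        .toTable(TARGET_TABLE)
    )


if __name__ == "__main__":
    for key, value in autoloader_options(use_notifications=True).items():
        print(f"{key} = {value}")
```

Using `trigger(availableNow=True)` processes whatever has arrived since the previous run and then stops, which fits a daily upload pattern when the code runs as a scheduled job; a continuous trigger would also work but keeps the cluster running.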
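A. COPY INTO: a batch SQL command that idempotently loads new files into a Delta table. It is not a Structured Streaming source, and Databricks recommends Auto Loader over COPY INTO when a source accumulates millions of files.
B. Azure Data Factory: an orchestration and data movement service that runs outside Databricks. It does not configure storage1 as a Structured Streaming source and adds a separate service to build and maintain.
D. Apache Spark FileStreamSource: Spark's built-in file streaming source requires a user-supplied schema by default, does not evolve that schema, and finds new files by listing the input directory on every micro-batch, which becomes slow and costly with millions of files.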
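For contrast, a sketch of option D using Spark's built-in file source. The path, the column list in `EVENT_SCHEMA_DDL`, and the function name are hypothetical placeholders. The schema has to be written by hand, JSON fields that appear later are dropped rather than added, and each micro-batch lists the input directory to find new files.

```python
# Contrast sketch: Spark's built-in file streaming source (FileStreamSource).
# Hypothetical: the container/folder in SOURCE_PATH and the column list below.

SOURCE_PATH = "abfss://landing@storage1.dfs.core.windows.net/events/"

# Streaming file sources need an explicit schema by default
# (spark.sql.streaming.schemaInference defaults to false). Fields that are not
# listed here are ignored when JSON files are parsed; the schema never evolves.
EVENT_SCHEMA_DDL = "id STRING, event_type STRING, event_time TIMESTAMP"


def start_plain_file_stream(spark, checkpoint_path: str, target_table: str):
    """Start a plain file stream; `spark` is an existing SparkSession."""
    df = (
        spark.readStream.format("json")
        # DataStreamReader.schema accepts a DDL string as well as a StructType.
        .schema(EVENT_SCHEMA_DDL)
        # Caps how many newly discovered files go into each micro-batch.
        .option("maxFilesPerTrigger", 1000)
        .load(SOURCE_PATH)
    )
    return (
        df.writeStream.option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```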
Auto Loader is the best choice because it is built for incremental file ingestion from cloud storage in Azure Databricks. A single managed mechanism covers all four requirements: processing only new files, low implementation and maintenance effort, cost-efficient file discovery at the scale of millions of files, and schema inference with evolution.
You have an Azure Databricks workspace and an Azure Data Lake Storage Gen2 account named storage1. New files are uploaded daily to storage1.
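You need to recommend a solution to configure storage1 as a Structured Streaming source that meets the following requirements:
1. Process only new files in storage1.
2. Minimize the effort required to implement and maintain the solution.
3. Minimize the costs of processing millions of files.
4. Support schema inference and schema evolution.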
What should you include in the recommendation?
A. COPY INTO
B. Azure Data Factory
C. Auto Loader
D. Apache Spark FileStreamSource