
Databricks Certified Data Engineer - Associate
You are tasked with building a production-grade streaming ingestion pipeline in Databricks using Auto Loader to process JSON files from Azure Data Lake Storage.
Your team decides to use Lakeflow Declarative Pipelines to simplify deployment and management.
During a design review, a colleague asks:
"Since we’re using Lakeflow Declarative Pipelines, which parts of the Auto Loader configuration will we still need to manually define, and which will be handled automatically?"
Which of the following is automatically managed by Lakeflow Declarative Pipelines when using Auto Loader?
Explanation:
When you use Lakeflow Declarative Pipelines with Auto Loader:
Automatically managed:
Schema definition – Lakeflow infers the schema automatically and tracks schema evolution, with no user-specified schema location required.
Checkpoint location – Lakeflow automatically creates and manages the checkpoint directory for exactly-once processing.
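As a sketch, a minimal Lakeflow Declarative Pipelines table using Auto Loader might look like the following (the table name and storage path are hypothetical). Note that no explicit schema and no checkpoint location are supplied; both are handled by the pipeline runtime:

```python
import dlt

# Minimal streaming table: the pipeline runtime manages the checkpoint
# directory and the schema inference/evolution location automatically.
@dlt.table(
    name="raw_events",  # hypothetical table name
    comment="Raw JSON events ingested from ADLS via Auto Loader"
)
def raw_events():
    return (
        spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            # No .schema(...) call and no checkpointLocation option:
            # both are managed by Lakeflow Declarative Pipelines.
            .load("abfss://landing@mystorageaccount.dfs.core.windows.net/events/")
    )
```

This code only runs inside a Databricks pipeline, where `spark` and the `dlt` module are provided by the runtime.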
Still require manual configuration when needed:
File notification setup – You must configure cloud-specific services (AWS SQS/SNS, Azure Event Grid, GCP Pub/Sub) if you want to use file notification mode.
Directory listing interval – You can tune this manually for performance.
Watermark configuration – This is a Structured Streaming setting for handling late data and is not automatically set by Lakeflow.
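By contrast, the settings above stay in your hands. A hedged sketch of the same table with these manual options (option names follow the Auto Loader documentation; the path, interval values, and `event_time` column are hypothetical):

```python
import dlt

@dlt.table(name="raw_events_notified")  # hypothetical table name
def raw_events_notified():
    return (
        spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            # File notification mode: you must opt in explicitly, and the
            # backing Azure Event Grid/queue resources need permissions set up.
            .option("cloudFiles.useNotifications", "true")
            # Listing behavior: periodically fall back to a directory listing
            # backfill to guarantee completeness (tuned manually).
            .option("cloudFiles.backfillInterval", "1 day")
            .load("abfss://landing@mystorageaccount.dfs.core.windows.net/events/")
            # Watermark for late data: a Structured Streaming setting you set
            # yourself; assumes an 'event_time' timestamp column in the data.
            .withWatermark("event_time", "10 minutes")
    )
```

As above, this runs only inside a Databricks pipeline; it is a sketch of where the manual knobs attach, not a complete production configuration.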