Databricks Certified Data Engineer - Professional

You are building a streaming ingestion pipeline in Databricks using Auto Loader to process JSON files from Amazon S3 into a Delta table. Your manager wants to ensure that:

Files are processed exactly once
The pipeline can resume from the last processed file after a cluster restart or failure
No files are reprocessed unnecessarily

Which Auto Loader configuration is responsible for meeting these requirements?

Setting cloudFiles.schemaLocation to store schema metadata

6.7%

Setting checkpointLocation to store progress and state information

83.3%

Enabling cloudFiles.schemaEvolutionMode = "addNewColumns"

6.7%
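
To make the difference between these options concrete, here is a minimal sketch of an Auto Loader stream that sets both a schema location and a checkpoint location. Auto Loader records which files it has already discovered, along with the stream's progress, under the checkpoint location. That is what lets a restarted stream continue from where it stopped without ingesting a file twice. The schema location only holds the inferred schema and its evolution history. The function names, paths, and table name below are hypothetical placeholders, and the sketch assumes a Databricks runtime where the cloudFiles source is available.

```python
def auto_loader_options(schema_location: str) -> dict:
    """Reader options for a JSON Auto Loader source (illustrative helper)."""
    return {
        "cloudFiles.format": "json",
        # Stores the inferred schema only; it does not track processed files.
        "cloudFiles.schemaLocation": schema_location,
    }


def start_ingest(spark, source_path, target_table, checkpoint_location, schema_location):
    """Start an incremental ingest from source_path into a Delta table.

    All arguments are placeholders supplied by the caller; `spark` is an
    existing SparkSession on a Databricks cluster.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(schema_location))
        .load(source_path)
    )
    return (
        stream.writeStream
        # Progress and file-discovery state live here. Reusing the same path
        # after a restart is what allows the stream to resume.
        .option("checkpointLocation", checkpoint_location)
        .trigger(availableNow=True)  # process the current backlog, then stop
        .toTable(target_table)
    )
```

A call might look like start_ingest(spark, "s3://example-bucket/raw/", "bronze.events", "s3://example-bucket/_checkpoints/events", "s3://example-bucket/_schemas/events"), where every path and the table name are made up for illustration. Pointing a restarted job at a different checkpoint path would make it treat all existing files as new.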
