
Databricks Certified Data Engineer - Associate
At the end of the inventory process, a file is uploaded to cloud object storage. You are tasked with building a process to ingest data incrementally, with the schema expected to change over time. The ingestion process should automatically handle these schema changes. Fill in the blanks in the following Auto Loader command to ensure successful execution:
spark.readStream.format("cloudFiles").option("_______", "csv").option("_______", "dbfs:/location/checkpoint/").load(data_source).writeStream.option("_______", "dbfs:/location/checkpoint/").option("_______", "true").table(table_name)
Explanation:
The correct answer is cloudFiles.format, cloudFiles.schemaLocation, checkpointLocation, mergeSchema. This configuration ensures that the data is read in the correct format, that schema changes are automatically managed by storing the inferred schema and any subsequent changes in the specified location, and that the stream's progress is checkpointed for fault tolerance. The mergeSchema option is crucial for automatically handling schema evolution over time.
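With the blanks filled in, the completed pipeline might look like the sketch below. This assumes a Databricks runtime where the `spark` session and Auto Loader are available; the path `dbfs:/location/checkpoint/` comes from the question, while `data_source` and `table_name` are placeholder values chosen for illustration.

```python
# Sketch of the completed Auto Loader command (runs only on Databricks).
# data_source and table_name are illustrative placeholders.
data_source = "dbfs:/location/raw/"
table_name = "inventory_bronze"

(spark.readStream.format("cloudFiles")                   # Auto Loader source
    .option("cloudFiles.format", "csv")                  # format of the incoming files
    .option("cloudFiles.schemaLocation",
            "dbfs:/location/checkpoint/")                # where inferred schema and changes are tracked
    .load(data_source)
    .writeStream
    .option("checkpointLocation",
            "dbfs:/location/checkpoint/")                # stream progress for fault tolerance
    .option("mergeSchema", "true")                       # allow new columns as the schema evolves
    .table(table_name))
```

Note that the same checkpoint path is reused here for both the schema location and the stream checkpoint, mirroring the question; in practice these are often kept in the same directory.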