
Answer-first summary for fast verification
Answer: `spark.readStream.format('cloudFiles').option('cloudFiles.format', 'json').load(source_path).writeStream.option('checkpointLocation', checkpointPath).start('target_table')`
Option A is correct because it uses Databricks Auto Loader with the recommended syntax: format('cloudFiles') with the file format set via .option('cloudFiles.format', 'json'). This lets Databricks incrementally and efficiently ingest new JSON files from cloud storage into a Delta table in near real-time, and the checkpoint location set on .writeStream makes the streaming ingestion reliable and fault-tolerant across restarts. This approach follows Databricks' best practices for streaming data ingestion. The other options fail for concrete reasons: 'autoloader' is not a valid stream format name (B, C); trigger(real-time=True) is not valid trigger syntax, and the hyphenated keyword is not even legal Python (B, D); and trigger(availableNow=True) processes the files available at start and then stops, which is incremental batch rather than near real-time, and option E also omits the checkpoint location (E).
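To see why the checkpoint matters, here is a minimal plain-Python sketch of the bookkeeping that Auto Loader automates for you: a checkpoint file records which source files were already processed, so each run ingests only the new arrivals and a restart never re-ingests old data. This is an illustration of the concept only; the function and file layout are hypothetical, not a Databricks API.

```python
import json
import os

def ingest_new_files(source_dir: str, checkpoint_path: str) -> list:
    """Ingest only the JSON files not yet recorded in the checkpoint.

    Hypothetical sketch of incremental ingestion: the checkpoint file
    remembers processed file names, so repeated runs pick up only new files.
    """
    # Load the set of already-processed file names (empty on the first run).
    processed = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            processed = set(json.load(f))

    new_records = []
    for name in sorted(os.listdir(source_dir)):
        if name.endswith(".json") and name not in processed:
            with open(os.path.join(source_dir, name)) as f:
                new_records.append(json.load(f))
            processed.add(name)

    # Persist the checkpoint so a restart does not re-ingest old files.
    with open(checkpoint_path, "w") as f:
        json.dump(sorted(processed), f)
    return new_records
```

Auto Loader does this (and much more: scalable file discovery, schema inference, exactly-once guarantees) internally, which is why omitting the checkpoint location, as in options D and E, sacrifices fault tolerance.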
Author: LeetQuiz Editorial Team
A data engineer aims to incrementally ingest JSON data into a Delta table in near real-time. Which method correctly achieves this?
A
spark.readStream.format('cloudFiles').option('cloudFiles.format', 'json').load(source_path).writeStream.option('checkpointLocation', checkpointPath).start('target_table')
B
spark.readStream.format('autoloader').option('autoloader.format', 'json').load(source_path).writeStream.option('checkpointLocation', checkpointPath).trigger(real-time=True).start('target_table')
C
spark.readStream.format('autoloader').option('autoloader.format', 'json').load(source_path).writeStream.option('checkpointLocation', checkpointPath).start('target_table')
D
spark.readStream.format('cloudFiles').option('cloudFiles.format', 'json').load(source_path).writeStream.trigger(real-time=True).start('target_table')
E
spark.readStream.format('cloudFiles').option('cloudFiles.format', 'json').load(source_path).writeStream.trigger(availableNow=True).start('target_table')