
Answer-first summary for fast verification
Answer:

```python
option("cloudFiles.format", "json")
```

The correct way to specify the format of the source data when using Auto Loader is the `cloudFiles.format` option; here it tells the stream to interpret incoming files as JSON. An Auto Loader query typically specifies the source format with `cloudFiles.format`, a schema location for schema tracking, and a checkpoint location for fault tolerance. Here's an example of a complete Auto Loader query:

```python
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", source_format)
    .option("cloudFiles.schemaLocation", checkpoint_directory)
    .load(data_source)
    .writeStream
    .option("checkpointLocation", checkpoint_directory)
    .option("mergeSchema", "true")
    .table(table_name))
```

This setup automatically processes and loads new data into the target table as soon as it arrives at the source.
Author: LeetQuiz Editorial Team
A data engineer is converting their existing data pipeline to use Auto Loader for incremental processing of JSON files. The following code snippet is part of their implementation:
```python
streaming_df = (
    spark
    .readStream
    .format("cloudFiles")
    .______________________
    .option("cloudFiles.schemaLocation", schemaLocation)
    .load(sourcePath)
)
```
Which of the following code snippets correctly fills the blank to enable the use of Auto Loader for ingesting the data?
A
option("format", "json")
B
option("cloudFiles.format", "json")
C
option("cloudFiles", "json")
D
option(cloudFiles.format, json)
E
option("json")
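To see why the other choices fail, note that Spark's `.option(key, value)` takes plain strings: option D is not even valid Python (the unquoted names `cloudFiles.format` and `json` would raise a `NameError`), option E omits the key entirely, and options A and C use keys Auto Loader does not recognize. Here is a minimal pure-Python sketch (a hypothetical checker, not a Spark API) that mimics how the option keys would be validated:

```python
# Hypothetical illustration (not a Spark API): Auto Loader settings are plain
# string key/value pairs, and the format option key must be the exact, quoted
# string "cloudFiles.format". The key set below is a small sample for the sketch.
VALID_AUTO_LOADER_KEYS = {"cloudFiles.format", "cloudFiles.schemaLocation"}

def validate_option(key, value):
    """Return True if (key, value) is a well-formed Auto Loader option."""
    if not (isinstance(key, str) and isinstance(value, str)):
        return False  # e.g. unquoted identifiers never reach Spark as strings
    return key in VALID_AUTO_LOADER_KEYS

# Option B: correct key and quoted values
assert validate_option("cloudFiles.format", "json")
# Option A: "format" alone is not an Auto Loader option key
assert not validate_option("format", "json")
# Option C: "cloudFiles" alone is not a valid option key
assert not validate_option("cloudFiles", "json")
```

The takeaway is that every Auto Loader option is namespaced under the `cloudFiles.` prefix, which is how Spark routes it to the Auto Loader source rather than treating it as a generic reader option.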