
Explanation:
The correct way to specify the format of the source data when using Auto Loader is the cloudFiles.format option. It tells Auto Loader how to parse the incoming files, here as JSON. A typical Auto Loader query sets three things: the source format with cloudFiles.format, a schema location (cloudFiles.schemaLocation) where the inferred schema and its changes are tracked, and a checkpoint location (checkpointLocation) on the write side for fault tolerance. Here's an example of a complete Auto Loader query:
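(
    spark.readStream
    .format("cloudFiles")                                       # use Auto Loader as the streaming source
    .option("cloudFiles.format", source_format)                 # format of the incoming files, e.g. "json"
    .option("cloudFiles.schemaLocation", checkpoint_directory)  # where the inferred schema is tracked
    .load(data_source)                                          # path that Auto Loader monitors for new files
    .writeStream
    .option("checkpointLocation", checkpoint_directory)         # stream progress, for fault tolerance
    .option("mergeSchema", "true")                              # allow new columns to evolve the target table schema
    .toTable(table_name)                                        # start the stream and write into the table (PySpark 3.1+)
)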
This setup incrementally picks up new files as they arrive in the source location and loads them into the target table.
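Because every Auto Loader reader setting lives under the cloudFiles. prefix, it can help to assemble those settings in one place. The sketch below is a minimal illustration that runs without Spark. The helper build_autoloader_options and the path /tmp/checkpoints/orders are hypothetical, not part of any Databricks API, and SUPPORTED_FORMATS is assumed to mirror the format list in the Databricks Auto Loader documentation.

```python
# Minimal sketch: collect Auto Loader reader options in a plain dict.
# build_autoloader_options is a HYPOTHETICAL helper, not a Databricks API.

# Assumed to mirror the formats listed in the Auto Loader docs.
SUPPORTED_FORMATS = {"json", "csv", "xml", "parquet", "avro", "orc", "text", "binaryfile"}


def build_autoloader_options(source_format: str, schema_location: str) -> dict:
    """Return the cloudFiles.* options for a streaming read with Auto Loader."""
    if source_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported cloudFiles.format: {source_format!r}")
    return {
        # The file format goes under the cloudFiles.* namespace,
        # not under the generic "format" option.
        "cloudFiles.format": source_format,
        # Where Auto Loader stores the inferred schema and its evolution.
        "cloudFiles.schemaLocation": schema_location,
    }


opts = build_autoloader_options("json", "/tmp/checkpoints/orders")  # hypothetical path
print(opts)
# {'cloudFiles.format': 'json', 'cloudFiles.schemaLocation': '/tmp/checkpoints/orders'}

# On a cluster, the dict could be applied via DataStreamReader.options:
# spark.readStream.format("cloudFiles").options(**opts).load(data_source)
```

Dotted keys such as "cloudFiles.format" are not valid Python identifiers, but they can still be passed through **opts because options() accepts arbitrary keyword arguments.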
A data engineer is converting their existing data pipeline to use Auto Loader for incremental processing of JSON files. The following code snippet is part of their implementation:
streaming_df = (
spark
.readStream
.format("cloudFiles")
.______________________
.option("cloudFiles.schemaLocation", schemaLocation)
.load(sourcePath)
)
Which of the following code snippets correctly fills the blank to enable the use of Auto Loader for ingesting the data?
A
option("format", "json")
B
option("cloudFiles.format", "json")
C
option("cloudFiles", "json")
D
option(cloudFiles.format, json)
E
option("json")
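To see why only B fits, the sketch below tries each choice against a hypothetical RecordingReader class. It merely imitates the two-argument option(key, value) shape of Spark's DataStreamReader and records what was set. It is not Spark: it only shows which key each choice writes, or why the call cannot be made at all. D references the undefined names cloudFiles and json, and E omits the value argument. eval is used on the fixed answer strings purely for brevity.

```python
# HYPOTHETICAL stand-in for spark.readStream: it only records options,
# so each answer choice can be tried without a Spark cluster.
class RecordingReader:
    def __init__(self):
        self.source = None
        self.opts = {}

    def format(self, source):
        self.source = source
        return self

    def option(self, key, value):  # same two-argument shape as DataStreamReader.option
        self.opts[key] = value
        return self


CHOICES = {
    "A": 'option("format", "json")',
    "B": 'option("cloudFiles.format", "json")',
    "C": 'option("cloudFiles", "json")',
    "D": 'option(cloudFiles.format, json)',
    "E": 'option("json")',
}


def check(choice_src):
    """Apply one answer choice to a fresh reader and classify the result."""
    reader = RecordingReader().format("cloudFiles")
    try:
        # Only "reader" is in scope, so bare names like cloudFiles are undefined.
        eval("reader." + choice_src, {}, {"reader": reader})
    except (NameError, TypeError) as exc:
        return f"fails: {type(exc).__name__}"
    return "ok" if reader.opts.get("cloudFiles.format") == "json" else "wrong key"


for letter, src in CHOICES.items():
    print(letter, check(src))
# A wrong key
# B ok
# C wrong key
# D fails: NameError
# E fails: TypeError
```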