
Question 28 A data engineering team is in the process of converting their existing data pipeline to utilize Auto Loader for incremental processing in the ingestion of JSON files. One data engineer comes across the following code block in the Auto Loader documentation:
streaming_df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", schemaLocation)
    .load(sourcePath))
Assuming that schemaLocation and sourcePath have been set correctly, which of the following changes does the data engineer need to make to convert this code block to use Auto Loader to ingest the data?
Explanation:
The correct answer is C because the code block shown is already using the proper Auto Loader syntax.
Key Points:
- format("cloudFiles") is the correct format specification for Auto Loader in Databricks.
- The cloudFiles.format option specifies the format of the source files (JSON in this case).
- The cloudFiles.schemaLocation option specifies where Auto Loader stores the inferred schema and tracks changes to it, which enables schema inference and evolution.
Why other options are incorrect:
- Auto Loader is selected with the "cloudFiles" format, not "autoLoader".
- There is no autoLoader method in the Auto Loader API.
Auto Loader Syntax:
(spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "<file-format>")
    .option("cloudFiles.schemaLocation", "<schema-location>")
    .load("<source-path>"))
The provided code is already correctly configured for Auto Loader incremental processing.
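As an illustration of the syntax above, the following is a minimal sketch that wraps the same three calls in a reusable helper. The function names auto_loader_options and read_with_auto_loader are hypothetical and not part of any Databricks API; the sketch assumes that spark is an active SparkSession on a Databricks runtime, where the "cloudFiles" source is available.

```python
from typing import Any, Dict


def auto_loader_options(file_format: str, schema_location: str) -> Dict[str, str]:
    """Return the two Auto Loader options used in the question's code block.

    Hypothetical helper: it only builds a plain dictionary of option names.
    """
    return {
        # Format of the files being ingested, for example "json".
        "cloudFiles.format": file_format,
        # Directory where Auto Loader stores the inferred schema.
        "cloudFiles.schemaLocation": schema_location,
    }


def read_with_auto_loader(
    spark: Any, source_path: str, file_format: str, schema_location: str
) -> Any:
    """Build a streaming DataFrame with Auto Loader.

    Assumption: `spark` is an active SparkSession on Databricks, so that
    `spark.readStream.format("cloudFiles")` resolves to Auto Loader.
    """
    return (
        spark.readStream.format("cloudFiles")
        # DataStreamReader.options accepts keyword arguments, so the
        # dictionary is unpacked rather than chaining .option() calls.
        .options(**auto_loader_options(file_format, schema_location))
        .load(source_path)
    )
```

Under these assumptions, calling read_with_auto_loader(spark, sourcePath, "json", schemaLocation) should build the same reader as the code block in the question, which is why no change to that block is needed.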