
Answer-first summary for fast verification
Answer:

```python
spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", 'csv') \
    .schema(schema) \
    .option("pathGlobFilter", "*.csv") \
    .load("s3://bucket/*/data")
```
This question tests your knowledge of filtering files in Auto Loader by extension. The correct approach combines the `pathGlobFilter` option, which restricts ingestion to files matching the glob pattern `*.csv`, with a wildcard in the `.load()` path to cover all four directories (`orders`, `employees`, `students`, `policies`). The correct code snippet is:

```python
spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", 'csv') \
    .schema(schema) \
    .option("pathGlobFilter", "*.csv") \
    .load("s3://bucket/*/data")
```

For more details, refer to the Databricks documentation on filtering files in Auto Loader.
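The glob semantics behind this answer can be illustrated outside of Spark with Python's standard-library `fnmatch`, which uses the same `*`-style wildcards. This is a rough sketch, not Spark's actual path resolution (notably, `fnmatch`'s `*` can cross `/` boundaries, while Spark's directory glob does not); the object keys below are invented examples modeled on the locations in the question:

```python
from fnmatch import fnmatch

# Hypothetical candidate object keys under s3://bucket/, modeled on the
# four locations in the question.
keys = [
    "orders/data/part-000.csv",
    "employees/data/part-001.csv",
    "students/data/notes.txt",    # wrong extension: rejected by *.csv
    "policies/archive/old.csv",   # wrong directory: misses the */data path
]

# The wildcard in .load("s3://bucket/*/data") selects the directories;
# pathGlobFilter ("*.csv") then keeps only matching file names.
selected = [
    k for k in keys
    if fnmatch(k, "*/data/*") and fnmatch(k.rsplit("/", 1)[-1], "*.csv")
]
print(selected)  # ['orders/data/part-000.csv', 'employees/data/part-001.csv']
```

Only files that satisfy both the directory wildcard and the extension pattern survive, which is exactly why options that supply only one of the two (or misspell the option name) fail.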
Author: LeetQuiz Editorial Team
A team is executing a streaming query using Auto Loader to fetch CSV files from cloud storage. Which query correctly fetches only the files with a .csv extension from the specified locations: s3://bucket/orders/data/, s3://bucket/employees/data/, s3://bucket/students/data/, s3://bucket/policies/data/?
A

```python
spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", 'csv')
    .schema(schema)
    .load("s3://bucket/*/data/*.csv")
```
B

```python
spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", 'csv')
    .schema(schema)
    .option("pathGlobFilter", "*.csv")
    .load("s3://bucket/*/data")
```
C

```python
spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", 'csv')
    .schema(schema)
    .option("pathFilter", "*.csv")
    .load("s3://bucket/*/data")
```
D

```python
spark.readStream.format("cloudFiles")
    .schema(schema)
    .load("s3://bucket/*/data/*.csv")
```
E

```python
spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", 'csv')
    .schema(schema)
    .option("globFilter", "*.csv")
    .load("s3://bucket/*/data")
```