
Explanation:
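This question tests your knowledge of filtering files in Auto Loader by extension. The correct approach combines two things: the pathGlobfilter option, which specifies the file-name pattern (*.csv), and a wildcard in the path passed to .load(), which matches each of the four subdirectories (orders, employees, students, policies). The load path acts only as a prefix filter, so the extension pattern belongs in pathGlobfilter rather than in the path itself. This corresponds to option B. The correct code snippet is: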
spark.readStream.format("cloudFiles") \
.option("cloudFiles.format", 'csv') \
.schema(schema) \
.option("pathGlobfilter", "*.csv") \
.load("s3://bucket/*/data")
For more details, refer to the documentation on filtering files in Auto Loader.
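To make the division of labour between the two patterns concrete, here is a minimal sketch in plain Python that approximates it on a list of object paths. It does not use Spark or read from S3. The helper names (matches_prefix_glob, select_files) and the sample file names are hypothetical, and Python's fnmatch stands in for the glob syntax Spark actually uses, so treat it as an illustration rather than Auto Loader's real matching logic.

```python
from fnmatch import fnmatchcase
from posixpath import basename, dirname


def matches_prefix_glob(directory: str, pattern: str) -> bool:
    """Return True if the leading segments of `directory` match `pattern`.

    Matching is done one path segment at a time, so `*` never crosses a `/`.
    Deeper subdirectories still match, mimicking a prefix filter.
    """
    dir_parts = directory.rstrip("/").split("/")
    pat_parts = pattern.rstrip("/").split("/")
    if len(dir_parts) < len(pat_parts):
        return False
    return all(fnmatchcase(d, p) for d, p in zip(dir_parts, pat_parts))


def select_files(paths, load_glob, name_glob):
    """Keep paths whose directory matches load_glob and whose file name matches name_glob."""
    return [
        p
        for p in paths
        if matches_prefix_glob(dirname(p), load_glob)
        and fnmatchcase(basename(p), name_glob)
    ]


# Hypothetical object keys, for illustration only.
sample_paths = [
    "s3://bucket/orders/data/2024-01.csv",
    "s3://bucket/orders/data/manifest.json",
    "s3://bucket/employees/data/staff.csv",
    "s3://bucket/students/archive/old.csv",
    "s3://bucket/policies/data/terms.csv.bak",
]

selected = select_files(sample_paths, "s3://bucket/*/data", "*.csv")
for path in selected:
    print(path)
```

In this sketch the path pattern decides which directories are considered and the name pattern decides which files inside them are kept, which mirrors the roles of .load() and pathGlobfilter in the snippet above.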
A team is executing a streaming query using Auto Loader to fetch CSV files from cloud storage. Which query correctly fetches only the files with a .csv extension from the specified locations: s3://bucket/orders/data/, s3://bucket/employees/data/, s3://bucket/students/data/, s3://bucket/policies/data/?
A
spark.readStream.format("cloudFiles") \
.option("cloudFiles.format", 'csv') \
.schema(schema) \
.load("s3://bucket/*/data/*.csv")
B
spark.readStream.format("cloudFiles") \
.option("cloudFiles.format", 'csv') \
.schema(schema) \
.option("pathGlobfilter", "*.csv") \
.load("s3://bucket/*/data")
C
spark.readStream.format("cloudFiles") \
.option("cloudFiles.format", 'csv') \
.schema(schema) \
.option("pathFilter", "*.csv") \
.load("s3://bucket/*/data")
D
spark.readStream.format("cloudFiles") \
.schema(schema) \
.load("s3://bucket/*/data/*.csv")
E
spark.readStream.format("cloudFiles") \
.option("cloudFiles.format", 'csv') \
.schema(schema) \
.option("globFilter", "*.csv") \
.load("s3://bucket/*/data")