
Answer-first summary for fast verification
Answer: FROM csv.`/path/to/data/*` OPTIONS (recursiveFileLookup 'true', header 'true', inferSchema 'true') SELECT *
The correct answer is D. The `csv.` prefix immediately after the FROM keyword tells Spark to read the files at the given path as CSV, and the query supplies all of the required OPTIONS: recursiveFileLookup set to 'true' so every file matching the pattern under the directory is picked up, header set to 'true' so the first row of each file is treated as column names, and inferSchema set to 'true' so column types are deduced automatically from the data. The other options fail for different reasons: A uses the legacy com.databricks.spark.csv source and omits recursiveFileLookup, while B and C declare the wrong formats (delta and parquet respectively) for files that are CSV. Option D is therefore the only query that meets all the stated requirements.
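To make the winning pattern concrete, here is a small Python sketch that assembles the option-D statement from its parts. The `csv_query` helper is hypothetical, written only for illustration; it is not a Spark API, but the string it produces matches the answer above.

```python
def csv_query(path, **options):
    """Assemble a Spark SQL direct-file query of the form used in option D.

    `path` is the file path or glob placed after the `csv.` prefix;
    keyword arguments become the OPTIONS clause, in the order given.
    Hypothetical helper for illustration only.
    """
    opts = ", ".join(f"{key} '{value}'" for key, value in options.items())
    return f"FROM csv.`{path}` OPTIONS ({opts}) SELECT *"


query = csv_query(
    "/path/to/data/*",
    recursiveFileLookup="true",  # pick up every file matching the pattern
    header="true",               # treat the first row as column names
    inferSchema="true",          # deduce column types from the data
)
print(query)
# FROM csv.`/path/to/data/*` OPTIONS (recursiveFileLookup 'true', header 'true', inferSchema 'true') SELECT *
```

The same three options carry over unchanged to the DataFrame reader (`spark.read.format("csv").option(...)`), so the assembled statement and the programmatic read are interchangeable ways to express the same requirements.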
Author: LeetQuiz Editorial Team
You are working with a large dataset stored as multiple CSV files in a directory, each file named in the pattern 'data_YYYYMMDD.csv' (where YYYYMMDD represents the date). You need to write a Spark query that efficiently extracts and processes this data, handling header rows, inferring the schema, and recursively looking up all files matching the pattern. Which of the following queries correctly specifies the data type as CSV and meets all of these requirements? Choose the single best option.
A
FROM com.databricks.spark.csv OPTIONS (path '/path/to/data/*', header 'true', inferSchema 'true') SELECT *
B
FROM delta.`/path/to/data/*` OPTIONS (recursiveFileLookup 'true', header 'true', inferSchema 'true') SELECT *
C
FROM parquet.`/path/to/data/*` OPTIONS (recursiveFileLookup 'true', header 'true', inferSchema 'true') SELECT *
D
FROM csv.`/path/to/data/*` OPTIONS (recursiveFileLookup 'true', header 'true', inferSchema 'true') SELECT *