
Answer-first summary for fast verification
Answer: FROM csv.`/path/to/data/*` OPTIONS (recursiveFileLookup 'true', header 'true', inferSchema 'true') SELECT *
The correct answer is D. The `csv.` prefix immediately after the FROM keyword tells Spark to read the files at the given path as CSV, and the query supplies all of the required OPTIONS: recursiveFileLookup set to 'true' so every file matching the pattern under the directory is picked up, header set to 'true' so the first row of each file is treated as column names, and inferSchema set to 'true' so column types are deduced automatically from the data. The other options fail for different reasons: A uses the legacy com.databricks.spark.csv source and omits recursiveFileLookup, while B and C declare the wrong formats (delta and parquet respectively) for files that are CSV. Option D is therefore the only query that meets all the stated requirements.
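To make the winning pattern concrete, here is a small Python sketch that assembles the option-D statement from its parts. The `csv_query` helper is hypothetical, written only for illustration; it is not a Spark API, but the string it produces matches the answer above.

```python
def csv_query(path, **options):
    """Assemble a Spark SQL direct-file query of the form used in option D.

    `path` is the file path or glob placed after the `csv.` prefix;
    keyword arguments become the OPTIONS clause, in the order given.
    Hypothetical helper for illustration only.
    """
    opts = ", ".join(f"{key} '{value}'" for key, value in options.items())
    return f"FROM csv.`{path}` OPTIONS ({opts}) SELECT *"


query = csv_query(
    "/path/to/data/*",
    recursiveFileLookup="true",  # pick up every file matching the pattern
    header="true",               # treat the first row as column names
    inferSchema="true",          # deduce column types from the data
)
print(query)
# FROM csv.`/path/to/data/*` OPTIONS (recursiveFileLookup 'true', header 'true', inferSchema 'true') SELECT *
```

The same three options carry over unchanged to the DataFrame reader (`spark.read.format("csv").option(...)`), so the assembled statement and the programmatic read are interchangeable ways to express the same requirements.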
Author: LeetQuiz Editorial Team
You are working with a large dataset stored as multiple CSV files in a directory, each file named in the pattern 'data_YYYYMMDD.csv' (where YYYYMMDD represents the date). You need to write a Spark query that efficiently extracts and processes this data, handling header rows, inferring the schema, and recursively looking up all files matching the pattern. Which of the following queries correctly specifies the data type as CSV and meets all of these requirements? Choose the single best option.
A
FROM com.databricks.spark.csv OPTIONS (path '/path/to/data/*', header 'true', inferSchema 'true') SELECT *
B
FROM delta.`/path/to/data/*` OPTIONS (recursiveFileLookup 'true', header 'true', inferSchema 'true') SELECT *
C
FROM parquet.`/path/to/data/*` OPTIONS (recursiveFileLookup 'true', header 'true', inferSchema 'true') SELECT *
D
FROM csv.`/path/to/data/*` OPTIONS (recursiveFileLookup 'true', header 'true', inferSchema 'true') SELECT *