
Answer-first summary for fast verification
Answer: CREATE TEMPORARY VIEW json_view AS SELECT * FROM json.`json_data/*` OPTIONS (multiLine 'true', primitivesAsString 'true')
The correct answer is A because it uses the 'json' format prefix, which tells Spark to parse the files under 'json_data' as JSON. The two OPTIONS are both valid for the JSON data source: 'multiLine' lets Spark read JSON records that span multiple lines (by default it expects one JSON object per line), and 'primitivesAsString' makes Spark infer every primitive JSON value as a string. Option B is incorrect because it uses the CSV data source with CSV-specific options ('header', 'inferSchema'). Options C and D are incorrect because they use the 'parquet' and 'delta' prefixes, so Spark would try to read the JSON files as Parquet or Delta data and the query would fail.
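For comparison, Spark's data source documentation describes a second way to declare a JSON-backed temporary view: name the format with USING and pass the path as one of the OPTIONS. The following is a minimal sketch under stated assumptions, not the quiz's official answer: it assumes the 'json_data' directory is resolvable from the session's default file system, and the view name 'json_view_using' is hypothetical, chosen only to avoid clashing with 'json_view'. Support for an OPTIONS clause directly after a json.`path` reference may vary by Spark version, so it is worth checking against the version you run.

```sql
-- Sketch: declare the JSON source with USING and pass the path as an option.
-- 'json_view_using' is a hypothetical name used only for this illustration.
CREATE TEMPORARY VIEW json_view_using
USING json
OPTIONS (
  path 'json_data/*',          -- glob over the data_YYYYMMDD.json files
  multiLine 'true',            -- allow one JSON record to span several lines
  primitivesAsString 'true'    -- infer every primitive value as a string
);

-- Inspect the inferred schema; with primitivesAsString, leaf columns are strings.
DESCRIBE json_view_using;

-- Count the rows read from all matching files.
SELECT COUNT(*) AS row_count FROM json_view_using;
```

Because 'primitivesAsString' turns numeric fields into strings, any arithmetic or range filter on those columns would need an explicit CAST.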
Author: LeetQuiz Editorial Team
You are working with a large dataset stored as JSON files in a directory named 'json_data', with files named in the pattern 'data_YYYYMMDD.json' (where YYYYMMDD represents the date). You need to create a temporary view named 'json_view' in Spark to analyze this data. Considering the need for efficient data processing and the correct use of Spark SQL, which of the following queries would you use? Choose the best option that correctly reads the JSON files and creates the temporary view with the appropriate options for handling JSON data.
A
CREATE TEMPORARY VIEW json_view AS SELECT * FROM json.`json_data/*` OPTIONS (multiLine 'true', primitivesAsString 'true')
B
CREATE TEMPORARY VIEW json_view AS SELECT * FROM com.databricks.spark.csv OPTIONS (path 'json_data/*', header 'true', inferSchema 'true')
C
CREATE TEMPORARY VIEW json_view AS SELECT * FROM parquet.`json_data/*` OPTIONS (multiLine 'true', primitivesAsString 'true')
D
CREATE TEMPORARY VIEW json_view AS SELECT * FROM delta.`json_data/*` OPTIONS (multiLine 'true', primitivesAsString 'true')