
Answer-first summary for fast verification
Answer: Parquet files have a well-defined schema
The correct answer is C because Parquet files have a well-defined schema, which is a key benefit over CSV files. CSV files are schema-less and require schema inference or explicit schema definition, while Parquet files store schema information within the file itself. This makes working with Parquet files more reliable and efficient when using CREATE TABLE AS SELECT statements.

**Detailed Explanation:**

1. **Parquet Schema Benefits:** Parquet is a columnar storage format that embeds schema information (data types, column names, etc.) within the file. This eliminates schema inference errors and provides better type safety.
2. **CSV Limitations:** CSV files don't store schema information, so Databricks must infer the schema by scanning the data, which can lead to:
   - Incorrect data type inference
   - Performance overhead during schema discovery
   - Potential errors with null values or mixed data types
3. **Other Options Analysis:**
   - **A:** Both Parquet and CSV files can be partitioned, so this is not a unique benefit of Parquet.
   - **B:** This is incorrect: CREATE TABLE AS SELECT statements can be used on files.
   - **D:** While Parquet files can be optimized, this is not the primary benefit compared to CSV for external tables.
   - **E:** Parquet files don't automatically become Delta tables; they remain Parquet files unless explicitly converted.
4. **Practical Implication:** When creating external tables, using Parquet ensures data integrity and eliminates schema-related issues that commonly occur with CSV files.
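The CSV limitations described above can be illustrated with a small, standard-library-only sketch. The sample data and the `infer_type` helper are hypothetical and deliberately naive; this is not how Databricks or Spark infers schemas, only a demonstration that a CSV file carries no type information, so any types must be guessed from the values.

```python
import csv
import io

# Hypothetical sample data: a zip code with a leading zero and a missing amount.
CSV_TEXT = "zip_code,amount\n02134,19.99\n10001,\n"


def infer_type(values):
    """Naive, illustrative inference: try int, then float, else string.

    Empty strings are treated as nulls and ignored.
    """
    non_null = [v for v in values if v != ""]
    if not non_null:
        return "string"
    for caster, name in ((int, "int"), (float, "double")):
        try:
            for v in non_null:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Every value read from the CSV is a plain string; the file stores no types.
raw_types = {type(v).__name__ for row in rows for v in row.values()}

# Types have to be guessed column by column from the values themselves.
inferred = {col: infer_type([row[col] for row in rows]) for col in rows[0]}

print(raw_types)
print(inferred)
# Casting the zip code to the guessed type silently drops its leading zero.
print(int(rows[0]["zip_code"]))
```

Here the zip code column is guessed to be an integer, so `02134` would lose its leading zero once cast. A Parquet file avoids this class of problem because the writer records each column's type in the file metadata, and readers use that recorded schema instead of guessing.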
Author: Keng Suppaseth
Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
A. Parquet files can be partitioned
B. CREATE TABLE AS SELECT statements cannot be used on files
C. Parquet files have a well-defined schema
D. Parquet files have the ability to be optimized
E. Parquet files will become Delta tables