
Answer-first summary for fast verification
Answer: Parquet files have the ability to be optimized
## Explanation

When comparing Parquet vs CSV for external tables created with CREATE TABLE AS SELECT (CTAS), the key benefit of Parquet is its ability to be optimized. Here's why:

1. **Option A (Parquet files can be partitioned)**: Both Parquet and CSV files can be partitioned, so this is not a unique benefit of Parquet.
2. **Option B (CREATE TABLE AS SELECT statements cannot be used on files)**: This is incorrect - CTAS can be used with files in Databricks.
3. **Option C (Parquet files have a well-defined schema)**: While Parquet does have a schema, this is not the primary benefit compared to CSV in the context of CTAS operations.
4. **Option D (Parquet files have the ability to be optimized)**: **CORRECT** - This is the key advantage. Parquet files support:
   - **Columnar storage**: Enables better compression and faster query performance
   - **Predicate pushdown**: Allows filtering at the file level
   - **Statistics**: Stores min/max values for efficient filtering
   - **Compression**: Better compression ratios than CSV
   - **Schema evolution**: Better support for schema changes
5. **Option E (Parquet files will become Delta tables)**: This is incorrect - Parquet files remain Parquet files unless explicitly converted to Delta format.

In Databricks, when creating external tables, Parquet format provides significant performance optimizations that CSV lacks, making it the preferred choice for most data engineering workloads.
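The columnar-storage, statistics, and predicate-pushdown points can be illustrated with a small Python sketch. This is a toy model only: it is not the Parquet file format and uses no Databricks or Spark API. The names `ROWS`, `to_csv_text`, `to_row_groups`, `count_csv`, and `count_columnar`, along with the sample data and group size, are hypothetical and chosen purely for illustration.

```python
import csv
import io

# Hypothetical sample data: (id, amount), with amount ascending 0, 10, ..., 110.
ROWS = [(i, i * 10) for i in range(12)]


def to_csv_text(rows):
    """Serialize rows as CSV text: untyped strings, no statistics."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "amount"])
    writer.writerows(rows)
    return buf.getvalue()


def to_row_groups(rows, group_size):
    """Toy columnar layout: one list per column plus min/max per group."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        amounts = [amount for _, amount in chunk]
        groups.append({
            "columns": {"id": [i for i, _ in chunk], "amount": amounts},
            "stats": {"amount": (min(amounts), max(amounts))},
        })
    return groups


def count_csv(text, threshold):
    """Every row is parsed and cast before the filter can be applied."""
    rows_read = 0
    matches = 0
    for row in csv.DictReader(io.StringIO(text)):
        rows_read += 1
        if int(row["amount"]) > threshold:
            matches += 1
    return matches, rows_read


def count_columnar(groups, threshold):
    """Skip any group whose max cannot satisfy `amount > threshold`."""
    values_read = 0
    matches = 0
    for group in groups:
        _, max_amount = group["stats"]["amount"]
        if max_amount <= threshold:
            continue  # whole group pruned using its statistics alone
        amounts = group["columns"]["amount"]  # only the needed column is touched
        values_read += len(amounts)
        matches += sum(1 for a in amounts if a > threshold)
    return matches, values_read


csv_result = count_csv(to_csv_text(ROWS), threshold=75)
columnar_result = count_columnar(to_row_groups(ROWS, group_size=4), threshold=75)
print(csv_result, columnar_result)
```

With this sample data, both readers find the same 4 matching rows, but the CSV-style reader parses all 12 rows while the columnar reader touches only 4 values, because two of the three groups are skipped from their min/max statistics. Real engines apply the same idea to Parquet metadata, though the details differ from this sketch.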
Author: Keng Suppaseth
Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
A. Parquet files can be partitioned
B. CREATE TABLE AS SELECT statements cannot be used on files
C. Parquet files have a well-defined schema
D. Parquet files have the ability to be optimized
E. Parquet files will become Delta tables