
Answer: C — Parquet files have a well-defined schema
## Explanation

When comparing Parquet vs CSV for external tables in Databricks:

**Parquet advantages:**

1. **Schema enforcement**: Parquet files embed a well-defined schema in the file format itself, including column names, data types, and metadata.
2. **Columnar storage**: Parquet is a columnar format optimized for analytics workloads.
3. **Compression**: Better compression ratios than CSV.
4. **Schema evolution**: Supports schema evolution capabilities.

**CSV limitations:**

1. **No embedded schema**: CSV files carry no schema information; the schema must be inferred or declared explicitly.
2. **Type inference issues**: Databricks must infer data types from the CSV content, which can produce incorrect types.
3. **No built-in compression**: Files are typically larger (external compression such as gzip is possible, but it is not part of the format).
4. **Parsing overhead**: Text parsing makes CSV more expensive to read during query execution.

**Why the other options are incorrect:**

- **A**: Both Parquet and CSV sources can be partitioned in Databricks.
- **B**: An external table created from Parquet files does not automatically become a Delta table; it remains an external table pointing at Parquet files.
- **D**: Parquet files can be optimized, but that is not the primary benefit over CSV for external tables.

The key benefit is that Parquet's embedded schema eliminates schema-inference issues and provides better type safety than CSV.
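The "no embedded schema" limitation can be seen without Databricks at all. The minimal stdlib sketch below (illustrative only, not the Databricks reader) round-trips a row through CSV and shows that every field comes back as a string, which is exactly why Databricks must infer or be told the types; a Parquet footer, by contrast, stores column names and types that a reader (e.g. `pyarrow.parquet.read_schema`, not shown here) can consume directly.

```python
import csv
import io

# Write one typed record (int, float, str) to an in-memory CSV.
rows = [("id", "price", "ts"), (1, 9.99, "2024-01-01")]
buf = io.StringIO()
csv.writer(buf).writerows(rows)

# Read it back: CSV carries no type information, so every field is a str.
buf.seek(0)
header, record = list(csv.reader(buf))
print([type(v).__name__ for v in record])  # ['str', 'str', 'str']
```

A Parquet file written from the same data would instead declare `id` as an integer column and `price` as a floating-point column in its metadata, so a `CREATE TABLE AS SELECT` over it needs no inference step.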
Author: Keng Suppaseth
What is a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
- **A.** Parquet files can be partitioned
- **B.** Parquet files will become Delta tables
- **C.** Parquet files have a well-defined schema
- **D.** Parquet files have the ability to be optimized