
Answer-first summary for fast verification
Answer: Parquet files have a well-defined schema
## Explanation

When comparing Parquet and CSV formats for creating external tables with CREATE TABLE AS SELECT (CTAS) statements, the key benefit of Parquet is that it has a **well-defined schema**. Here's why:

1. **Schema Definition**: Parquet is a columnar storage format that stores schema metadata within the file itself. This means the schema (column names, data types, etc.) is embedded in the Parquet file.
2. **CSV Limitations**: CSV files are schema-less text files. When creating an external table from CSV, you must explicitly define the schema (column names and data types) in your CREATE TABLE statement, or Databricks will infer the schema, which can lead to errors or incorrect data type assignments.
3. **CTAS Context**: In a CREATE TABLE AS SELECT statement, when you're creating an external table, Parquet's built-in schema ensures that the table structure is preserved and self-contained within the file format itself.
4. **Other Options Analysis**:
   - **A (Partitioning)**: Both Parquet and CSV can be partitioned in Databricks, so this is not a unique benefit of Parquet.
   - **B (Become Delta tables)**: Parquet files don't automatically become Delta tables; you need to explicitly create a Delta table or convert to Delta format.
   - **D (Optimization)**: While Parquet can be optimized, this is more relevant to Delta Lake features rather than a fundamental benefit of Parquet over CSV for external tables.

**Key Takeaway**: The primary advantage of using Parquet over CSV for external tables in CTAS statements is that Parquet files contain their own schema metadata, making table creation more reliable and less error-prone.
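The CSV limitation described above can be illustrated with a minimal Python sketch that uses only the standard library. The sample data and the `infer_type` helper are hypothetical and chosen for illustration; this is not how Databricks performs schema inference, only a demonstration that a CSV reader receives plain strings and has to guess the types.

```python
import csv
import io

# Hypothetical sample data: a CSV file is plain text with no type metadata.
CSV_TEXT = "id,price,signup_date\n001,19.99,2024-01-05\n002,5,2024-02-10\n"


def read_csv_rows(text):
    """Parse CSV text; every field comes back as a str because CSV stores no types."""
    return list(csv.DictReader(io.StringIO(text)))


def infer_type(values):
    """Naive inference (illustrative only): try int, then float, else string."""
    for caster, name in ((int, "int"), (float, "double")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "string"


rows = read_csv_rows(CSV_TEXT)
inferred = {col: infer_type([row[col] for row in rows]) for col in rows[0]}

print(rows[0])    # all values are strings
print(inferred)   # guessed types, not types declared by the file
```

In this sketch the `id` column is guessed to be an integer, so an identifier such as `001` would lose its leading zeros once cast. A Parquet file avoids this kind of guesswork because the column types are recorded in the file's own metadata.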
Author: Keng Suppaseth
What is a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
A. Parquet files can be partitioned
B. Parquet files will become Delta tables
C. Parquet files have a well-defined schema
D. Parquet files have the ability to be optimized