
Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
A
Parquet files can be partitioned
B
CREATE TABLE AS SELECT statements cannot be used on files
C
Parquet files have a well-defined schema
D
Parquet files have the ability to be optimized
E
Parquet files will become Delta tables
Explanation:
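The correct answer is C. A CREATE TABLE AS SELECT (CTAS) statement infers the table's schema from the query result and does not accept a manual schema declaration, so it works best on sources that already carry their own schema. Here's why each option does or does not hold:
Option A (Parquet files can be partitioned): Both Parquet and CSV datasets can be laid out in partitioned directories, so this is not a benefit unique to Parquet.
Option B (CREATE TABLE AS SELECT statements cannot be used on files): Incorrect. In Databricks, a CTAS statement can query files directly.
Option C (Parquet files have a well-defined schema): CORRECT. Parquet stores column names and data types in the file metadata, so CTAS picks up the right schema automatically. CSV is plain text with no embedded types; reading it correctly usually requires options such as header and delimiter or an explicit schema, which a CTAS statement that queries the files directly has no way to supply.
Option D (Parquet files have the ability to be optimized): Not the best answer. Parquet's columnar, compressed layout generally reads faster than CSV, but that holds regardless of how the table is created and is not specific to CTAS. The OPTIMIZE command in Databricks applies to Delta tables, not to raw Parquet files.
Option E (Parquet files will become Delta tables): Incorrect. The source Parquet files stay in Parquet format unless they are explicitly converted to Delta.
In short, Parquet's self-describing schema is what makes it a better fit than CSV for CTAS, because the statement has to infer the table's structure from the data it reads.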
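The schema point raised under Option C can be seen with a small sketch in plain Python. It is only an illustration, not Databricks code: the sample data and the `SCHEMA` mapping are made up, and the standard-library `csv` module stands in for a generic CSV reader. It shows that a CSV file carries no type information, so the reader has to supply a schema that a Parquet file would already contain in its metadata.

```python
import csv
import io
from datetime import date

# Hypothetical sample data, invented for this illustration.
RAW_CSV = "id,amount,created\n1,19.99,2024-01-05\n2,5,2024-01-06\n"


def read_csv_rows(text):
    """Parse CSV text with the standard library and return a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_csv_rows(RAW_CSV)

# Every value comes back as a string: the file itself records no column types.
value_types = {name: type(value).__name__ for name, value in rows[0].items()}
print(value_types)  # {'id': 'str', 'amount': 'str', 'created': 'str'}

# The reader must bring its own schema (assumed here) to recover real types.
SCHEMA = {"id": int, "amount": float, "created": date.fromisoformat}
typed = [{name: SCHEMA[name](value) for name, value in row.items()} for row in rows]
print(typed[0])
```

A CTAS statement has no place to declare a mapping like `SCHEMA`, which is why a source that describes its own columns and types is the easier fit.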