
What is a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
A. Parquet files can be partitioned
B. Parquet files will become Delta tables
C. Parquet files have a well-defined schema
D. Parquet files have the ability to be optimized
Explanation:
When comparing Parquet vs CSV for creating external tables using CREATE TABLE AS SELECT (CTAS), the key benefit of Parquet is that it has a well-defined schema. Here's why:
Schema Definition: Parquet is a columnar storage format that stores schema information (column names and data types) in each file's metadata alongside the data. When you create an external table from Parquet files, Databricks reads the schema directly from that metadata rather than guessing it from the values.
CSV Limitations: CSV files don't inherently store schema information. When creating external tables from CSV, you often need to explicitly specify column names and data types, or Databricks may need to sample the data to infer the schema, which can be error-prone.
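The CSV limitation can be seen in a small round trip. The following is a minimal sketch using only Python's standard library, not Databricks itself; the sample rows and column names are hypothetical. It writes typed values to CSV and reads them back, showing that the header preserves column names but every value returns as a string.

```python
import csv
import io

# Hypothetical sample rows with an integer, a float, and a string column.
rows = [
    {"id": 1, "price": 9.99, "name": "widget"},
    {"id": 2, "price": 19.5, "name": "gadget"},
]

# Write the rows to an in-memory CSV "file".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "price", "name"])
writer.writeheader()
writer.writerows(rows)

# Read the same data back: the header supplies names but no types.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))

# Compare the Python type of each field before and after the round trip.
original_types = {k: type(v).__name__ for k, v in rows[0].items()}
loaded_types = {k: type(v).__name__ for k, v in loaded[0].items()}

print(original_types)
print(loaded_types)
```

Any reader of the CSV has to decide for itself whether "1" is an integer, a string, or something else, which is the inference step that a format with an embedded schema avoids.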
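Other Options Analysis: Option A is not specific to Parquet, since CSV data can also be laid out in partitioned directories. Option B is incorrect because reading Parquet files does not turn the files themselves into Delta tables. Option D is not a distinguishing benefit in this context, because commands such as OPTIMIZE operate on Delta tables rather than on raw Parquet files.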
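CTAS Context: A CREATE TABLE AS SELECT statement derives the new table's schema from the result of the SELECT query and does not accept a manually declared column list. The source therefore has to supply reliable column names and types on its own. Parquet does so through its embedded schema, whereas CSV offers at most a header row and leaves types to inference.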
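The way CTAS takes its schema from the query can be illustrated by analogy. The sketch below uses SQLite through Python's standard library as a stand-in, not Databricks, and the table names are hypothetical. One source table has declared types (playing the role of a schema-carrying source such as Parquet) and the other holds only text (playing the role of raw CSV fields). Neither CTAS statement lists columns, so the resulting tables simply inherit whatever the query yields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source with declared column types (stands in for a schema-carrying source).
conn.execute("CREATE TABLE typed_src (id INTEGER, price REAL, name TEXT)")
conn.execute("INSERT INTO typed_src VALUES (1, 9.99, 'widget')")

# Source where every column is text (stands in for raw CSV fields).
conn.execute("CREATE TABLE text_src (id TEXT, price TEXT, name TEXT)")
conn.execute("INSERT INTO text_src VALUES ('1', '9.99', 'widget')")

# CTAS: no column list is given; the new table's columns come from the query.
conn.execute("CREATE TABLE from_typed AS SELECT id, price, name FROM typed_src")
conn.execute("CREATE TABLE from_text AS SELECT id, price, name FROM text_src")


def column_types(table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}


typed_cols = column_types("from_typed")
text_cols = column_types("from_text")
print(typed_cols)
print(text_cols)
```

SQLite's type system differs from Databricks in the details, but the pattern is the same: with CTAS, the quality of the target schema depends entirely on the schema the source can provide.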
Therefore, the correct answer is C - Parquet files have a well-defined schema, which is particularly beneficial when creating external tables via CTAS statements.