
What is a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
A
Parquet files can be partitioned
B
Parquet files will become Delta tables
C
Parquet files have a well-defined schema
D
Parquet files have the ability to be optimized
Explanation:
When comparing Parquet and CSV formats for creating external tables with CREATE TABLE AS SELECT (CTAS) statements, the key benefit of Parquet is that it has a well-defined schema. Here's why:
Schema Definition: Parquet is a columnar storage format that stores schema metadata within the file itself. This means the schema (column names, data types, etc.) is embedded in the Parquet file.
CSV Limitations: CSV files are plain text and carry no embedded schema. To build a table from CSV, the column names and data types must either be declared explicitly or inferred by Databricks, and inference can assign incorrect data types.
CTAS Context: A CREATE TABLE AS SELECT statement takes the new table's schema from the result of the SELECT and does not support a manual schema declaration. With a Parquet source, the column names and data types recorded in the files carry through to the new table unchanged; with CSV, they depend on inference.
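The CSV limitation can be seen outside Databricks as well. The following is a minimal sketch in plain Python (standard-library csv module, not Spark or Databricks SQL); the sample data, the column names, and the hand-written schema mapping are all hypothetical and exist only for illustration. It shows that every value read from CSV arrives as a string, so the types have to be supplied from somewhere other than the file.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
raw = "id,amount,created\n1,19.99,2024-01-05\n2,5.00,2024-01-06\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# Every value comes back as str: the file itself carries no type information.
value_types = {name: type(value).__name__ for name, value in rows[0].items()}
print(value_types)

# The reader has to supply the schema out of band (here, a hand-written mapping).
schema = {"id": int, "amount": float, "created": str}
typed_rows = [
    {name: schema[name](value) for name, value in row.items()}
    for row in rows
]
print(typed_rows[0])
```

A Parquet file, by contrast, records the equivalent of the schema mapping in its own metadata, so a reader does not need to guess or be told the types.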
Other Options Analysis:
Key Takeaway: The primary advantage of using Parquet over CSV for external tables in CTAS statements is that Parquet files contain their own schema metadata, making table creation more reliable and less error-prone.