
Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
A
Parquet files can be partitioned
B
CREATE TABLE AS SELECT statements cannot be used on files
C
Parquet files have a well-defined schema
D
Parquet files have the ability to be optimized
E
Parquet files will become Delta tables
Explanation:
Correct Answer: C
When comparing Parquet vs CSV for creating external tables with CREATE TABLE AS SELECT (CTAS), the key benefit of Parquet is that it has a well-defined schema. Here's why:
Schema Definition: Parquet files store schema information (column names, data types, and so on) in the file's own metadata, whereas CSV files are plain text with, at most, a header row of column names. When creating an external table from CSV, you must either declare the schema explicitly or rely on inference, which can be error-prone.
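To make the "schema-less" point concrete, here is a minimal sketch in plain Python (standard library only, no Spark). The sample data, the column names, and the naive `infer_type` helper are all hypothetical and are not how Databricks infers types. The sketch only illustrates that CSV values arrive as strings and that guessing types from them can change the data.

```python
import csv
import io

# Hypothetical sample data; column names are illustrative only.
CSV_TEXT = "order_id,zip_code,amount\n1,02134,19.99\n2,10001,5.00\n"


def read_rows(text):
    """Parse CSV text; the csv module returns every value as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def infer_type(values):
    """Naive inference (an assumption for this sketch): int, then float, else str."""
    for caster in (int, float):
        try:
            for value in values:
                caster(value)
            return caster
        except ValueError:
            continue
    return str


rows = read_rows(CSV_TEXT)

# Guess a type per column from the string values alone.
inferred = {col: infer_type([row[col] for row in rows]) for col in rows[0]}

# zip_code looks numeric, so the naive guess is int,
# and casting drops the leading zero of "02134".
zip_as_int = int(rows[0]["zip_code"])
```

Here `zip_code` is guessed as an integer even though it is really an identifier. A Parquet file carries the declared type with the data, so a reader does not have to guess.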
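CTAS Operations: A CTAS statement takes its schema from the query result and does not let you declare one manually. Because Parquet carries its own schema, the column names and types are picked up correctly without extra work. With CSV, the same statement has to fall back on inferred or default types and names, so you usually need an intermediate step with explicit options or a declared schema.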
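Other Options Analysis:
A: Partitioning is a property of how a table is laid out in storage, not of the Parquet format; CSV-backed tables can be partitioned too.
B: This is false. CTAS statements can query files directly.
D: Parquet's columnar layout does help read performance, but that is not what makes CTAS simpler, which is the point of the question.
E: The format of the source files does not determine whether the result is a Delta table; that depends on how the table is created, not on reading Parquet rather than CSV.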
Practical Implication: When working with external tables in Databricks, using Parquet simplifies schema management and helps keep data types consistent, which matters in data engineering workflows where schema evolution and data quality are concerns.