
Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
A. Parquet files can be partitioned
B. CREATE TABLE AS SELECT statements cannot be used on files
C. Parquet files have a well-defined schema
D. Parquet files have the ability to be optimized
E. Parquet files will become Delta tables
Explanation:
The correct answer is C. Parquet files store their schema (column names and data types) in the file itself, whereas CSV files carry no type information, so their schema must be inferred or declared explicitly. This matters for CREATE TABLE AS SELECT because a CTAS statement takes its schema from the query result and does not accept a manual schema declaration. A source with a well-defined schema therefore yields correctly typed columns without extra work.
Detailed Explanation:
Parquet Schema Benefits: Parquet is a columnar storage format that embeds schema information (column names, data types, etc.) in the file's metadata. Readers take the schema from the file instead of inferring it, which avoids inference errors and gives better type safety.
CSV Limitations: CSV files don't store schema information, so Databricks must either infer the schema by scanning the data or be given one explicitly. Inference can produce incorrectly typed columns.
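The fragility of schema inference can be sketched with Python's standard csv module. This is only an illustration, not how Databricks or Spark infers types: the sample data, the `infer_type` rule, and the function names are all hypothetical. It shows that a CSV reader sees only strings, so any types have to be guessed, and a guess can be wrong.

```python
import csv
import io

# Hypothetical sample data: the zip column has a leading zero that matters.
CSV_TEXT = "id,zip,amount\n1,02134,10.50\n2,10001,7\n"


def infer_type(values):
    """Naive illustrative rule (an assumption, not Spark's): int, then double, else string."""
    for caster, name in ((int, "int"), (float, "double")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "string"


def read_rows(text):
    """Parse CSV text; every field comes back as a plain string."""
    return list(csv.DictReader(io.StringIO(text)))


def infer_schema(text):
    """Guess a type for each column from the values alone."""
    rows = read_rows(text)
    return {col: infer_type([row[col] for row in rows]) for col in rows[0]}


rows = read_rows(CSV_TEXT)
schema = infer_schema(CSV_TEXT)

print(rows[0]["zip"])       # raw CSV value is the string 02134
print(schema)               # {'id': 'int', 'zip': 'int', 'amount': 'double'}
print(int(rows[0]["zip"]))  # 2134: casting to the guessed type drops the leading zero
```

Here the zip column is guessed to be an integer, and casting to that type loses the leading zero. A format that records each column's type alongside the data, as Parquet does, leaves nothing to guess.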
Other Options Analysis:
Practical Implication: When creating external tables with CREATE TABLE AS SELECT, a Parquet source supplies column names and types directly, which avoids the schema inference problems that commonly occur with CSV files.