Databricks Certified Data Engineer - Associate

What is a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?

KKeng

Last updated: January 13, 2026 at 09:15

A. Parquet files can be partitioned

B. Parquet files will become Delta tables

C. Parquet files have a well-defined schema

D. Parquet files have the ability to be optimized

Explanation

When comparing Parquet vs CSV for creating external tables using CREATE TABLE AS SELECT (CTAS), the key benefit of Parquet is that it has a well-defined schema. Here's why:

Schema Definition: Parquet is a columnar storage format that stores its schema (column names and data types) in each file's footer metadata, alongside the data. When you create a table from Parquet files, Databricks reads the schema directly from that metadata instead of guessing it from the values.
CSV Limitations: CSV files store no schema information. At most, a header row supplies column names, but never data types. When creating tables from CSV, you usually need to declare column names and data types explicitly, or let Databricks infer them by reading the data, which is slower and can produce the wrong types.
Other Options Analysis:
- Option A (Parquet files can be partitioned): Both Parquet and CSV files can be partitioned in Databricks, so this isn't a unique benefit.
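- Option B (Parquet files will become Delta tables): This is incorrect. CTAS does not convert the source Parquet files into a Delta table; it writes the query results out as new data files for the target table. On Databricks, Delta Lake is the default table format unless another format is specified with USING, and that holds whether the source is Parquet or CSV, so it is not a benefit specific to Parquet.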
- Option D (Parquet files have the ability to be optimized): Optimizations such as file compaction (for example, the OPTIMIZE command) operate on the resulting table, not on the choice of source format, so this is not an advantage Parquet brings to a CTAS statement.
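CTAS Context: A CTAS statement infers the new table's schema from the results of its SELECT query. It does not support declaring a schema manually or specifying additional file options for the source. Querying Parquet files directly therefore yields correct column names and types from the file metadata, whereas querying CSV files directly cannot apply options such as header, so the columns may come back with generic names (_c0, _c1, ...) and string types unless you first define a temporary view or table with the right options.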
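To make the CSV limitation concrete, here is a minimal sketch in standard-library Python. The CSV content, the helper names infer_type and infer_schema, and the "narrowest type that parses every value" rule are hypothetical illustrations, not how Spark or Databricks actually infers CSV schemas. The point it shows is that CSV values arrive as untyped strings, so any schema is a guess that can change depending on which rows are examined.

```python
import csv
import io

# Hypothetical CSV content: the header row carries names only, no types.
CSV_TEXT = """id,zip_code,amount
1,02134,10.5
2,94105,7
3,N/A,12.25
"""


def infer_type(values):
    """Toy rule (assumption): pick the narrowest type that parses every value."""
    for cast, name in ((int, "int"), (float, "double")):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue  # this type failed on some value; try a wider one
    return "string"


def infer_schema(text, sample_rows):
    """Guess a schema from the first `sample_rows` data rows of a CSV string."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    sample = list(reader)[:sample_rows]
    # Every field is a Python str here; types exist only if we guess them.
    columns = list(zip(*sample))
    return {name: infer_type(col) for name, col in zip(header, columns)}


print(infer_schema(CSV_TEXT, sample_rows=2))
# → {'id': 'int', 'zip_code': 'int', 'amount': 'double'}
print(infer_schema(CSV_TEXT, sample_rows=3))
# → {'id': 'int', 'zip_code': 'string', 'amount': 'double'}
```

With two rows examined, zip_code is guessed as an integer (which would also drop the leading zero in 02134); with all three rows, the N/A value forces it to a string. A Parquet file sidesteps this because the writer records each column's type in the file.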

Therefore, the correct answer is C - Parquet files have a well-defined schema, which is particularly beneficial when creating external tables via CTAS statements.
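At the file level, a "well-defined schema" means the types travel with the data. The sketch below is a hypothetical toy format in standard-library Python, loosely modeled on how Parquet ends each file with its metadata, then a 4-byte little-endian footer length, then a magic marker. The row encoding, the TOY1 marker, and the function names are assumptions for illustration only; real Parquet stores data column by column and encodes its metadata with Thrift, so use a Parquet library to read actual files.

```python
import csv
import io
import json
import struct

# Hypothetical toy layout, NOT real Parquet:
#   body (delimited rows) | footer (JSON schema) | footer length (4 bytes, LE) | MAGIC
MAGIC = b"TOY1"
CASTS = {"int": int, "double": float, "string": str}


def write_file(schema, rows):
    """Serialize rows as delimited text and append the schema as a footer."""
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    body = out.getvalue().encode("utf-8")
    footer = json.dumps(schema).encode("utf-8")
    return body + footer + struct.pack("<I", len(footer)) + MAGIC


def _footer_length(data):
    """Validate the trailing marker and return the footer's byte length."""
    if data[-4:] != MAGIC:
        raise ValueError("not a toy self-describing file")
    (length,) = struct.unpack("<I", data[-8:-4])
    return length


def read_schema(data):
    """Read only the footer: no data rows are scanned to learn the types."""
    length = _footer_length(data)
    return json.loads(data[-8 - length:-8])


def read_rows(data):
    """Parse the body using the declared types instead of inferring them."""
    schema = read_schema(data)
    body = data[: -8 - _footer_length(data)].decode("utf-8")
    casts = [CASTS[col["type"]] for col in schema]
    return [
        tuple(cast(value) for cast, value in zip(casts, row))
        for row in csv.reader(io.StringIO(body))
    ]


schema = [
    {"name": "id", "type": "int"},
    {"name": "zip_code", "type": "string"},
    {"name": "amount", "type": "double"},
]
data = write_file(schema, [(1, "02134", 10.5), (2, "94105", 7.0)])
print(read_schema(data))
print(read_rows(data))
# → [(1, '02134', 10.5), (2, '94105', 7.0)]
```

The reader never guesses: zip_code comes back as the string '02134' because the footer says so, which is the property a CTAS statement relies on when it takes its schema from the query results.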

Powered by GPT-5.2
