Databricks Certified Data Engineer - Associate

What is a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?

KKeng

Last updated: January 13, 2026 at 09:15

A. Parquet files can be partitioned

B. Parquet files will become Delta tables

C. Parquet files have a well-defined schema

D. Parquet files have the ability to be optimized

Explanation:

When comparing Parquet and CSV formats for creating external tables with CREATE TABLE AS SELECT (CTAS) statements, the key benefit of Parquet is that it has a well-defined schema. Here's why:

Schema Definition: Parquet is a self-describing columnar format. Each file stores its schema (column names, data types, nullability) in its own metadata, so any reader recovers the exact types without guessing.
CSV Limitations: CSV files are plain text with no embedded types; at most the first line holds column names. Reading them correctly requires options such as the header and delimiter, plus either an explicitly declared schema or schema inference, and inference can assign the wrong types (for example, a ZIP code like 02134 read as the integer 2134).
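CTAS Context: A CTAS statement infers the new table's schema from the query results and does not support declaring a schema manually or passing additional file options. A Parquet source therefore arrives with correct column names and types. When a CSV path is queried directly (for example, SELECT * FROM csv.`/path`), there is no place in the CTAS to supply header, delimiter, or type settings, so those have to be handled first, for example through a temporary view created with the right options.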
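To make the schema difference concrete, here is a minimal, stdlib-only Python sketch. It does not use Spark or real Parquet files: `read_csv_inferred` applies a hypothetical, deliberately naive inference rule (try int, then float, else keep the string) to CSV text, while `write_self_describing` and `read_self_describing` are a hypothetical toy format that stores a declared schema alongside the rows, loosely mimicking how Parquet keeps its schema inside the file. All function names and sample data are illustrative assumptions.

```python
import csv
import io
import json
from datetime import date

# Hypothetical sample data. In CSV, every value is just text.
CSV_TEXT = "zip_code,amount,order_date\n02134,19.90,2024-01-05\n10001,5,2024-01-06\n"


def naive_infer(value: str):
    """Toy type inference (not Spark's algorithm): try int, then float, else str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def read_csv_inferred(text: str) -> list:
    # csv.DictReader yields every field as str, so types must be guessed afterwards.
    reader = csv.DictReader(io.StringIO(text))
    return [{name: naive_infer(v) for name, v in row.items()} for row in reader]


# Converters for the declared column types of the toy self-describing format.
CASTS = {"string": str, "double": float, "date": date.fromisoformat}


def write_self_describing(schema: list, rows: list) -> str:
    # The schema travels in the same payload as the data, loosely like the
    # schema Parquet stores in its file metadata (this is NOT the Parquet format).
    return json.dumps({"schema": schema, "rows": rows})


def read_self_describing(blob: str) -> list:
    doc = json.loads(blob)
    names = [col["name"] for col in doc["schema"]]
    casts = [CASTS[col["type"]] for col in doc["schema"]]
    # No guessing: each value is converted with the type recorded in the payload.
    return [{n: cast(v) for n, cast, v in zip(names, casts, row)} for row in doc["rows"]]


SCHEMA = [
    {"name": "zip_code", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "order_date", "type": "date"},
]
ROWS = [["02134", "19.90", "2024-01-05"], ["10001", "5", "2024-01-06"]]

if __name__ == "__main__":
    from_csv = read_csv_inferred(CSV_TEXT)
    from_typed = read_self_describing(write_self_describing(SCHEMA, ROWS))
    print(from_csv[0])    # zip_code parsed as the int 2134 (leading zero lost)
    print(from_typed[0])  # zip_code kept as text, amount a float, order_date a date
```

In this sketch, the CSV reader turns the ZIP code into the integer 2134 and returns `amount` as a float in the first row but an int in the second, while the self-describing reader returns the ZIP code as text, `amount` as a float in every row, and `order_date` as a `date`.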
Other Options Analysis:
- A (Partitioning): Both Parquet and CSV can be partitioned in Databricks, so this is not a unique benefit of Parquet.
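- B (Become Delta tables): The source Parquet files are not turned into Delta tables by the query. CTAS writes a new table from the query results (on Databricks, Delta by default) whether the source is Parquet or CSV, and converting existing Parquet files in place requires an explicit command such as CONVERT TO DELTA.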
- D (Optimization): Commands such as OPTIMIZE (file compaction and Z-ordering) are Delta Lake features that apply to Delta tables; they are not a property that distinguishes Parquet source files from CSV ones.

Key Takeaway: The primary advantage of using Parquet over CSV for external tables in CTAS statements is that Parquet files contain their own schema metadata, making table creation more reliable and less error-prone.

Powered by GPT-5.2