Microsoft Azure Data Engineer Associate - DP-203

Explanation:

In Azure Synapse Analytics, when creating tables in a Spark pool database that need to be automatically available as external tables in the serverless SQL pool, Parquet format is the optimal choice for several technical reasons:

Why Parquet is the Correct Answer:

Native Integration: Both Spark pools and serverless SQL pools in Azure Synapse Analytics read Parquet natively, so a table written by Spark can be queried from SQL without any conversion step.
Automatic Table Synchronization: When you create a managed or external Spark table that stores its data as Parquet, Synapse Analytics exposes it as an external table with the same name in the dbo schema of the corresponding serverless SQL pool database. Synchronization is asynchronous, so the table becomes queryable after a short delay, with no manual external table creation.
Performance Benefits: Parquet is a columnar storage format with efficient compression, and queries can read only the columns they need. This makes it well suited to analytical workloads in both Spark and SQL environments.
Official Microsoft Support: Microsoft's documentation on shared metadata tables lists Parquet, along with Delta and CSV, as the formats that are synchronized between Spark pools and serverless SQL pools. Of the four answer options, only Parquet is on that list.
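The workflow described above can be sketched in a few lines. This is a minimal illustration rather than a definitive implementation: the table name sales and its columns are hypothetical, the two helper functions are invented for this example, and the code only builds the statements as strings. In a Synapse notebook the first statement would typically be passed to spark.sql(...) after DB1 has been created, and the second would be run against the built-in serverless SQL pool.

```python
def spark_create_table_sql(database: str, table: str, columns: dict, fmt: str = "PARQUET") -> str:
    """Build a Spark SQL CREATE TABLE statement with an explicit USING clause."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"CREATE TABLE {database}.{table} ({cols}) USING {fmt}"


def serverless_select_sql(database: str, table: str, limit: int = 10) -> str:
    """Build a T-SQL query for the synchronized table, which is exposed under the dbo schema."""
    return f"SELECT TOP {limit} * FROM {database}.dbo.{table}"


# Hypothetical table 'sales' in DB1 (names and columns are assumptions for illustration).
ddl = spark_create_table_sql("DB1", "sales", {"id": "INT", "amount": "DOUBLE"})
query = serverless_select_sql("DB1", "sales")

print(ddl)    # statement to run in the Spark pool
print(query)  # statement to run in the serverless SQL pool once the table has synchronized
```

The explicit USING PARQUET clause is the point of the sketch: the storage format chosen at table creation is what determines whether the table is shared with the serverless SQL pool.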

Why Other Options Are Less Suitable:

ORC (B): ORC is also a columnar format, but it is not among the formats that Synapse synchronizes to the serverless SQL pool. Spark tables stored as ORC therefore do not appear automatically as external tables.
JSON (C): JSON is a text-based, row-oriented format that lacks the performance characteristics of columnar formats for analytical queries. More importantly, Spark tables stored as JSON are not synchronized as external tables in serverless SQL pools.
HIVE (D): Hive is a data warehouse system rather than a file format, so this option does not name a storage format that Synapse synchronizes to the serverless SQL pool.
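The elimination of the other options can be expressed as a simple lookup. The sketch below assumes the synchronized formats are Parquet, Delta, and CSV, as listed in Microsoft's shared metadata tables documentation at the time of writing; the constant and function names are hypothetical and exist only for this illustration.

```python
# Assumption: formats shared with serverless SQL per the shared metadata
# tables documentation; this list may change as the service evolves.
SYNCED_FORMATS = frozenset({"parquet", "delta", "csv"})


def is_synced_to_serverless(table_format: str) -> bool:
    """Return True if a Spark table in this format is shared with serverless SQL."""
    return table_format.strip().lower() in SYNCED_FORMATS


# Check each answer option from the question.
options = ["Parquet", "ORC", "JSON", "HIVE"]
results = {name: is_synced_to_serverless(name) for name in options}
print(results)
```

Only Parquet passes the check among the four options, which matches the reasoning above.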

Best Practice Consideration:

Parquet should be the default choice for analytical workloads in Azure Synapse Analytics due to its performance characteristics, compression efficiency, and seamless integration across the Synapse ecosystem.

You have an Azure Synapse Analytics workspace named WS1 containing an Apache Spark pool named Pool1. You plan to create a database named DB1 in Pool1. You need to ensure that tables created in DB1 are automatically available as external tables in the built-in serverless SQL pool. Which table format should you use for the tables in DB1?

Last updated: July 15, 2026 at 14:06

A. Parquet

B. ORC

C. JSON

D. HIVE