
Answer-first summary for fast verification
Answer: Parquet
## Explanation

Azure Synapse Analytics shares Spark databases and tables with the serverless SQL pool through shared metadata, but only for tables stored in a supported format. Of the options listed, **Parquet** is the only format that is synchronized automatically.

### Why Parquet is the Correct Answer:

1. **Native Integration**: Azure Synapse Analytics has built-in, optimized support for Parquet in both Spark pools and serverless SQL pools, so table metadata can be shared between the two engines.
2. **Automatic Table Synchronization**: When you create a managed or external Spark table that uses Parquet in a Spark database, Synapse registers it as an external table in a serverless SQL pool database with the same name. The synchronization is asynchronous, so the table becomes queryable after a short delay, without any manual external table creation.
3. **Performance Benefits**: Parquet is a columnar storage format that provides strong compression and query performance. It is well suited to analytical workloads in both Spark and SQL environments.
4. **Official Microsoft Support**: Microsoft documentation lists Parquet, Delta, and CSV as the formats supported for automatic table synchronization between Spark pools and serverless SQL pools. Spark tables backed by other formats are not synchronized, and Parquet is the only supported format among the options.

### Why Other Options Are Less Suitable:

- **ORC (B)**: ORC is also a columnar format that Spark can read and write, but Spark tables backed by ORC are not automatically synchronized to the serverless SQL pool.
- **JSON (C)**: JSON is a text format that stores data record by record, so it lacks the performance characteristics of columnar formats for analytical queries. More importantly, JSON tables in Spark are not automatically synchronized as external tables in the serverless SQL pool.
- **HIVE (D)**: Hive is a data warehouse system, not a file format, so this option does not represent a valid file format choice for table storage in this context.

### Best Practice Consideration:

Parquet is a sensible default for analytical workloads in Azure Synapse Analytics because of its performance characteristics, compression efficiency, and integration across the Synapse engines.
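As an illustration of the scenario in the question, the following minimal sketch builds the Spark SQL statements that create a database and a Parquet-backed table in it. The helper `parquet_table_ddl`, the table name `sales`, and its columns are hypothetical and chosen only for this example. The statements are printed rather than executed, so the snippet runs without a Spark session.

```python
from typing import Dict, List


def parquet_table_ddl(database: str, table: str, columns: Dict[str, str]) -> List[str]:
    """Return Spark SQL statements that create a database and a Parquet table in it.

    Hypothetical helper for illustration; it only formats strings.
    """
    # Render "name TYPE" pairs in the order the caller supplied them.
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return [
        f"CREATE DATABASE IF NOT EXISTS {database}",
        # USING PARQUET selects the storage format that Synapse can synchronize.
        f"CREATE TABLE IF NOT EXISTS {database}.{table} ({column_list}) USING PARQUET",
    ]


# Assumed example table: "sales" with two columns, created in DB1.
statements = parquet_table_ddl("DB1", "sales", {"id": "INT", "amount": "DOUBLE"})
for statement in statements:
    print(statement)
    # In a Synapse notebook attached to Pool1, each statement would be run as:
    # spark.sql(statement)
```

Once the statements have run on the Spark pool and the metadata has synchronized, the table should appear as an external table under the `dbo` schema of the matching serverless SQL pool database.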
Author: LeetQuiz Editorial Team
You have an Azure Synapse Analytics workspace named WS1 containing an Apache Spark pool named Pool1. You plan to create a database named DB1 in Pool1. You need to ensure that tables created in DB1 are automatically available as external tables in the built-in serverless SQL pool. Which table format should you use for the tables in DB1?
A. Parquet
B. ORC
C. JSON
D. HIVE