
Answer-first summary for fast verification

**Answer: Parquet**
## Analysis of Table Format Requirements for Automatic External Table Availability

### Question Context

The requirement is to ensure that tables created in a Spark database (DB1) within an Apache Spark pool (Pool1) are **automatically available** as external tables in the built-in serverless SQL pool, without manual intervention.

### Supported Formats for Automatic Metadata Synchronization

**Parquet (Option D) is the optimal choice** for the following reasons:

- **Native Metadata Synchronization**: Azure Synapse Analytics provides automatic metadata synchronization between Spark pools and serverless SQL pools specifically for Parquet- and CSV-backed tables. When you create tables in Spark using these formats, the metadata is automatically replicated to the serverless SQL pool, where the tables appear as external tables.
- **Columnar Storage Advantages**: Parquet offers significant performance benefits for analytical workloads:
  - Columnar storage enables efficient compression and encoding
  - Better query performance through column pruning and predicate pushdown
  - Reduced I/O for analytical queries, which typically scan only specific columns
  - Superior performance compared to row-based formats such as CSV
- **Built-in Integration**: The serverless SQL pool can query the underlying Parquet files in Azure Storage directly, without the Spark pool running, enabling cost-effective analytics.

### Alternative Format Analysis

**CSV (Option A)**: While technically supported for automatic metadata synchronization, CSV has significant limitations:

- Row-based storage is inefficient for analytical queries
- No built-in schema enforcement or data type validation
- Poor compression compared to columnar formats
- Slower query performance due to full row scans

**ORC (Option B)**: Not automatically synchronized between Spark and serverless SQL pools in Azure Synapse Analytics; it requires manual external table creation.
**JSON (Option C)**: While serverless SQL pools can query JSON files, JSON tables are not automatically synchronized as external tables from Spark databases. JSON also suffers from performance limitations for analytical workloads.

### Best Practice Recommendation

For analytical workloads in Azure Synapse Analytics, Parquet is the recommended format due to its performance characteristics, compression efficiency, and seamless integration between Spark and serverless SQL pools. Automatic metadata synchronization ensures operational efficiency, while the columnar format delivers optimal query performance.
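As a sketch of the Spark-side workflow, the following Spark SQL creates DB1 and a Parquet-backed table in it; the table and column names (`employees`, `id`, `name`, `salary`) are hypothetical, chosen only for illustration:

```sql
-- Run in a notebook attached to the Spark pool (Pool1), using Spark SQL.
CREATE DATABASE IF NOT EXISTS DB1;

-- A Parquet-backed table; because the format is Parquet, its metadata
-- is eligible for automatic synchronization to the serverless SQL pool.
CREATE TABLE DB1.employees (
    id     INT,
    name   STRING,
    salary DOUBLE
)
USING PARQUET;

INSERT INTO DB1.employees VALUES (1, 'Alice', 85000.0);
```

Had the table been created with `USING ORC` or `USING JSON` instead, it would remain visible only to the Spark pool and would require a manually defined external table on the serverless side.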
Author: LeetQuiz Editorial Team
### Question

You have an Azure Synapse Analytics workspace named WS1 containing an Apache Spark pool named Pool1. You intend to create a database named DB1 in Pool1. You need to guarantee that any tables created in DB1 are automatically available as external tables in the built-in serverless SQL pool. Which table format should you use for the tables in DB1?

- A. CSV
- B. ORC
- C. JSON
- D. Parquet
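To verify the outcome from the serverless side, a synchronized table can be queried by name with no Spark pool running; alternatively, the underlying Parquet files can be read directly with `OPENROWSET`. The database/table name `DB1.dbo.employees` and the storage path below are hypothetical placeholders:

```sql
-- Run against the built-in serverless SQL pool (T-SQL).
-- DB1.dbo.employees is a hypothetical table synchronized from Spark.
USE DB1;
SELECT TOP 10 * FROM dbo.employees;

-- The same data can be queried straight from storage, bypassing the
-- synchronized metadata entirely; the URL is illustrative only.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://account.dfs.core.windows.net/container/warehouse/employees/**',
    FORMAT = 'PARQUET'
) AS rows;
```

This illustrates the point made above: the serverless SQL pool reads the Parquet files in Azure Storage directly, so queries incur no Spark pool compute cost.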