
Explanation:
The requirement is to ensure that tables created in a Spark database (DB1) within an Apache Spark pool (Pool1) are automatically available as external tables in the built-in serverless SQL pool without manual intervention.
Parquet (Option D) is the optimal choice for the following reasons:
Native Metadata Synchronization: Azure Synapse Analytics automatically synchronizes metadata from Spark pools to the serverless SQL pool, but only for Spark tables stored as Parquet, Delta, or CSV. When you create a managed or external table in a Spark database using one of these formats, a matching external table is exposed in the serverless SQL pool with no manual step.
Columnar Storage Advantages: Parquet is a columnar format, so analytical queries read only the columns they reference, and the data compresses well.
Built-in Integration: The serverless SQL pool can directly query Parquet files in Azure Storage without requiring the Spark pool to be running, enabling cost-effective analytics.
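CSV (Option A): While technically supported for automatic metadata synchronization, CSV is row-oriented, carries no embedded schema or data types, and compresses less effectively than Parquet, which makes it a weaker choice for analytical tables.
ORC (Option B): ORC tables are not automatically synchronized between Spark and serverless SQL pools in Azure Synapse Analytics. Exposing them requires creating the external table manually.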
JSON (Option C): While serverless SQL pools can query JSON files, they are not automatically synchronized as external tables from Spark databases. JSON also suffers from performance limitations for analytical workloads.
For analytical workloads in Azure Synapse Analytics, Parquet is the recommended format due to its performance characteristics, compression efficiency, and seamless integration between Spark and serverless SQL pools. The automatic metadata synchronization ensures operational efficiency while the columnar format delivers optimal query performance.
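The format rule above can be sketched as a small helper that builds a Spark SQL `CREATE TABLE ... USING <format>` statement and rejects formats that are not synchronized. This is a minimal illustration, not part of the original question: the helper names, the `sales` table, and its columns are hypothetical, and the set of synchronized formats is an assumption based on the explanation plus Delta.

```python
# Formats assumed to be shared from Spark to the serverless SQL pool
# (Parquet and CSV per the explanation; Delta is an added assumption).
SYNCED_FORMATS = {"parquet", "delta", "csv"}


def is_synced_format(fmt: str) -> bool:
    """Return True if Spark tables in this format are exposed in serverless SQL."""
    return fmt.strip().lower() in SYNCED_FORMATS


def create_table_sql(database: str, table: str, columns: dict, fmt: str = "parquet") -> str:
    """Build a Spark SQL CREATE TABLE statement, refusing non-synchronized formats."""
    if not is_synced_format(fmt):
        raise ValueError(f"{fmt!r} tables are not synchronized to serverless SQL")
    # Render the column definitions as "name TYPE, name TYPE, ...".
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"CREATE TABLE {database}.{table} ({column_list}) USING {fmt.strip().upper()}"


# Hypothetical table in DB1; the name and columns are illustrative only.
statement = create_table_sql("DB1", "sales", {"id": "INT", "amount": "DOUBLE"})
print(statement)
```

In a Synapse notebook attached to Pool1, the resulting string could be run with `spark.sql(statement)`, where `spark` is the session the notebook provides. Validating the format up front means a request for an ORC or JSON table fails before anything is created.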
You have an Azure Synapse Analytics workspace named WS1 containing an Apache Spark pool named Pool1. You intend to create a database named DB1 in Pool1. You need to guarantee that any tables created in DB1 are automatically available as external tables in the built-in serverless SQL pool. Which table format should you use for the tables in DB1?
A. CSV
B. ORC
C. JSON
D. Parquet