
Explanation:
This scenario involves loading JSON files from Azure Data Lake Storage Gen2 into an Azure Synapse Analytics Apache Spark pool where the files have varying structures and data types. The key requirement is to maintain the source data types during the loading process.
Option D: Load the data by using PySpark ✅ CORRECT
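PySpark's spark.read.json() method infers the schema from the source JSON files by default, so numbers, Booleans, strings, and nested values keep their original data types.
Option C: Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool ❌ INCORRECT
OPENROWSET runs in a serverless SQL pool rather than in Pool1, and it typically returns each JSON document as text that must then be parsed into explicitly declared types.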
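Option A: Use a Conditional Split transformation in an Azure Synapse data flow ❌ INCORRECT
A Conditional Split transformation routes rows to different streams based on conditions. It does not load files or preserve their data types.
Option B: Use a Get Metadata activity in Azure Data Factory ❌ INCORRECT
A Get Metadata activity returns metadata about files and folders. It does not load data.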
PySpark is the optimal choice because it runs natively in Apache Spark pools in Azure Synapse Analytics, handles varying JSON structures robustly, and keeps data types consistent throughout the data processing workflow.
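The PySpark load can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes the code runs in a Synapse notebook attached to Pool1, where a SparkSession named spark is already available. The function name load_json_to_table and its parameters are hypothetical and were chosen only for illustration.

```python
def load_json_to_table(spark, source_path, table_name):
    """Read JSON files with Spark's schema inference and save them as a table.

    spark       -- an existing SparkSession (predefined in a Synapse notebook)
    source_path -- hypothetical path to the JSON files in the storage container
    table_name  -- hypothetical name of the target table in the Spark pool
    """
    # spark.read.json scans the input and infers a schema, so numbers,
    # Booleans, nested objects, and arrays are not flattened to strings.
    df = spark.read.json(source_path)

    # Print the inferred schema so the data types can be checked before writing.
    df.printSchema()

    # Persist the DataFrame as a table, replacing any existing table of that name.
    df.write.mode("overwrite").saveAsTable(table_name)
    return df
```

In a notebook, a call might look like load_json_to_table(spark, "abfss://<container>@<account>.dfs.core.windows.net/<folder>/", "<table>"), where the bracketed values are placeholders for your own container, storage account, folder, and table name.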
You have an Azure Synapse Analytics Apache Spark pool named Pool1. You need to load JSON files from an Azure Data Lake Storage Gen2 container into tables in Pool1. The files have varying structures and data types. You must preserve the source data types during the load process.
What should you do?
A. Use a Conditional Split transformation in an Azure Synapse data flow.
B. Use a Get Metadata activity in Azure Data Factory.
C. Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool.
D. Load the data by using PySpark.