
Explanation:
The correct answer is B: configuring the spark.hadoop.fs.azure integration properties correctly. These properties control how Spark connects to and authenticates with Azure Data Lake Storage Gen2, so they bear most directly on the latency of reads and writes against that storage. Options A, C, and D can affect overall job performance, but none of them targets access to external storage: option A tunes shuffle partitioning, option C adds processing capacity, and option D enables the Databricks disk cache.
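As a rough illustration of what option B involves, the sketch below builds the Spark configuration entries commonly used for OAuth (service principal) access to ADLS Gen2 through the ABFS driver. The helper name abfs_oauth_conf and the sample account, client, and tenant values are hypothetical; the property keys follow the usual fs.azure.account.* naming with the spark.hadoop. prefix, and should be checked against the documentation for your runtime.

```python
def abfs_oauth_conf(account, client_id, client_secret, tenant_id):
    """Return Spark conf entries for OAuth access to one ADLS Gen2 account.

    Hypothetical helper for illustration; all argument values are placeholders.
    """
    host = f"{account}.dfs.core.windows.net"
    prefix = "spark.hadoop.fs.azure.account"
    return {
        # Use OAuth rather than a shared account key.
        f"{prefix}.auth.type.{host}": "OAuth",
        # Token provider for the client-credentials (service principal) flow.
        f"{prefix}.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"{prefix}.oauth2.client.id.{host}": client_id,
        f"{prefix}.oauth2.client.secret.{host}": client_secret,
        # Microsoft Entra ID token endpoint for the tenant.
        f"{prefix}.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values; in practice the secret would come from a secret store.
conf = abfs_oauth_conf("examplestore", "my-client-id", "my-secret", "my-tenant-id")
for key in sorted(conf):
    print(key)
```

Each key/value pair could then be passed to SparkSession.builder.config(key, value) when the session is created, or set in the cluster's Spark configuration.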
To minimize read and write latency when optimizing a Spark job for processing a large dataset stored in Azure Data Lake Storage Gen2, which configuration should be prioritized?
A. Adjusting spark.sql.shuffle.partitions to match the number of cores
B. Configuring spark.hadoop.fs.azure integration properties correctly
C. Increasing spark.executor.instances
D. Setting spark.databricks.io.cache.enabled to true