
Answer: Configuring spark.hadoop.fs.azure integration properties correctly
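The configuration to prioritize for minimizing read and write latency is **B. Configuring spark.hadoop.fs.azure integration properties correctly**. Spark copies these properties (minus the spark.hadoop. prefix) into the Hadoop configuration used by the ABFS driver for abfss:// paths, so they control how Spark authenticates to and reads from and writes to Azure Data Lake Storage Gen2. Getting them right therefore affects storage I/O latency in both directions. Options A, C, and D can affect overall job performance, but none targets storage latency as directly. Option A tunes shuffle parallelism, which only matters for data exchanged between stages. Option C adds executors, which increases parallel processing capacity but does not make individual storage requests any faster. Option D enables the Databricks disk cache, which keeps local copies of remote data files on worker disks; it speeds up repeated reads of the same data but does nothing for writes or for the first read.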
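As a rough illustration of what option B involves, the sketch below assembles spark.hadoop.fs.azure.* properties for one storage account, assuming authentication with a service principal over OAuth. The property names follow the hadoop-azure ABFS driver documentation; the helper name abfs_oauth_conf, its parameters, and the optional request-size settings are assumptions made for this example, and any size values passed in are illustrative, not tuned recommendations.

```python
# Hypothetical helper (not part of any library): builds Spark config keys
# for the hadoop-azure ABFS driver, assuming OAuth client-credentials auth.
# The optional request sizes are illustrative, not recommended values.


def abfs_oauth_conf(account, client_id, client_secret, tenant_id,
                    read_request_bytes=None, write_request_bytes=None):
    """Return a dict of spark.hadoop.fs.azure.* properties for one account.

    Spark strips the "spark.hadoop." prefix and copies each remaining key
    into the Hadoop Configuration that the ABFS filesystem reads.
    """
    suffix = f"{account}.dfs.core.windows.net"  # per-account key suffix
    p = "spark.hadoop.fs.azure."
    conf = {
        f"{p}account.auth.type.{suffix}": "OAuth",
        f"{p}account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"{p}account.oauth2.client.id.{suffix}": client_id,
        f"{p}account.oauth2.client.secret.{suffix}": client_secret,
        f"{p}account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    # Optional per-request buffer sizes in bytes; larger requests can mean
    # fewer round trips to storage for large sequential reads and writes.
    if read_request_bytes is not None:
        conf[f"{p}read.request.size"] = str(read_request_bytes)
    if write_request_bytes is not None:
        conf[f"{p}write.request.size"] = str(write_request_bytes)
    return conf


if __name__ == "__main__":
    # Placeholder values only; real secrets belong in a secret store.
    for key, value in abfs_oauth_conf("myaccount", "<client-id>",
                                      "<secret>", "<tenant-id>").items():
        print(key, "=", value)
```

With PySpark, each key/value pair can be passed to SparkSession.builder.config(key, value) before getOrCreate(); data is then addressed as abfss://<container>@<account>.dfs.core.windows.net/<path>.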
Author: LeetQuiz Editorial Team
To minimize read and write latency when optimizing a Spark job for processing a large dataset stored in Azure Data Lake Storage Gen2, which configuration should be prioritized?
A. Adjusting spark.sql.shuffle.partitions to match the number of cores
B. Configuring spark.hadoop.fs.azure integration properties correctly
C. Increasing spark.executor.instances
D. Setting spark.databricks.io.cache.enabled to true