
Answer-first summary for fast verification
Answer: spark.cosmosdb.read.pageSize
The option `spark.cosmosdb.read.pageSize` sets how many items are returned in a single request to Cosmos DB, so it determines how many round trips a query needs. Cosmos DB splits a container's data across physical partitions (and can replicate it across regions for availability and low latency), and a Spark read typically pages through each partition separately. A larger page size reduces the number of round trips when a query scans a lot of data spread across many partitions, at the cost of larger responses per request. A smaller page size can suit narrow queries that touch a single partition and return few items. Of the four options, this is the only one that governs how data is fetched from Cosmos DB; the others control cross joins, executor memory overhead, and multiple Spark contexts, none of which depend on how Cosmos DB physically distributes data.
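The effect of page size on request count can be sketched with a simplified model. This is an illustration only, not connector code: it assumes each physical partition is paged independently, that even an empty partition costs one request, and that the item counts are made up. The option name is taken from the question as written; the key a given connector version actually accepts may differ, so check its documentation.

```python
import math

# Option name as given in the question; a real connector version may use a different key.
PAGE_SIZE_OPTION = "spark.cosmosdb.read.pageSize"


def round_trips(items_per_partition, page_size):
    """Estimate requests needed to read every item, paging each partition separately."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Assumption: even an empty partition costs one request to find out it is empty.
    return sum(max(1, math.ceil(n / page_size)) for n in items_per_partition)


def read_options(page_size):
    """Build a hypothetical options dict that could be passed to a Spark reader."""
    return {PAGE_SIZE_OPTION: str(page_size)}


# Hypothetical item counts for three physical partitions.
partitions = [2500, 1200, 300]
for size in (100, 1000):
    print(read_options(size), "->", round_trips(partitions, size), "requests")
```

In this model, raising the page size from 100 to 1000 cuts the estimated request count from 40 to 6. The model ignores response size limits, throttling, and per-request memory, all of which push back against very large pages.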
Author: LeetQuiz Editorial Team
When a Spark application is querying data from Azure Cosmos DB, which configuration option has a significant impact on performance due to the physical distribution of data within Cosmos DB?
A. spark.sql.crossJoin.enabled
B. spark.executor.memoryOverhead
C. spark.cosmosdb.read.pageSize
D. spark.driver.allowMultipleContexts