
Answer-first summary for fast verification
Answer: spark.sql.files.maxPartitionBytes
The correct answer is A. The configuration parameter `spark.sql.files.maxPartitionBytes` (default: 128 MB) directly determines the maximum size, in bytes, of a single partition when reading data from files, controlling how Spark splits input files into partitions during ingestion. Options B, C, and D are unrelated to initial partition sizing: B sets the size threshold below which a table is broadcast in joins, C provides an advisory target partition size for adaptive query execution (AQE) during shuffles, and D sets the minimum number of partitions retained during AQE coalescing. Only A directly impacts partition size at ingestion.
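To see the effect of this parameter, the partition count for a single large splittable file can be approximated by dividing the file size by `spark.sql.files.maxPartitionBytes`. This is a simplified sketch, not Spark's exact splitting logic (which also factors in `spark.sql.files.openCostInBytes` and available cores):

```python
import math

def approx_num_partitions(file_size_bytes: int,
                          max_partition_bytes: int = 128 * 1024 * 1024) -> int:
    """Approximate how many input partitions Spark creates when reading one
    splittable file, ignoring open-cost and core-count adjustments."""
    return math.ceil(file_size_bytes / max_partition_bytes)

# A 1 GiB file with the default 128 MiB limit yields roughly 8 partitions.
print(approx_num_partitions(1024 * 1024 * 1024))  # 8

# Halving maxPartitionBytes roughly doubles the partition count.
print(approx_num_partitions(1024 * 1024 * 1024, 64 * 1024 * 1024))  # 16
```

In a real session the parameter is set before the read, e.g. `spark.conf.set("spark.sql.files.maxPartitionBytes", "64m")`, and the resulting partition count can be checked with `df.rdd.getNumPartitions()`.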
Author: LeetQuiz Editorial Team
Which configuration parameter directly controls the size of a Spark partition during data ingestion?
A. `spark.sql.files.maxPartitionBytes`
B. `spark.sql.autoBroadcastJoinThreshold`
C. `spark.sql.adaptive.advisoryPartitionSizeInBytes`
D. `spark.sql.adaptive.coalescePartitions.minPartitionNum`