
Explanation:
The correct answer is A. The configuration parameter spark.sql.files.maxPartitionBytes sets the maximum number of bytes Spark packs into a single partition when reading file-based sources (default 128 MB), so it controls how input files are split into partitions during ingestion. Options B, C, and D do not affect initial partition sizing: B sets the size threshold below which a table is broadcast in a join, C is the advisory target size for shuffle partitions under adaptive query execution (AQE), and D sets the minimum number of shuffle partitions after AQE coalescing. Only A applies at read time.
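To illustrate how this cap shapes ingestion, here is a minimal sketch in plain Python (not Spark code) of a simplified model of the split-size calculation. The function names, the `parallelism` value, and the partition estimate are assumptions for illustration; real Spark also packs small splits together and depends on whether the file format is splittable, so treat the numbers as rough.

```python
import math

MIB = 1024 * 1024

def max_split_bytes(file_sizes, max_partition_bytes=128 * MIB,
                    open_cost_bytes=4 * MIB, parallelism=8):
    """Simplified model of the split size used when reading files.

    max_partition_bytes stands in for spark.sql.files.maxPartitionBytes,
    open_cost_bytes for spark.sql.files.openCostInBytes, and parallelism
    is a hypothetical number of available cores.
    """
    total = sum(size + open_cost_bytes for size in file_sizes)
    bytes_per_core = total // parallelism
    # The configured maximum is an upper bound on the split size.
    return min(max_partition_bytes, max(open_cost_bytes, bytes_per_core))

def estimate_partitions(file_sizes, **kwargs):
    """Rough partition count, assuming every file is splittable and
    ignoring the packing of leftover small splits."""
    split = max_split_bytes(file_sizes, **kwargs)
    return sum(math.ceil(size / split) for size in file_sizes)

if __name__ == "__main__":
    one_gib_file = [1024 * MIB]
    print(estimate_partitions(one_gib_file))                                # 8
    print(estimate_partitions(one_gib_file, max_partition_bytes=64 * MIB))  # 16
```

Under these assumptions, halving the maximum from 128 MB to 64 MB doubles the estimated number of input partitions for a single 1 GiB file.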
Which configuration parameter directly controls the size of a Spark partition during data ingestion?
A. spark.sql.files.maxPartitionBytes
B. spark.sql.autoBroadcastJoinThreshold
C. spark.sql.adaptive.advisoryPartitionSizeInBytes
D. spark.sql.adaptive.coalescePartitions.minPartitionNum