
Answer-first summary for fast verification
Answer: Choose a column with a low cardinality that evenly distributes data.
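Choosing a low-cardinality column that distributes data evenly is correct because it produces a manageable number of similarly sized partitions, so queries that filter on that column can skip irrelevant partitions (partition pruning) without the overhead of many small files. A high-cardinality column increases the number of partitions rather than reducing it, which leads to many small files and extra metadata overhead. A frequently updated column makes a poor partition key: changing a row's partition value moves the row to a different partition and forces rewrites, and it does nothing to improve pruning. Complex data types are poorly suited to partition keys and do not improve organization. A column unrelated to the data or to common query filters gives no pruning benefit.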
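The cardinality trade-off can be illustrated with a small plain-Python sketch. It only simulates how rows would be grouped by a partition column; it is not Databricks or Delta Lake code. The sample rows, the column choices (`order_id`, `country`), and the helper names are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical sample rows: (order_id, country). Values are illustrative only.
COUNTRIES = ["US", "DE", "IN", "BR"]
rows = [(i, COUNTRIES[i % 4]) for i in range(1000)]


def partition_sizes(rows, key_index):
    """Count how many rows fall into each partition for the chosen column."""
    return Counter(row[key_index] for row in rows)


def rows_scanned(sizes, wanted_value):
    """Model partition pruning: an equality filter on the partition
    column reads only the one matching partition."""
    return sizes.get(wanted_value, 0)


by_country = partition_sizes(rows, 1)   # low cardinality, evenly distributed
by_order_id = partition_sizes(rows, 0)  # high cardinality: one partition per row

print(len(by_country), sorted(by_country.values()))  # 4 [250, 250, 250, 250]
print(len(by_order_id))                              # 1000
print(rows_scanned(by_country, "DE"))                # 250
```

In this simulation, partitioning by `country` gives 4 equal partitions and a filter on one country reads a quarter of the rows, while partitioning by `order_id` creates 1000 single-row partitions, which is the many-small-files pattern the explanation warns about.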
Author: LeetQuiz Editorial Team
When optimizing data storage and query performance in Databricks by selecting a partition column, which of the following considerations is most important?
A. Select a column with high cardinality to reduce the number of partitions.
B. Opt for a column with frequent updates to improve partition pruning.
C. Choose a column with a low cardinality that evenly distributes data.
D. Prioritize a column with complex data types for better organization.
E. Use a column unrelated to the data to enhance partitioning efficiency.