
Answer-first summary for fast verification
Answer: Utilizing Spark's dynamic partition pruning feature by enabling it in Spark's configuration settings.
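Dynamic partition pruning (DPP) is a Spark SQL optimization, introduced in Spark 3.0, that skips unnecessary partitions at run time. When a partitioned table is joined to another table that carries a selective filter, Spark uses the join-key values that survive that filter to decide which partitions of the partitioned table to scan. The feature is controlled by the configuration property spark.sql.optimizer.dynamicPartitionPruning.enabled, which defaults to true in Spark 3.0 and later, so no manual intervention is needed once it is on. Specifying the partition column in the WHERE clause (Option A) does prune partitions, but the pruning is static: it is fixed by the literal values written into the query. Hints (Option B) likewise rely on the query author choosing partitions up front instead of letting Spark decide. Broadcast variables (Option D) distribute read-only data to executors and do not by themselves prune partitions. Thus, enabling Spark's dynamic partition pruning (Option C) is the only choice that lets Spark prune partitions dynamically.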
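The idea behind pruning at run time can be illustrated with a small conceptual sketch. This is plain Python, not Spark code and not Spark's implementation: the tables sales_by_date and calendar and the function join_with_runtime_pruning are hypothetical names invented for illustration. It only models the principle that the filter on the small table is evaluated first and its surviving keys then decide which date partitions of the large table are read.

```python
from typing import Dict, List, Tuple

# Hypothetical fact table, stored as one "partition" per date value.
sales_by_date: Dict[str, List[Tuple[str, int]]] = {
    "2024-01-01": [("widget", 3), ("gadget", 1)],
    "2024-01-02": [("widget", 5)],
    "2024-01-03": [("gadget", 2), ("gizmo", 7)],
    "2024-01-04": [("gizmo", 4)],
}

# Hypothetical small dimension table: date -> is_holiday flag.
calendar: Dict[str, bool] = {
    "2024-01-01": True,
    "2024-01-02": False,
    "2024-01-03": False,
    "2024-01-04": True,
}


def join_with_runtime_pruning(fact, dim):
    """Join fact to dim on date, keeping only holiday rows."""
    # Step 1: evaluate the filter on the small side first.
    wanted_dates = {d for d, is_holiday in dim.items() if is_holiday}
    # Step 2: use those keys, known only at run time, to skip partitions.
    scanned = sorted(d for d in fact if d in wanted_dates)
    # Step 3: read rows only from the partitions that survived pruning.
    rows = [(d, item, qty) for d in scanned for item, qty in fact[d]]
    return rows, scanned


rows, scanned = join_with_runtime_pruning(sales_by_date, calendar)
print("partitions scanned:", scanned)
print("joined rows:", rows)
```

In this sketch only two of the four date partitions are read, even though no date literal appears in the "query" itself. In Spark the equivalent decision is made by the optimizer when spark.sql.optimizer.dynamicPartitionPruning.enabled is true.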
Author: LeetQuiz Editorial Team
When querying a large table partitioned by date in Spark, which method allows Spark to dynamically prune partitions to optimize the query execution plan?
A
Manually specifying the partition columns in the WHERE clause of the query.
B
Annotating the query with hints to explicitly specify which partitions to include.
C
Utilizing Spark's dynamic partition pruning feature by enabling it in Spark's configuration settings.
D
Leveraging broadcast variables to share partition metadata across nodes.