
Answer: C (storesDF.repartition()) and E (storesDF.intersect(otherDF))
The question asks which operations trigger a shuffle and return a new DataFrame whose rows have been redistributed across partitions.

- **A. coalesce()**: Reduces the number of partitions by merging existing partitions, so it does not trigger a shuffle.
- **B. rdd.getNumPartitions()**: Returns the current partition count as an integer. It neither produces a new DataFrame nor triggers a shuffle.
- **C. repartition()**: Always triggers a shuffle to redistribute the data and returns a new DataFrame with the requested number of partitions.
- **D. union()**: Appends the rows of another DataFrame. The result's partitions are simply the partitions of both inputs concatenated, so no shuffle is required.
- **E. intersect()**: Triggers a shuffle so that equal rows from both DataFrames land in the same partition, where they can be compared. The result is a new, repartitioned DataFrame of distinct common rows.

Thus, **C** and **E** are correct because both shuffle data and produce new partitions.
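To make the narrow-versus-wide distinction concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not Spark's implementation: a DataFrame is assumed to be just a list of partitions, each a list of rows, and the functions `coalesce`, `repartition`, `union`, and `intersect` are hypothetical stand-ins named after the DataFrame methods. The narrow operations move whole partitions as units; the wide ones route each row individually, which is what a shuffle does on a cluster.

```python
from typing import Hashable, List

# Toy model (assumption, not Spark internals): a DataFrame is a list of
# partitions, and each partition is a list of rows.
Partitions = List[List[Hashable]]


def coalesce(parts: Partitions, n: int) -> Partitions:
    """Narrow: glue whole existing partitions into at most n groups.
    No row is routed on its own, so no shuffle is needed, and the
    partition count can only go down, never up."""
    n = max(1, min(n, len(parts)))
    out: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        # Each input partition lands intact in exactly one output partition.
        out[i * n // len(parts)].extend(part)
    return out


def repartition(parts: Partitions, n: int) -> Partitions:
    """Wide (shuffle): every row is sent individually to one of n new
    partitions. Simplified global round-robin, loosely mirroring
    repartition(n) called without partitioning columns."""
    out: Partitions = [[] for _ in range(n)]
    k = 0
    for part in parts:
        for row in part:
            out[k % n].append(row)
            k += 1
    return out


def union(a: Partitions, b: Partitions) -> Partitions:
    """Narrow: the result is a's partitions followed by b's, unchanged."""
    return [list(p) for p in a] + [list(p) for p in b]


def intersect(a: Partitions, b: Partitions, n: int) -> Partitions:
    """Wide (shuffle): equal rows from both inputs must end up in the same
    partition, so both sides are hash-partitioned on the whole row first."""
    def hash_partition(parts: Partitions) -> Partitions:
        out: Partitions = [[] for _ in range(n)]
        for part in parts:
            for row in part:
                out[hash(row) % n].append(row)
        return out

    left, right = hash_partition(a), hash_partition(b)
    # Each partition pair can now be intersected locally; the set
    # operation also drops duplicates, as DataFrame.intersect does.
    return [sorted(set(lp) & set(rp)) for lp, rp in zip(left, right)]


stores = [[1, 2, 3], [4, 5], [6, 7, 8], [9]]
print(coalesce(stores, 2))                      # [[1, 2, 3, 4, 5], [6, 7, 8, 9]]
print(repartition(stores, 2))                   # [[1, 3, 5, 7, 9], [2, 4, 6, 8]]
print(union(stores, [[10]]))                    # [[1, 2, 3], [4, 5], [6, 7, 8], [9], [10]]
print(intersect(stores, [[2, 9], [5, 11]], 3))  # [[9], [], [2, 5]]
```

The point worth noticing is in `intersect`: row 9 sits in partition 3 of one input and partition 0 of the other, so pairing up the original partitions would miss it; only after both sides are routed by the same hash do equal rows meet. On a real cluster, calling `.explain()` on the resulting DataFrame and looking for an `Exchange` node in the physical plan is the usual way to confirm that a shuffle happened.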
Author: LeetQuiz Editorial Team
Which of the following operations will consistently produce a new DataFrame with repartitioned data from DataFrame storesDF by triggering a shuffle operation?
A. storesDF.coalesce()
B. storesDF.rdd.getNumPartitions()
C. storesDF.repartition()
D. storesDF.union()
E. storesDF.intersect(otherDF)