
Databricks Certified Associate Developer for Apache Spark
Which of the following operations can be used to create a new DataFrame from DataFrame storesDF
without causing a shuffle?
Explanation:
The correct operations are union() and coalesce(1).
- C. union(): Combines two DataFrames by appending the rows of one to the other. It is a narrow transformation: the result simply contains the partitions of both inputs, so no rows are redistributed.
- D. coalesce(1): Reduces the number of partitions by merging existing ones, which makes it a narrow transformation with no shuffle stage. Collapsing to a single partition can still pull data from other executors into one task, but that movement happens without the shuffle write and read that repartition() performs.
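The difference between merging partitions and redistributing rows can be sketched with a toy model in plain Python. This is not Spark code: the function names `toy_union`, `toy_coalesce`, and `toy_repartition` are hypothetical, partitions are modelled as lists of integers, and Spark's real grouping and round-robin logic is more involved.

```python
from typing import List

Partition = List[int]


def toy_union(left: List[Partition], right: List[Partition]) -> List[Partition]:
    """Stack the partitions of both inputs; no row changes partition."""
    return left + right


def toy_coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Merge whole neighbouring partitions into at most n groups.

    Each input partition goes to exactly one output partition, which is
    what makes the dependency narrow in this simplified model.
    """
    n = min(n, len(partitions))  # like coalesce, never increase the count
    out: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        out[i * n // len(partitions)].extend(part)
    return out


def toy_repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Deal every row out round-robin (an assumed, simplified strategy).

    One input partition now feeds many output partitions, which is the
    all-to-all pattern that a shuffle implements.
    """
    out: List[Partition] = [[] for _ in range(n)]
    k = 0
    for part in partitions:
        for row in part:
            out[k % n].append(row)
            k += 1
    return out


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
merged = toy_coalesce(parts, 2)      # whole partitions stay together
dealt = toy_repartition(parts, 2)    # rows from each partition are split up
```

In the model, `toy_coalesce` keeps every input partition intact inside one output partition, while `toy_repartition` splits each input partition across the outputs.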
Other options:
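- A. intersect(): Keeps only the distinct rows present in both DataFrames. Matching and deduplicating rows means bringing equal rows together across partitions, which typically triggers a shuffle.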
- B. repartition(1): Explicitly redistributes data via a shuffle.
- E. rdd.getNumPartitions(): Returns an integer (number of partitions), not a DataFrame.
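One way to check these claims is to look for an Exchange operator in each physical plan. The following is a minimal sketch, assuming a local PySpark installation; `stores_df` and `other_df` are hypothetical stand-ins for storesDF, and the helper names are illustrative. The regex is a heuristic that treats a standalone `Exchange` node as a shuffle and ignores `BroadcastExchange`.

```python
import contextlib
import io
import re


def plan_has_shuffle(plan_text: str) -> bool:
    """Heuristic: a standalone 'Exchange' node in the plan text marks a shuffle."""
    return re.search(r"(?<![A-Za-z])Exchange\b", plan_text) is not None


def physical_plan(df) -> str:
    """Capture the text that DataFrame.explain() prints to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return buf.getvalue()


def check_options() -> dict:
    """Build each candidate DataFrame and report whether its plan shows a shuffle."""
    # Imported here so the helpers above work without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[4]").appName("shuffle-check").getOrCreate()
    # Hypothetical stand-ins for storesDF: 8 partitions, one column.
    stores_df = spark.range(0, 1000, 1, 8).withColumnRenamed("id", "storeId")
    other_df = spark.range(500, 1500, 1, 8).withColumnRenamed("id", "storeId")

    candidates = {
        "A. intersect()": stores_df.intersect(other_df),
        "B. repartition(1)": stores_df.repartition(1),
        "C. union()": stores_df.union(other_df),
        "D. coalesce(1)": stores_df.coalesce(1),
    }
    result = {name: plan_has_shuffle(physical_plan(df)) for name, df in candidates.items()}
    spark.stop()
    return result
```

Calling `check_options()` returns a dictionary mapping each option to whether its plan contains a shuffle exchange. Option E is left out because `stores_df.rdd.getNumPartitions()` returns an integer, so there is no DataFrame plan to inspect.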