
Answer-first summary for fast verification
Answer: df.groupBy('order_id').count().filter('count > 1')
The correct answer is A because it groups by `order_id` and counts the occurrences, then filters to show only those with more than one occurrence, effectively identifying non-unique `order_id`s.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
You have a DataFrame df with columns order_id, customer_id, and product_id. How would you validate that order_id is unique across all rows using Spark? Provide the code snippet.
A
df.groupBy('order_id').count().filter('count > 1')
B
df.select('order_id').distinct()
C
df.groupBy('order_id').agg(count('order_id') > 1)
D
df.groupBy('order_id').count().filter('count = 1')
No comments yet.