
Explanation:
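The correct answer is A. It groups the rows by order_id, counts the rows in each group, and keeps only the groups whose count is greater than 1, so every row it returns is an order_id that appears more than once. If the result is empty, order_id is unique. The other options fall short: B only lists the distinct order_id values and says nothing about duplicates unless you compare its count with df.count(); C returns every order_id with a true/false flag instead of filtering down to the duplicates; D keeps the order_ids that occur exactly once, so it lists the values that are fine and hides the duplicates.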
You have a DataFrame df with columns order_id, customer_id, and product_id. How would you validate that order_id is unique across all rows using Spark? Provide the code snippet.
A
df.groupBy('order_id').count().filter('count > 1')
B
df.select('order_id').distinct()
C
df.groupBy('order_id').agg(count('order_id') > 1)
D
df.groupBy('order_id').count().filter('count = 1')