Suppose you have a DataFrame df with columns user_id and subscription_type. You want to validate that each user_id is associated with exactly one subscription_type. Which Spark code returns the user_id values that violate this rule? (Choose two)
A
df.groupBy('user_id').agg(countDistinct('subscription_type').alias('distinct_count')).filter('distinct_count > 1')
B
df.groupBy('user_id').agg(countDistinct('subscription_type').alias('distinct_count')).filter('distinct_count != 1')
C
df.dropDuplicates(['user_id', 'subscription_type']).groupBy('user_id').count().filter('count > 1')
D
df.groupBy('user_id').agg(collect_set('subscription_type').alias('subs')).filter(size('subs') != 1)
E
df.groupBy('user_id').agg(count('subscription_type').alias('cnt')).filter('cnt > 1')
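To compare what the options compute, here is a minimal plain-Python sketch that imitates the aggregations without a Spark session. It is a stand-in, not Spark code: the sample rows and variable names are hypothetical, and it relies on the fact that Spark's count, countDistinct, and collect_set all skip null values.

```python
from collections import defaultdict

# Hypothetical sample rows: (user_id, subscription_type)
rows = [
    ("u1", "basic"),
    ("u1", "basic"),     # duplicate row, same type: not a violation
    ("u2", "basic"),
    ("u2", "premium"),   # two different types: a real violation
    ("u3", None),        # only a null subscription_type
]

distinct_types = defaultdict(set)  # stands in for countDistinct / collect_set
row_counts = defaultdict(int)      # stands in for count('subscription_type')

for user_id, sub_type in rows:
    distinct_types[user_id]        # register the user even if all values are null
    if sub_type is not None:       # the Spark aggregates above ignore nulls
        distinct_types[user_id].add(sub_type)
        row_counts[user_id] += 1

# Like filter('distinct_count > 1'): users with more than one distinct type
more_than_one = sorted(u for u, s in distinct_types.items() if len(s) > 1)

# Like filter('distinct_count != 1'): also catches users with zero non-null types
not_exactly_one = sorted(u for u, s in distinct_types.items() if len(s) != 1)

# Like filter('cnt > 1') on a plain row count: flags harmless duplicate rows
row_count_gt_one = sorted(u for u, c in row_counts.items() if c > 1)

print(more_than_one)      # ['u2']
print(not_exactly_one)    # ['u2', 'u3']
print(row_count_gt_one)   # ['u1', 'u2']
```

In this sketch the plain row count flags u1 even though u1 has a single subscription type, which is why counting rows rather than distinct values does not validate the rule. The `> 1` and `!= 1` filters differ only for users whose subscription_type is always null.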