Given a DataFrame df with columns id, name, and timestamp, how would you create a new DataFrame that removes duplicate rows based on the id column? Provide the Spark code to achieve this.
A. df.dropDuplicates(['id', 'name'])
B. df.distinct()
C. df.dropDuplicates(['id'])
D. df.drop('id')
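
To compare the options without a Spark cluster, the row-level effect of each can be modeled in plain Python. This is only a sketch: the sample rows and the drop_duplicates helper are hypothetical, and the helper keeps the first row it sees for each key, whereas Spark does not guarantee which of the duplicate rows survives.

```python
# Plain-Python model of the four options. Rows are dicts, not a Spark DataFrame.
# The sample data below is made up for illustration.
rows = [
    {"id": 1, "name": "Ann", "timestamp": "2024-01-01"},
    {"id": 1, "name": "Ann", "timestamp": "2024-01-02"},
    {"id": 1, "name": "Bob", "timestamp": "2024-01-03"},
    {"id": 2, "name": "Cy", "timestamp": "2024-01-04"},
]


def drop_duplicates(rows, subset):
    """Keep one row per distinct combination of the `subset` columns.

    Hypothetical helper: it keeps the first row seen for each key,
    which is a simplification of what Spark does.
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in subset)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Option A: deduplicate on the (id, name) pair.
by_id_and_name = drop_duplicates(rows, ["id", "name"])
# Option B: distinct() compares every column.
all_columns = drop_duplicates(rows, ["id", "name", "timestamp"])
# Option C: deduplicate on id alone.
by_id = drop_duplicates(rows, ["id"])
# Option D: drop('id') removes the column and leaves every row in place.
without_id = [{k: v for k, v in row.items() if k != "id"} for row in rows]

print(len(by_id_and_name), len(all_columns), len(by_id), len(without_id))
```

On this sample, only the option that deduplicates on id alone leaves exactly one row per id value. Deduplicating on (id, name) still keeps two rows for id 1, comparing all columns removes nothing because the timestamps differ, and dropping the id column removes no rows at all.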