You are working with a Delta Lake table 'transactions' that contains duplicate rows. Write a PySpark code snippet to deduplicate the rows based on all columns and save the result back to the same table.
A
df = spark.read.format('delta').load('/path/to/transactions')
df = df.dropDuplicates()
df.write.format('delta').mode('overwrite').save('/path/to/transactions')
B
df = df.distinct()
C
df = df.dropDuplicates(['column1', 'column2'])
D
df = df.distinct(['column1', 'column2'])
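Option A is the only choice that matches the question: `dropDuplicates()` with no arguments deduplicates on all columns, and the result is written back to the same Delta table path. The semantics of `dropDuplicates()` versus `dropDuplicates(subset)` can be sketched without Spark at all — a minimal plain-Python illustration (the `drop_duplicates` helper and the sample rows below are hypothetical, not part of the PySpark API):

```python
# Plain-Python sketch of DataFrame.dropDuplicates() semantics (no Spark).
# Rows are tuples; subset=None means "compare on all columns", which is
# what option A relies on. The first occurrence of each key is kept.

def drop_duplicates(rows, subset=None):
    seen = set()
    out = []
    for row in rows:
        key = row if subset is None else tuple(row[i] for i in subset)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

transactions = [
    ("t1", 100),
    ("t2", 250),
    ("t1", 100),  # exact duplicate row
]

# All columns considered -> the duplicate ("t1", 100) row is dropped.
print(drop_duplicates(transactions))
# Only column 0 considered -> mimics dropDuplicates(["txn_id"]).
print(drop_duplicates(transactions, subset=[0]))
```

Note that in real PySpark, `distinct()` takes no arguments (which is why option D fails) and is equivalent to a no-argument `dropDuplicates()`; only `dropDuplicates` accepts a column subset.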