How can you implement a custom partitioner in a Spark job to ensure even distribution of data across partitions when processing a highly skewed dataset?
A. Apply the repartition method with a column that evenly distributes the data, avoiding custom partitioners.
B. Utilize partitionBy with a lambda function that identifies and distributes skewed keys evenly.
C. Override the default hash partitioner with a custom partitioner that assigns more partitions to skewed keys.
D. Create a UDF that tags data rows with a partition number, then use repartitionAndSortWithinPartitions.
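Option C describes the standard technique: replace Spark's default HashPartitioner with your own org.apache.spark.Partitioner. A minimal Scala sketch follows; SkewAwarePartitioner, SkewDemo, hotKeys, saltBuckets, and the sample keys are all illustrative assumptions, not part of the question. Skewed keys are salted in a map step first, so each hot key's records spread over several partitions, and the salt is folded back out after a first aggregation pass.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}
import scala.util.Random

// Hypothetical skew-aware partitioner; the class and parameter names are
// assumptions for illustration. Hot keys are pre-salted into "key#n"
// variants, so a single skewed key hashes across several partitions.
class SkewAwarePartitioner(parts: Int) extends Partitioner {
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int = {
    // Non-negative modulo, so negative hashCodes still map to a valid partition.
    val h = key.toString.hashCode % parts
    if (h < 0) h + parts else h
  }
}

object SkewDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("skew-demo").setMaster("local[4]"))

    val hotKeys = Set("user_42")   // assumed: skewed keys found by profiling
    val saltBuckets = 4            // assumed: fan-out per skewed key

    // Synthetic skew: one key dominates the dataset.
    val raw = sc.parallelize(
      Seq.fill(100000)(("user_42", 1)) ++ Seq(("user_7", 1), ("user_9", 1)))

    // Salt only the hot keys so their records scatter across partitions.
    val salted = raw.map { case (k, v) =>
      if (hotKeys(k)) (s"$k#${Random.nextInt(saltBuckets)}", v) else (k, v)
    }

    val counts = salted
      .partitionBy(new SkewAwarePartitioner(8)) // custom partitioner in play
      .reduceByKey(_ + _)                       // partial counts per salted key
      .map { case (k, v) => (k.split('#')(0), v) }
      .reduceByKey(_ + _)                       // merge the salted variants

    counts.collect().foreach(println)
    sc.stop()
  }
}
```

Note that getPartition must return the same partition for a given key every time it is called, which is why the randomness lives in the salting map step rather than inside the partitioner itself; key-based operations such as reduceByKey rely on every occurrence of a key landing in the same partition.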