
Answer-first summary for fast verification
Answer: Implement a custom partitioning logic that redistributes the skewed data across existing partitions.
Option B is correct: custom partitioning logic can redistribute the skewed data across the existing partitions so that each one processes a comparable share. Option A may not help when the skew comes from the data itself rather than from the partition count; for example, under key-based partitioning, records that share one hot key still land in the same partition however many partitions exist. Option C is not advisable, because unaddressed skew creates hotspots and performance bottlenecks. Option D is a valid technique, but it may not be the most efficient choice when the skew is extreme and calls for more sophisticated rebalancing.
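As a rough illustration of what custom partitioning logic for Option B could look like, here is a minimal sketch in plain Python. It is not tied to any particular framework: the class name `SkewAwarePartitioner`, the `hot_keys` parameter, and the round-robin strategy are all assumptions made for this example, and the set of hot keys is assumed to be known in advance (for instance, from sampling the data).

```python
import hashlib
from collections import Counter


def stable_hash(value: str) -> int:
    """Deterministic hash, so the same key maps to the same partition across runs."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class SkewAwarePartitioner:
    """Hypothetical partitioner: hash-partitions normal keys and spreads
    known hot keys round-robin over the existing partitions."""

    def __init__(self, num_partitions: int, hot_keys):
        self.num_partitions = num_partitions
        self.hot_keys = set(hot_keys)   # assumed to be identified beforehand
        self._cursor = Counter()        # per-hot-key round-robin position

    def partition(self, key: str) -> int:
        if key in self.hot_keys:
            # Rotate records of a hot key across all partitions.
            index = self._cursor[key] % self.num_partitions
            self._cursor[key] += 1
            return index
        # Normal keys keep ordinary hash partitioning.
        return stable_hash(key) % self.num_partitions


# Usage: eight records share one hot key, four records have distinct keys.
records = ["hot"] * 8 + ["a", "b", "c", "d"]
partitioner = SkewAwarePartitioner(num_partitions=4, hot_keys={"hot"})
load = Counter(partitioner.partition(key) for key in records)
print(dict(load))
```

One consequence of this design is that records for a hot key no longer live in a single partition, so any per-key aggregation would need a follow-up step that merges the partial results from each partition.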
Author: LeetQuiz Editorial Team
In a data processing pipeline, you have identified a skew in the data distribution across different partitions. How would you handle this skew to ensure balanced processing and avoid hotspots in your distributed system?
A. Increase the number of partitions to distribute the data more evenly.
B. Implement a custom partitioning logic that redistributes the skewed data across existing partitions.
C. Ignore the skew and let the system handle it naturally.
D. Use a salting technique to add a random element to the data keys to distribute the load more evenly.
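For comparison, the salting technique named in Option D can be sketched as a two-stage count in plain Python. The helper names (`salt_key`, `unsalt_key`, `salted_count`), the `#` separator, and the choice of four salts are assumptions for illustration only; the sketch also assumes original keys never need to be recovered from anything other than the part before the last `#`.

```python
import random
from collections import defaultdict

NUM_SALTS = 4  # assumed number of salt values; tune to the degree of skew


def salt_key(key: str, rng: random.Random, num_salts: int = NUM_SALTS) -> str:
    """Append a random suffix so one hot key becomes several distinct keys."""
    return f"{key}#{rng.randrange(num_salts)}"


def unsalt_key(salted: str) -> str:
    """Strip the salt suffix to recover the original key."""
    return salted.rsplit("#", 1)[0]


def salted_count(keys, seed: int = 0):
    """Stage 1: count per salted key. Stage 2: merge counts per original key."""
    rng = random.Random(seed)
    partial = defaultdict(int)
    for key in keys:
        partial[salt_key(key, rng)] += 1

    final = defaultdict(int)
    for salted, count in partial.items():
        final[unsalt_key(salted)] += count
    return dict(partial), dict(final)


partial, final = salted_count(["hot"] * 100 + ["cold"] * 3)
print(partial)  # the hot key is split over up to NUM_SALTS salted keys
print(final)    # totals per original key after merging
```

The second stage is the cost of salting: every salted aggregate has to be merged back under its original key, which adds an extra pass over the partial results.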