
Answer-first summary for fast verification
Answer: Use a random partitioning key to balance data across partitions.
A random partitioning key assigns records to partitions independently of how values are distributed in the data, which mitigates data skew. No single partition, and therefore no single worker, receives a disproportionate share of the records, so the load is spread more evenly across the available resources and overall performance improves.
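The effect can be illustrated with a small, engine-agnostic sketch in plain Python. It is not tied to any particular dataflow product: the partition count, the `customer_42` hot key, the 90/10 split, and the helper names `partition_of`, `partition_counts`, and `skew_ratio` are all assumptions made up for illustration. The sketch first identifies skew by counting records per partition, then compares the result with a random key.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed partition count, for illustration only


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash (CRC32 of its string form)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition_counts(keys, num_partitions=NUM_PARTITIONS):
    """Return the number of records that land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(counts):
    """Largest partition divided by the mean partition size (1.0 means perfectly even)."""
    return max(counts) / (sum(counts) / len(counts))


# Hypothetical skewed input: 90% of the records share one hot key.
records = ["customer_42"] * 9000 + [f"customer_{i}" for i in range(1000)]

# Partitioning on the natural key: every hot-key record goes to the same partition.
natural = partition_counts(records)

# Partitioning on a random key: each record gets an independent random value.
rng = random.Random(0)  # seeded so the run is repeatable
randomized = partition_counts([rng.randrange(10**9) for _ in records])

print("natural key   :", natural, "skew ratio:", round(skew_ratio(natural), 2))
print("random key    :", randomized, "skew ratio:", round(skew_ratio(randomized), 2))
```

A skew ratio far above 1.0 on the natural key is the signal that one partition is overloaded. The trade-off of the random key is that records with the same natural key no longer sit in the same partition, so any per-key aggregation or join needs a second step that recombines the partial results.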
Author: LeetQuiz Editorial Team
You are working with a dataflow that is processing large volumes of data from multiple sources. The dataflow is experiencing performance issues due to data skew. Describe how you would identify data skew in the dataflow and what steps you would take to resolve it. Consider both data-level and configuration-level optimizations.
A. Use dynamic partitioning to distribute data processing tasks.
B. Increase the number of partitions to improve parallelism.
C. Use a random partitioning key to balance data across partitions.
D. Use a partitioning key based on commonly queried columns.