
Answer-first summary for fast verification
Answer: The autoscaling policy is scaling down too quickly, leading to loss of shuffle data when a node is decommissioned.
A FetchFailedException typically occurs when shuffle data is lost because a worker node is decommissioned while the autoscaler scales the cluster down. To prevent this, configure the autoscaling policy's graceful decommission timeout based on the longest-running job in the cluster, so nodes are not removed while their shuffle output is still needed. Rapidly adding nodes does not cause data loss, so option C is out. Cloud Storage is the recommended persistent store for Dataproc clusters, so its use (option D) is not a problem. Misconfigured access controls on the storage bucket (option B) would produce consistent failures, not intermittent ones. Network issues can also cause FetchFailedException, but they are unlikely when using Cloud Storage with Cloud Dataproc. For more details, refer to the [Dataproc Best Practices Guide](https://cloud.google.com/blog/topics/developers-practitioners/dataproc-best-practices-guide).
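As a sketch of the fix, an autoscaling policy can set `gracefulDecommissionTimeout` to cover the longest-running job, giving YARN time to drain work from a node before it is removed. The instance counts and factors below are illustrative values, not recommendations:

```yaml
# Illustrative Dataproc autoscaling policy (values are examples only).
workerConfig:
  minInstances: 2
  maxInstances: 10
basicAlgorithm:
  cooldownPeriod: 4m
  yarnConfig:
    scaleUpFactor: 0.05
    scaleDownFactor: 1.0
    # Set at or above the duration of the longest-running job so
    # shuffle data is not lost when a node is decommissioned.
    gracefulDecommissionTimeout: 1h
```

A policy file like this can be applied with `gcloud dataproc autoscaling-policies import`, then attached to the cluster.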
Author: LeetQuiz Editorial Team
A new workload has been deployed to Cloud Dataproc, configured with an autoscaling policy. Intermittently, a FetchFailedException is observed. What is the most probable cause of this issue?
A. The autoscaling policy is scaling down too quickly, leading to loss of shuffle data when a node is decommissioned.
B. The GCS bucket in use has incorrect access controls configured.
C. The autoscaling policy is adding nodes at a rapid pace, causing data to be dropped.
D. Google Cloud Storage is being utilized for persistent storage instead of local storage.