
Answer-first summary for fast verification
Answer: Provide adequate persistent disk space to the Hadoop cluster and store the intermediate data on native Hadoop Distributed File System (HDFS)
Allocating sufficient persistent disk space to the Hadoop cluster and storing intermediate data on native HDFS can significantly improve performance for disk I/O-intensive jobs. Intermediate (shuffle) data is written and re-read repeatedly, so routing it through the Cloud Storage connector adds network round trips on every access; keeping it on HDFS, backed by the cluster's persistent disks, avoids that overhead and minimizes disk I/O delays. For more details, refer to Google Cloud's documentation on disk performance optimization and Hadoop migration strategies: [Optimize Disk Performance](https://cloud.google.com/compute/docs/disks/performance#optimize_disk_performance) and [Hadoop Migration Jobs](https://cloud.google.com/solutions/migration/hadoop/hadoop-gcp-migration-jobs).
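As a sketch of this approach, a Dataproc cluster can be provisioned with larger persistent disks so that intermediate data fits on local HDFS. The cluster name, region, worker count, disk sizes, and job paths below are placeholder values to be tuned to the actual workload:

```shell
# Create a Dataproc cluster with enough persistent disk capacity for
# HDFS to hold intermediate (shuffle) data. All names and sizes here
# are illustrative placeholders, not recommended values.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --master-boot-disk-size=500GB \
    --num-workers=4 \
    --worker-boot-disk-size=1000GB

# Submit the job with intermediate paths on hdfs:// rather than gs://
# so shuffle-heavy stages stay on the cluster's local disks.
gcloud dataproc jobs submit hadoop \
    --cluster=example-cluster \
    --region=us-central1 \
    --jar=my-job.jar \
    -- hdfs:///input hdfs:///intermediate
```

Final results can still be copied to Cloud Storage after the job completes; the point is to keep only the short-lived intermediate data off the Cloud Storage connector.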
Author: LeetQuiz Editorial Team
What is a recommended approach to improve the performance of a disk I/O intensive Hadoop job that's running slowly on Cloud Dataproc, especially when using the Cloud Storage connector for data storage?
A
Enhance the virtual machine instances' CPU cores to boost networking bandwidth for each instance
B
Increase the Hadoop cluster's memory allocation to keep the intermediary data of the slow Hadoop job in memory
C
Assign additional network interface cards (NICs) and configure link aggregation in the OS to utilize combined throughput with Cloud Storage
D
Provide adequate persistent disk space to the Hadoop cluster and store the intermediate data on native Hadoop Distributed File System (HDFS)