
Answer-first summary for fast verification
Answer: Allocate enough persistent disk space to the Hadoop cluster and store the intermediate data on native HDFS.
The correct answer is **B**. Allocating sufficient persistent disk space and using native HDFS for intermediate data storage directly addresses the disk I/O intensity of the job. Native HDFS is optimized for Hadoop jobs, making it more efficient for disk I/O operations. - **Option A** might not be the best solution because, despite increasing memory, disk I/O intensive jobs may still face performance issues if the data cannot be efficiently managed in memory. - **Option C** is less effective as adding CPU cores may improve general performance but doesn't specifically target disk I/O bottlenecks. - **Option D** focuses on improving network throughput, which doesn't directly solve the disk I/O performance issue of the Hadoop job.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
You're setting up a managed Hadoop system for your data lake, using Cloud Storage for input, output, and intermediary data to separate storage from compute. However, a specific Hadoop job runs much slower on Cloud Dataproc than on an on-premises bare-metal Hadoop setup with 8-core nodes and 100-GB RAM, suggesting it's disk I/O intensive. What's the best way to improve the performance of this Hadoop job on Cloud Dataproc?
A
Increase the memory allocation to the Hadoop cluster to keep the intermediary data of the job in memory.
B
Allocate enough persistent disk space to the Hadoop cluster and store the intermediate data on native HDFS.
C
Add more CPU cores to the virtual machine instances of the Hadoop cluster to increase networking bandwidth per instance.
D
Install an additional network interface card (NIC) and set up link aggregation in the OS to combine throughput for Cloud Storage operations.
No comments yet.