
Explanation:
The correct answer is B. Allocating sufficient persistent disk space and using native HDFS for intermediate data storage directly addresses the disk I/O intensity of the job. Native HDFS is optimized for Hadoop jobs, making it more efficient for disk I/O operations.
Ultimate access to all questions.
No comments yet.
You're setting up a managed Hadoop system for your data lake, using Cloud Storage for input, output, and intermediary data to separate storage from compute. However, a specific Hadoop job runs much slower on Cloud Dataproc than on an on-premises bare-metal Hadoop setup with 8-core nodes and 100-GB RAM, suggesting it's disk I/O intensive. What's the best way to improve the performance of this Hadoop job on Cloud Dataproc?
A
Increase the memory allocation to the Hadoop cluster to keep the intermediary data of the job in memory.
B
Allocate enough persistent disk space to the Hadoop cluster and store the intermediate data on native HDFS.
C
Add more CPU cores to the virtual machine instances of the Hadoop cluster to increase networking bandwidth per instance.
D
Install an additional network interface card (NIC) and set up link aggregation in the OS to combine throughput for Cloud Storage operations.