
You're setting up a managed Hadoop system for your data lake, using Cloud Storage for input, output, and intermediate data so that storage is separated from compute. However, one Hadoop job runs much slower on Cloud Dataproc than on an on-premises bare-metal Hadoop cluster with 8-core nodes and 100 GB of RAM, which suggests the job is disk I/O intensive. What's the best way to improve the performance of this Hadoop job on Cloud Dataproc?
A
Increase the memory allocated to the Hadoop cluster so that the job's intermediate data stays in memory.
B
Allocate enough persistent disk space to the Hadoop cluster and store the intermediate data on native HDFS.
C
Add more CPU cores to the virtual machine instances of the Hadoop cluster to increase networking bandwidth per instance.
D
Install an additional network interface card (NIC) and set up link aggregation in the OS to combine throughput for Cloud Storage operations.