Google Professional Data Engineer

Ultimate access to all questions.

You're setting up a managed Hadoop system for your data lake, using Cloud Storage for input, output, and intermediary data to separate storage from compute. However, a specific Hadoop job runs much slower on Cloud Dataproc than on an on-premises bare-metal Hadoop setup with 8-core nodes and 100-GB RAM, suggesting it's disk I/O intensive. What's the best way to improve the performance of this Hadoop job on Cloud Dataproc?

Real Exam

Increase the memory allocation to the Hadoop cluster to keep the intermediary data of the job in memory.

37.5%

Allocate enough persistent disk space to the Hadoop cluster and store the intermediate data on native HDFS.

Loading comments...

Install an additional network interface card (NIC) and set up link aggregation in the OS to combine throughput for Cloud Storage operations.

12.5%