
Answer-first summary for fast verification
Answer: Allocate sufficient persistent disk space to the Hadoop cluster, and store the intermediate data of that particular Hadoop job on native HDFS
The correct answer is B: allocate sufficient persistent disk space to the Hadoop cluster and store that job's intermediate data on native HDFS. For disk I/O-intensive work, cluster-local HDFS offers lower latency than Cloud Storage and performs better for metadata operations, frequent modifications, append operations, and partitioned writes. Cloud Storage is better suited to the initial input and final output of a big-data pipeline, while intermediate results belong on faster local storage such as HDFS.
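To make the storage split concrete, here is a minimal Python sketch of how a sequence of jobs might be wired so that only the first input and the last output touch Cloud Storage. The function name `plan_stage_paths`, the bucket name, and the `hdfs:///tmp/pipeline` directory are hypothetical and chosen for illustration only; they are not part of the question or of any Dataproc API.

```python
from typing import List, Tuple


def plan_stage_paths(
    num_jobs: int,
    bucket: str,
    hdfs_tmp: str = "hdfs:///tmp/pipeline",  # hypothetical scratch directory on cluster HDFS
) -> List[Tuple[str, str]]:
    """Return an (input_path, output_path) pair for each job in a sequential pipeline.

    Only the first job reads from Cloud Storage and only the last job writes to it;
    every hand-off between jobs stays on native HDFS.
    """
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")

    plan = []
    for i in range(num_jobs):
        # The first job reads the durable input from the bucket.
        src = f"gs://{bucket}/input" if i == 0 else f"{hdfs_tmp}/stage-{i}"
        # The last job writes the durable result back to the bucket.
        dst = f"gs://{bucket}/output" if i == num_jobs - 1 else f"{hdfs_tmp}/stage-{i + 1}"
        plan.append((src, dst))
    return plan


if __name__ == "__main__":
    for src, dst in plan_stage_paths(3, "my-bucket"):
        print(f"{src} -> {dst}")
```

With three jobs, the middle job reads and writes only `hdfs://` paths, so its disk-heavy traffic never crosses the Cloud Storage connector. The cluster's persistent disks still need to be large enough to hold those intermediate stages.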
Author: LeetQuiz Editorial Team
You are building a managed Hadoop system to serve as your data lake. The data transformation process consists of a series of Hadoop jobs executed in sequence. To separate storage from compute, you chose to use the Cloud Storage connector for all input data, output data, and intermediate data. However, you have noticed that one specific Hadoop job runs very slowly on Cloud Dataproc compared with an on-premises bare-metal Hadoop environment whose nodes have 8 cores and 100 GB of RAM. Analysis shows that this particular Hadoop job is highly disk I/O intensive. You want to resolve this performance issue. What should you do?
A. Allocate sufficient memory to the Hadoop cluster, so that the intermediary data of that particular Hadoop job can be held in memory
B. Allocate sufficient persistent disk space to the Hadoop cluster, and store the intermediate data of that particular Hadoop job on native HDFS
C. Allocate more CPU cores of the virtual machine instances of the Hadoop cluster so that the networking bandwidth for each instance can scale up
D. Allocate additional network interface card (NIC), and configure link aggregation in the operating system to use the combined throughput when working with Cloud Storage