
## Answer (for fast verification)

**Answer: A.** Use Google Cloud Storage instead of Persistent Disk for the HDFS data.
## Explanation

To minimize storage costs when migrating an on-premises Hadoop cluster to Google Cloud Dataproc, the most cost-effective approach is to use **Google Cloud Storage (GCS)** instead of Persistent Disk for HDFS data.

### Why Option A Is Correct

1. **Cost efficiency**: Google Cloud Storage is significantly cheaper than Persistent Disk for large-scale data storage, offering lower per-gigabyte pricing than block storage options.
2. **HDFS compatibility**: Dataproc can use GCS as the underlying storage layer through the Cloud Storage connector, which provides HDFS-compatible access (via `gs://` paths) to data stored in GCS buckets.
3. **Managed service benefits**: GCS is a fully managed service that eliminates storage capacity management, scales automatically, and provides high durability and availability.
4. **Separation of compute and storage**: Using GCS decouples storage from compute, so you can scale (or even delete) clusters independently of the data, which can yield additional cost savings.

### Why the Other Options Are Less Optimal

- **Options B, D, and F (Persistent Disk variants)**: All Persistent Disk options are more expensive than GCS for large-scale data storage and require manual management of storage capacity.
- **Option C (Local SSD)**: Local SSDs are ephemeral; their contents do not persist when instances are stopped, making them unsuitable for durable HDFS data.
- **Option E (Cloud Storage FUSE)**: While this could work, FUSE is less efficient than the native Cloud Storage connector and may have performance implications for Hadoop workloads.

### Implementation Approach

1. Create GCS buckets to store your Hadoop data.
2. Configure Dataproc clusters to use GCS as the default filesystem.
3. Migrate existing HDFS data to GCS.
4. Update applications to use GCS paths instead of HDFS paths.

This approach provides the best balance of cost savings, performance, and operational simplicity for Hadoop workloads on Google Cloud.
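The implementation steps above can be sketched with the Google Cloud CLI. This is a minimal, hedged outline, not a production runbook: the bucket name (`my-hadoop-data`), cluster name (`migrated-cluster`), region, and NameNode address are placeholders you would replace with your own values.

```shell
# 1. Create a GCS bucket for the Hadoop data (names here are placeholders).
gsutil mb -l us-central1 gs://my-hadoop-data

# 2. Create a Dataproc cluster. The Cloud Storage connector is preinstalled,
#    so gs:// paths work out of the box; setting fs.defaultFS to the bucket
#    is optional and makes GCS the default filesystem for jobs.
gcloud dataproc clusters create migrated-cluster \
  --region=us-central1 \
  --properties=core:fs.defaultFS=gs://my-hadoop-data

# 3. Copy existing HDFS data to GCS with DistCp (run from the old cluster;
#    the NameNode host/port below is an example).
hadoop distcp hdfs://namenode:8020/data gs://my-hadoop-data/data

# 4. Update application inputs/outputs to use gs:// paths, e.g.:
#    hadoop jar wordcount.jar gs://my-hadoop-data/data gs://my-hadoop-data/out
```

Because the Cloud Storage connector exposes GCS through the standard Hadoop filesystem API, most jobs need only the path change in step 4, not code changes.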
Author: LeetQuiz.
NO.36 Your company's on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?
A. Use Google Cloud Storage instead of Persistent Disk for the HDFS data.
B. Use SSD Persistent Disk for the HDFS data.
C. Use Local SSD for the HDFS data.
D. Use standard Persistent Disk for the HDFS data.
E. Use Cloud Storage FUSE for the HDFS data.
F. Use Persistent Disk SSD for the HDFS data.