Google Professional Data Engineer

Your company's on-premises Apache Hadoop servers are nearing end-of-life, and the IT strategy calls for migrating them to Google Cloud Dataproc. A like-for-like migration of the existing cluster would require 50 TB of Google Persistent Disk per node. The Chief Information Officer (CIO) is concerned about the significant cost of that much block storage. Your objective is to minimize the storage costs of the migration. What should you do?

Correct answer: A. Put the data into Google Cloud Storage.

Explanation:

The correct answer is A: 'Put the data into Google Cloud Storage.' Dataproc ships with the Cloud Storage connector, which lets Hadoop and Spark jobs read and write gs:// paths in place of hdfs:// paths, so each node no longer needs 50 TB of Persistent Disk to hold the data. Cloud Storage costs significantly less per GB than Persistent Disk, offers higher durability, and persists independently of any cluster, which allows data to be shared across multiple clusters and lets clusters be deleted when idle. This directly addresses the CIO's concern about the cost of block storage.
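As an illustrative sketch of why this works, the change on the job side is largely a matter of swapping hdfs:// URIs for gs:// URIs, since Dataproc clusters include the Cloud Storage connector by default. The bucket and dataset paths below are hypothetical placeholders:

```python
# Minimal PySpark sketch: reading and writing Cloud Storage instead of HDFS.
# Intended to run as a Dataproc job; bucket/path names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-to-gcs-example").getOrCreate()

# Before migration a job might have read hdfs:///warehouse/events/;
# with the Cloud Storage connector the same DataFrame API accepts gs:// URIs.
events = spark.read.parquet("gs://example-migration-bucket/warehouse/events/")

# Results are written back to Cloud Storage rather than cluster-local disk,
# so nothing is lost if the cluster is deleted after the job finishes.
(events.groupBy("event_type")
       .count()
       .write.mode("overwrite")
       .parquet("gs://example-migration-bucket/output/event_counts/"))

spark.stop()
```

Because the data lives in Cloud Storage rather than on node disks, the cluster can be provisioned for compute only, and even torn down between jobs, which is where much of the cost saving comes from.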