
Answer-first summary for fast verification
Answer: Deploy a Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
Deploying a Dataproc cluster is the best fit for this migration: it is a managed Hadoop/Spark service, so provisioning, monitoring, and node recovery are handled for you, which provides the fault tolerance required for long-running batch jobs without operational overhead. A standard persistent disk is sufficient because batch workloads are typically throughput-oriented rather than latency-sensitive, and running 50% of the workers as preemptible instances cuts compute cost substantially while the non-preemptible half keeps jobs progressing if preemptible capacity is reclaimed. Storing the data in Cloud Storage decouples storage from the cluster's lifetime and provides durable, scalable storage, and updating script references from hdfs:// to gs:// lets existing jobs read and write Cloud Storage through the connector with minimal changes.

The other options fall short. Option B substitutes an SSD persistent disk, which adds cost without benefiting batch jobs that do not need high-performance I/O. Options A and D self-manage Hadoop and Spark on Compute Engine instance groups, giving up the fault tolerance, cost-effectiveness, and scalability of a managed service. Option D is riskier still: it stores the data in HDFS on preemptible instances, so reclaimed nodes can both interrupt jobs and lose data.
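The two concrete steps above can be sketched in shell. The cluster-creation command is shown as an illustrative comment only (the cluster name, region, and worker counts are assumptions, and it requires the gcloud CLI and a real project); the hdfs:// to gs:// rewrite is demonstrated on a sample script line:

```shell
#!/usr/bin/env sh
# Illustrative Dataproc cluster creation (names, region, and sizes are
# assumptions, not from the question; requires gcloud and a real project):
#
#   gcloud dataproc clusters create batch-cluster \
#     --region=us-central1 \
#     --num-workers=2 \
#     --num-secondary-workers=2 \
#     --secondary-worker-type=preemptible \
#     --worker-boot-disk-type=pd-standard
#
# Mechanical URI rewrite from HDFS to Cloud Storage in a job script
# (a real migration would also map HDFS paths onto bucket names):
sed 's|hdfs://|gs://|g' <<'EOF'
df = spark.read.parquet("hdfs://cluster/data/events/")
EOF
# → df = spark.read.parquet("gs://cluster/data/events/")
```

In practice you would run the sed rewrite over each job script (for example with `sed -i` across the scripts directory) rather than a here-document.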
Author: LeetQuiz Editorial Team
You are planning to migrate your on-premises Apache Hadoop deployment to the cloud to ensure fault-tolerance and cost-effectiveness for long-running batch jobs using a managed service. Which of the following steps should you take to achieve this goal?
A
Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs://
B
Deploy a Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
C
Deploy a Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
D
Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs://