
Answer-first summary for fast verification
Answer: Deploy a Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
Deploying a Dataproc cluster is the best fit for this migration: it is a managed Hadoop/Spark service, so provisioning, monitoring, and node recovery are handled for you, which provides the fault tolerance required for long-running batch jobs without operational overhead. A standard persistent disk is sufficient because batch workloads are typically throughput-oriented rather than latency-sensitive, and running 50% of the workers as preemptible instances cuts compute cost substantially while the non-preemptible half keeps jobs progressing if preemptible capacity is reclaimed. Storing the data in Cloud Storage decouples storage from the cluster's lifetime and provides durable, scalable storage, and updating script references from hdfs:// to gs:// lets existing jobs read and write Cloud Storage through the connector with minimal changes.

The other options fall short. Option B substitutes an SSD persistent disk, which adds cost without benefiting batch jobs that do not need high-performance I/O. Options A and D self-manage Hadoop and Spark on Compute Engine instance groups, giving up the fault tolerance, cost-effectiveness, and scalability of a managed service. Option D is riskier still: it stores the data in HDFS on preemptible instances, so reclaimed nodes can both interrupt jobs and lose data.
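The two concrete steps above can be sketched in shell. The cluster-creation command is shown as an illustrative comment only (the cluster name, region, and worker counts are assumptions, and it requires the gcloud CLI and a real project); the hdfs:// to gs:// rewrite is demonstrated on a sample script line:

```shell
#!/usr/bin/env sh
# Illustrative Dataproc cluster creation (names, region, and sizes are
# assumptions, not from the question; requires gcloud and a real project):
#
#   gcloud dataproc clusters create batch-cluster \
#     --region=us-central1 \
#     --num-workers=2 \
#     --num-secondary-workers=2 \
#     --secondary-worker-type=preemptible \
#     --worker-boot-disk-type=pd-standard
#
# Mechanical URI rewrite from HDFS to Cloud Storage in a job script
# (a real migration would also map HDFS paths onto bucket names):
sed 's|hdfs://|gs://|g' <<'EOF'
df = spark.read.parquet("hdfs://cluster/data/events/")
EOF
# → df = spark.read.parquet("gs://cluster/data/events/")
```

In practice you would run the sed rewrite over each job script (for example with `sed -i` across the scripts directory) rather than a here-document.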
Author: LeetQuiz Editorial Team
You are planning to migrate your on-premises Apache Hadoop deployment to the cloud to ensure fault-tolerance and cost-effectiveness for long-running batch jobs using a managed service. Which of the following steps should you take to achieve this goal?
A
Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs://
B
Deploy a Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
C
Deploy a Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
D
Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs://