
Answer-first summary for fast verification
Answer: Deploy a Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
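Option A is the correct answer because it meets all three objectives: a managed service, fault tolerance, and cost efficiency. Dataproc is Google Cloud's managed Hadoop and Spark service, so it satisfies the managed-service preference. A standard persistent disk costs less than an SSD persistent disk, which is the only difference between options A and B, so A is the more cost-efficient of the two. Running 50% of the workers as preemptible instances significantly reduces compute cost, while the remaining non-preemptible workers keep long-running batch jobs making progress if preemptible instances are reclaimed. Storing data in Cloud Storage keeps it durable and scalable independently of the cluster, and changing references in scripts from hdfs:// to gs:// points the jobs at that data. Options C and D are less appropriate because they require installing and managing Hadoop and Spark on Compute Engine instances yourself, which is not a managed-service approach and increases operational complexity. Option D is also internally inconsistent: it stores data in HDFS on preemptible instances, which can be reclaimed at any time, yet changes script references to gs://.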
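The hdfs:// to gs:// change mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a migration tool: the bucket name `example-hadoop-data` and the helper `hdfs_to_gs` are hypothetical, and it assumes the directory layout under the bucket mirrors the original HDFS paths.

```python
import re

# Hypothetical bucket name; replace with the bucket that holds the migrated data.
BUCKET = "example-hadoop-data"

# Matches "hdfs://" plus an optional namenode authority (host[:port]),
# stopping at the first slash of the path.
_HDFS_PREFIX = re.compile(r"hdfs://[^/\s'\"]*")


def hdfs_to_gs(text: str, bucket: str = BUCKET) -> str:
    """Replace every hdfs:// URI prefix in text with gs://<bucket>."""
    return _HDFS_PREFIX.sub(lambda _match: f"gs://{bucket}", text)


if __name__ == "__main__":
    line = "spark.read.csv('hdfs://namenode:8020/data/input.csv')"
    print(hdfs_to_gs(line))
```

Paths without a namenode, such as hdfs:///data/input.csv, are handled the same way because the authority part of the pattern may be empty. Review the rewritten scripts before running them, since a script may build paths dynamically in ways a simple pattern does not catch.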
Author: LeetQuiz Editorial Team
You are currently managing an on-premises Apache Hadoop deployment and are considering migrating it to the cloud. Your primary objectives are to achieve high fault tolerance and cost efficiency, particularly for long-running batch jobs. Furthermore, you prefer to utilize a managed service to simplify the migration and operational processes. What steps should you take to accomplish this?
A
Deploy a Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
B
Deploy a Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
C
Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs://
D
Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs://
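For reference, the cluster shape described in option A can be sketched as a `gcloud dataproc clusters create` invocation. The Python helper below only builds the argument list and does not call gcloud. The function name `dataproc_create_args`, the cluster name, the region, and the total worker count of 10 are all assumptions for illustration; the question itself does not specify a cluster size.

```python
def dataproc_create_args(name, region, total_workers, preemptible_fraction=0.5):
    """Build a gcloud argument list for a Dataproc cluster with standard
    persistent disks and a given share of preemptible secondary workers."""
    secondary = int(total_workers * preemptible_fraction)
    primary = total_workers - secondary
    return [
        "gcloud", "dataproc", "clusters", "create", name,
        f"--region={region}",
        f"--num-workers={primary}",              # non-preemptible primary workers
        f"--num-secondary-workers={secondary}",  # secondary workers
        "--secondary-worker-type=preemptible",
        "--master-boot-disk-type=pd-standard",   # standard persistent disk
        "--worker-boot-disk-type=pd-standard",
    ]


if __name__ == "__main__":
    # Hypothetical cluster name, region, and size.
    print(" ".join(dataproc_create_args("hadoop-migration", "us-central1", 10)))
```

Check the flags against the current gcloud reference before use, since available options and defaults change between releases.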