
Answer-first summary for fast verification
Answer: Deploy a Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
Dataproc is a managed Spark and Hadoop service designed to leverage open-source data tools for various processing needs, including batch processing, querying, streaming, and machine learning. It simplifies cluster management, reduces administrative overhead, and optimizes costs by allowing clusters to be turned off when not in use. This enables users to concentrate more on their data and jobs rather than on cluster management.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
Which managed service offers the most fault-tolerant and cost-effective solution for migrating an on-premises Apache Hadoop deployment to the cloud for long-running batch jobs?
A
Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs://
B
Deploy a Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
C
Deploy a Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
D
Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs://
No comments yet.