
Answer-first summary for fast verification
Answer: Transfer your data to Cloud Storage and execute your jobs on Dataproc.
The optimal solution is to **transfer your data to Cloud Storage and execute your jobs on Dataproc**. Cloud Storage offers scalable and durable object storage, making your data readily accessible for Spark jobs on Dataproc. Dataproc, Google Cloud's managed Spark and Hadoop service, facilitates running these jobs without the overhead of infrastructure management, aligning with the goal of minimal code changes and efficient migration.

- **Option A** is less ideal because BigQuery, while excellent for data analysis, doesn't directly support Spark jobs, and converting scripts to SQL could necessitate extensive modifications.
- **Option B** involves significant redevelopment into Apache Beam, which contradicts the aim of minimizing code alterations.
- **Option C** requires manual infrastructure management on Compute Engine, diverging from the preference for managed services and adding unnecessary complexity within a constrained timeline.
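To make the "minimal code changes" point concrete, here is a small sketch of what the migration typically touches: input and output paths move from `hdfs://` to `gs://` (Dataproc ships with the Cloud Storage connector, so Spark can read `gs://` URIs), and the job is submitted with `gcloud dataproc jobs submit spark`. The bucket, cluster, region, class, and JAR names below are hypothetical placeholders, and the helper functions are illustrative, not part of any Google library.

```python
from urllib.parse import urlparse


def to_gcs_uri(hdfs_path: str, bucket: str) -> str:
    """Map an HDFS path or URI to the same object path in a Cloud Storage bucket."""
    parsed = urlparse(hdfs_path)
    if parsed.scheme not in ("", "hdfs"):
        raise ValueError(f"not an HDFS path: {hdfs_path}")
    # Drop the namenode host and port; keep the directory layout unchanged.
    return f"gs://{bucket}/{parsed.path.lstrip('/')}"


def build_submit_command(cluster, region, main_class, jar_uri, job_args):
    """Build the argument list for `gcloud dataproc jobs submit spark`."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "spark",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--class={main_class}",
        f"--jars={jar_uri}",
        "--",  # arguments after this separator go to the Spark job itself
        *job_args,
    ]


if __name__ == "__main__":
    # All names below are assumed placeholders for illustration.
    bucket = "example-migration-bucket"
    input_uri = to_gcs_uri("hdfs://namenode:8020/data/events", bucket)
    output_uri = to_gcs_uri("/data/reports", bucket)
    cmd = build_submit_command(
        cluster="example-cluster",
        region="us-central1",
        main_class="com.example.EventReport",
        jar_uri=f"gs://{bucket}/jars/event-report.jar",
        job_args=[input_uri, output_uri],
    )
    print(" ".join(cmd))
```

The Spark application itself stays as it is; only the URIs passed to it and the submission command change, which is why this option fits a tight deadline better than rewriting jobs in SQL or Apache Beam.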
Author: LeetQuiz Editorial Team
You are planning to migrate numerous Apache Spark jobs from an on-premises Apache Hadoop cluster to Google Cloud, aiming to use managed services for job execution to avoid managing a persistent Hadoop cluster. With a tight deadline and the objective to keep code changes to a minimum, what is the most efficient approach to achieve this migration?
A. Migrate your data to BigQuery and transform your Spark scripts into SQL-based processing.
B. Re-engineer your jobs using Apache Beam and execute them on Dataflow.
C. Transfer your data to Compute Engine disks and directly manage and execute your jobs on these instances.
D. Transfer your data to Cloud Storage and execute your jobs on Dataproc.