
**Correct Option: C. Dataproc**

Dataproc is the best choice because it is Google Cloud's fully managed service for Apache Spark and Hadoop. Spark code for large-scale transaction processing runs on it without being rewritten, and it covers each requirement in the scenario:

- **Batch and real-time processing**: Spark batch jobs and Spark Structured Streaming jobs both run on Dataproc clusters.
- **Scalability**: autoscaling policies add or remove worker nodes as the workload changes.
- **Cost-effectiveness**: clusters can be created for a job and deleted automatically after an idle period, and secondary workers can use lower-cost Spot VMs.
- **Integration**: connectors let Spark jobs read from and write to Cloud Storage and BigQuery for storage and analytics.
- **Low operational overhead**: Google Cloud provisions and manages the cluster infrastructure.

**Why other options are not correct**:

- **A. Cloud Composer**: It is a workflow orchestration service (managed Apache Airflow). It can schedule and trigger Spark jobs, but it does not run the Spark processing itself.
- **B. BigQuery**: It is a serverless SQL data warehouse built for analytics and querying, not a managed environment for running Apache Spark jobs.
- **D. Cloud Dataflow**: It supports both stream and batch processing, but it runs Apache Beam pipelines rather than Spark jobs, so the Spark workload required here would have to be rewritten in Beam.
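To make "dynamic scaling" and "cost controls" concrete, here is a minimal sketch of a Dataproc cluster specification built as a plain Python dict. It assumes the field names of the Dataproc v1 `Cluster` resource in the snake_case dict form accepted by the `google-cloud-dataproc` client library. The function `build_cluster_spec`, the project, region, cluster name, machine types, node counts, and autoscaling policy ID are all hypothetical illustrations, not a definitive configuration.

```python
"""Hypothetical sketch: a Dataproc cluster spec with scaling and cost controls.

Assumption: keys mirror the Dataproc v1 Cluster resource in the dict form
used by the google-cloud-dataproc client. All names and sizes are made up.
"""
from typing import Any, Dict


def build_cluster_spec(
    project_id: str,
    region: str,
    cluster_name: str,
    autoscaling_policy_id: str,
    idle_timeout_s: int = 1800,
) -> Dict[str, Any]:
    """Return a cluster spec dict; nothing is sent to Google Cloud here."""
    # Full resource name of an autoscaling policy assumed to exist already.
    policy_uri = (
        f"projects/{project_id}/regions/{region}/"
        f"autoscalingPolicies/{autoscaling_policy_id}"
    )
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            # Small fixed core: one master and two primary workers (hypothetical sizes).
            "master_config": {"num_instances": 1, "machine_type_uri": "n2-standard-4"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n2-standard-4"},
            # Burst capacity comes from lower-cost Spot secondary workers.
            "secondary_worker_config": {"num_instances": 0, "preemptibility": "SPOT"},
            # Dynamic scaling: the referenced policy resizes the worker groups.
            "autoscaling_config": {"policy_uri": policy_uri},
            # Cost control: delete the cluster once it has been idle this long.
            "lifecycle_config": {"idle_delete_ttl": {"seconds": idle_timeout_s}},
        },
    }


if __name__ == "__main__":
    spec = build_cluster_spec("my-project", "us-central1", "txn-spark", "txn-policy")
    print(spec["config"]["autoscaling_config"]["policy_uri"])
```

Under these assumptions, the returned dict could be passed as the `cluster` field of a `create_cluster` request on `dataproc_v1.ClusterControllerClient`. Keeping the primary worker group small and letting Spot secondary workers absorb peaks is one common way to balance cost against throughput.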
Author: LeetQuiz Editorial Team
You are tasked with designing a data processing solution for a financial services company that requires processing large-scale transaction data in real-time and batch modes using Apache Spark. The solution must be cost-effective, scalable, and fully managed to minimize operational overhead. Additionally, it should seamlessly integrate with other Google Cloud services for analytics and storage. Which Google Cloud service is the BEST choice for this scenario, and why? Choose the most appropriate option.
A
Cloud Composer: A workflow orchestration service that manages workflows across Google Cloud and hybrid environments, but is not designed to run Apache Spark data processing itself.
B
BigQuery: A serverless, highly scalable data warehouse that excels in running fast SQL queries but is not designed for executing Apache Spark jobs.
C
Dataproc: A fully managed service for running Apache Spark and Hadoop clusters, offering dynamic scaling, cost controls, and seamless integration with Google Cloud services.
D
Cloud Dataflow: A unified stream and batch data processing service based on Apache Beam; it runs Beam pipelines rather than Apache Spark jobs.